Review F: Floating-point representation: Questions

F2.1.

Represent each of the following using the 8-bit floating-point format we studied (which had 3 bits for the mantissa and 4 bits for the excess-7 exponent). Show your intermediate work.

a. 2.25
b. −80.0
c. 1/32
F2.2.

Consider the 8-bit floating-point format we studied, including 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.

a. What 8-bit pattern represents the number −0.625 = −5/8?

b. What base-10 integer or fraction does 01001001 represent?

F2.3.

Consider a 7-bit floating-point representation with 3 bits for the excess-3 exponent and 3 bits for the mantissa.

a. How would 0.375(10) be represented in this 7-bit representation?
b. What decimal value does 0110110 represent? (If you like, you can express your solution to this question and the next as a fraction.)
c. What decimal value does 1001100 represent?
F3.1.

Explain the motivation behind the introduction of the denormalized case in IEEE's floating-point representation.

F3.2.

Consider the 8-bit floating-point representation we studied in class, including support for denormalized numbers and nonnumeric values. It used 3 bits for the mantissa and 4 bits for the excess-7 exponent.

a. What 8-bit pattern represents the number 0.125(10)?
b. What 8-bit pattern represents the number 20(10)?
c. What base-10 integer or fraction does the bit pattern 11001100 represent?
d. What base-10 integer or fraction does the bit pattern 00000010 represent?
e. What bit pattern results from multiplying 240 and −240?
F3.3.

Consider a 6-bit floating-point representation with 3 bits for the excess-3 exponent and 2 bits for the mantissa, including support for denormalized and nonnumeric values.

a. How would 0.75(10) be represented in this 6-bit representation?
b. What decimal value does 011010 represent?
c. What decimal value does 000010 represent?
d. How would infinity (∞) be represented in this representation?
F3.4.

Consider a 7-bit floating-point representation with 3 bits for the excess-3 exponent and 3 bits for the mantissa, including support for denormalized and nonnumeric values.

a. What values do 1010100 and 0000100 represent? Express each answer as a decimal number or a base-10 fraction.
b. What is the bit pattern of the smallest positive normalized number supported by this representation? Convert this to a decimal fraction or number.
c. What is the bit pattern of the largest denormalized number supported by this representation? Convert this to a decimal fraction or number.
d. Suppose we add 0101010 and 1111000 as 7-bit floating-point numbers. What is the bit pattern of the result?
F3.5.

Using the 8-bit floating-point format we studied, describe a computation using only numeric floating-point values that would result in negative infinity.

F3.6.

Using the 8-bit floating-point format we studied, describe a computation using only numeric floating-point values that would result in the nonnumeric floating-point value NaN (“Not a Number”).

F3.7.

Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. It included 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.

a. What 8-bit pattern represents the number 40(10)?

b. What 8-bit pattern represents the number 1/256 = 1 × 2^−8?

c. Give an 8-bit pattern representing “not a number.”

d. What base-10 integer or fraction does 10110100 represent?

F3.8.

Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. It included 3 bits for the mantissa and 4 bits for the excess-7 exponent. Show your intermediate work.

a. What 8-bit pattern represents the number −0.375 = −3/8?

b. What 8-bit pattern represents the number 3/256 = 3 × 2^−8?

c. Give an 8-bit pattern representing “negative infinity.”

d. What base-10 integer or fraction does 01100010 represent?

F4.1.

Consider the 8-bit floating-point format we studied, including support for denormalized numbers and nonnumeric values. Give an example of values for a, b, and c where (a + b) + c is not the same as a + (b + c). Explain your answer, including the result for (a + b) + c and for a + (b + c).

F4.2.

Give an example of three floating-point numbers x, y, and z, such that the distributive property x (y + z) = x y + x z does not hold. (Feel free to describe the values rather than give numerical values: For example, you might say “the largest denormalized number” rather than give a particular value.) Note: Your answer should include the values of x (y + z) and x y + x z for your values of x, y, and z.

Review F: Floating-point representation: Solutions

F2.1.
a. 2.25 becomes 0 1000 001
b. −80.0 becomes 1 1101 010
c. 1/32 becomes 0 0010 000
F2.2.

a. −101(2) × 2^−3 = −1.01(2) × 2^−1 → 1 0110 010

b. 1.001(2) × 2^2 = 100.1(2) = 4.5

F2.3.
a. 0001100
b. 14(10)
c. −3/8 = −0.375(10)
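
The decoding steps used in F2.2 and F2.3 can be checked mechanically. The following is a minimal sketch, assuming a sign bit followed by the exponent field and then the mantissa field, with every pattern treated as normalized; the function name decode_normalized is hypothetical and not part of the course material.

```python
from fractions import Fraction

def decode_normalized(bits, exp_bits, man_bits):
    """Decode a sign/exponent/mantissa bit string, normalized case only."""
    bits = bits.replace(" ", "")
    assert len(bits) == 1 + exp_bits + man_bits
    sign = -1 if bits[0] == "1" else 1
    bias = 2 ** (exp_bits - 1) - 1                 # 7 for 4 bits, 3 for 3 bits
    exponent = int(bits[1:1 + exp_bits], 2) - bias
    # The mantissa bits follow an implicit leading 1.
    mantissa = 1 + Fraction(int(bits[1 + exp_bits:], 2), 2 ** man_bits)
    return sign * mantissa * Fraction(2) ** exponent

print(decode_normalized("0 1000 001", 4, 3))  # 9/4
print(decode_normalized("01001001", 4, 3))    # 9/2
print(decode_normalized("0110110", 3, 3))     # 14
print(decode_normalized("1001100", 3, 3))     # -3/8
```

Exact fractions are used instead of floats so that results such as −3/8 print in the same form as the answers above.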
F3.1.

With only normalized numbers, the set of floating-point numbers would have a gap around 0: the representable numbers become increasingly concentrated down to a certain point (1/128 in our eight-bit example), and then there are no positive numbers below that.

The denormalized case “spreads out” the numbers whose exponent bits are all zeroes, spacing them evenly down to and including zero.
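
To make the gap concrete, the sketch below lists the smallest positive values of the 8-bit format with and without the denormalized case. It assumes the excess-7, 3-mantissa-bit layout from the questions; the helper name smallest_positive is hypothetical.

```python
from fractions import Fraction

BIAS, MAN_BITS = 7, 3  # excess-7 exponent, 3 mantissa bits

def smallest_positive(denormalized, count=4):
    """Return the `count` smallest positive values as exact fractions."""
    values = set()
    for exp_field in range(3):                 # the lowest exponent fields suffice here
        for man_field in range(2 ** MAN_BITS):
            frac = Fraction(man_field, 2 ** MAN_BITS)
            if denormalized and exp_field == 0:
                value = frac * Fraction(2) ** (1 - BIAS)                # 0.mmm x 2^-6
            else:
                value = (1 + frac) * Fraction(2) ** (exp_field - BIAS)  # 1.mmm x 2^e
            if value > 0:
                values.add(value)
    return sorted(values)[:count]

print([str(v) for v in smallest_positive(False)])  # ['1/128', '9/1024', '5/512', '11/1024']
print([str(v) for v in smallest_positive(True)])   # ['1/512', '1/256', '3/512', '1/128']
```

Without the denormalized case the values sit 1/1024 apart just above 1/128, yet nothing lies between 1/128 and 0. With it, the values step evenly by 1/512 all the way down.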

F3.2.
a. 0 0100 000
b. 0 1011 010
c. −6
d. 1/256
e. 11111000
F3.3.
a. 001010
b. 12.0(10)
c. 0.125(10)
d. 011100
F3.4.
a. −0.75(10), 0.125(10)
b. 0001000, which converts to 1/4 or 0.25
c. 0000111, which converts to 7/32 or 0.21875
d. 1111000 (since any number added to −∞ is −∞)
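
The decoding answers in F3.2 through F3.4 follow one rule with three cases. Here is a minimal sketch of that rule, assuming an all-ones exponent field marks nonnumeric values and an all-zeroes exponent field marks denormalized values; the function name decode is hypothetical.

```python
from fractions import Fraction

def decode(bits, exp_bits, man_bits):
    """Decode a bit string, handling denormalized and nonnumeric patterns."""
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exp_field = int(bits[1:1 + exp_bits], 2)
    man_field = int(bits[1 + exp_bits:], 2)
    bias = 2 ** (exp_bits - 1) - 1
    frac = Fraction(man_field, 2 ** man_bits)
    if exp_field == 2 ** exp_bits - 1:      # all ones: infinity or NaN
        return float("nan") if man_field else sign * float("inf")
    if exp_field == 0:                      # all zeroes: 0.mmm x 2^(1 - bias)
        return sign * frac * Fraction(2) ** (1 - bias)
    return sign * (1 + frac) * Fraction(2) ** (exp_field - bias)  # 1.mmm x 2^e

print(decode("11001100", 4, 3))  # -6
print(decode("00000010", 4, 3))  # 1/256
print(decode("0000111", 3, 3))   # 7/32
print(decode("1111000", 3, 3))   # -inf
```

Numeric patterns come back as exact fractions, while infinities and NaN come back as Python floats, since Fraction has no nonnumeric values.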
F3.5.

−240.0 + −240.0: −240 is the most negative finite number representable in our 8-bit floating-point format (1 1110 111); doubling it goes well beyond that limit, so the result is “approximated” by negative infinity.

Another example is -1.0 / 0.0.

F3.6.

Finding the square root of −1 leads to NaN, as does dividing zero by zero, or adding 1/x and −1/x, where x is the smallest positive numeric value (and so 1/x results in infinity).
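
The same behavior can be observed with ordinary 64-bit floats, which follow the same IEEE rules with wider fields. The sketch below uses Python floats rather than the 8-bit format, so the constant 1.7e308 is an assumption chosen to sit near the largest finite double. Note that Python itself raises an exception for -1.0 / 0.0 and for math.sqrt(-1) instead of returning a nonnumeric value, so only the overflow-based examples are shown.

```python
import math

big = 1.7e308            # close to the largest finite double (about 1.797e308)
neg_inf = -big + -big    # the sum overflows, so the result is negative infinity
pos_inf = big + big      # likewise, positive infinity
nan = pos_inf + neg_inf  # infinity plus negative infinity is NaN

print(neg_inf)           # -inf
print(math.isnan(nan))   # True
```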

F3.7.
a. 0 1100 010 (from 101000(2) = 1.01(2) × 2^5)
b. 0 0000 010 (from 1 × 2^−8 = 0.01(2) × 2^−6)
c. 0 1111 111 (or any of the form x1111yyy where at least one y is 1)
d. −3/4 (from −1.1(2) × 2^−1 = −0.11(2))
F3.8.

a. 1 0101 100 (−0.375 = −0.011(2) = −1.1(2) × 2^−2)

b. 0 0000 110 (11(2) × 2^−8 = 0.11(2) × 2^−6)

c. 1 1111 000

d. 40 (1.010(2) × 2^5 = 101000(2) = 32 + 8)

F4.1.

Suppose a = −120, b = 120, and c = 1. Then notice the following.

a + (b + c) = −120 + (120 + 1)
= −120 + 120
= 0

(We get 120 + 1 = 120 because the 1 can't be represented within the number's precision.) On the other side, we get:

(a + b) + c = (−120 + 120) + 1
= 0 + 1
= 1
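
The two groupings can be simulated by rounding after every addition. This is a sketch under stated assumptions: the helper fp8 is hypothetical, it rounds to the nearest value of the 8-bit format with ties going to the even neighbor, and it treats 240 as the largest finite value.

```python
import math

def fp8(x):
    """Round x to the nearest value of the 8-bit format (ties to even)."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        return x
    _, ex = math.frexp(abs(x))             # abs(x) = m * 2**ex with 0.5 <= m < 1
    e = max(ex - 1, -6)                    # leading-bit exponent; -6 when denormalized
    step = 2.0 ** (e - 3)                  # gap between neighbors (3 mantissa bits)
    result = round(abs(x) / step) * step   # Python's round() sends ties to even
    if result > 240:                       # beyond the largest finite value
        result = math.inf
    return math.copysign(result, x)

def add(p, q):
    """Add two values and round the sum back into the 8-bit format."""
    return fp8(p + q)

a, b, c = -120.0, 120.0, 1.0
print(add(a, add(b, c)))   # 0.0
print(add(add(a, b), c))   # 1.0
```

Near 120 the representable values are 8 apart, which is why 121 rounds back to 120 in the first grouping.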
F4.2.

One possibility is x = 0.5, y = largest possible number, and z = largest possible number. In this case, y + z overflows, so x (y + z) is infinity, while x y + x z = 120 + 120 = 240 is a finite number.

Another possibility is x = ∞, y = −1, and z = 2. In this case, x (y + z) is infinity (since ∞ ⋅ 1 = ∞), while x y + x z is NaN (since −∞ + ∞ = NaN).

While these answers are fine, they are somewhat dissatisfying because of their reliance on overflow and infinity. Another possibility, which does not resort to nonnumeric values, has x = 0.5, y = smallest positive number, and z = smallest positive number. In this case, y + z is exactly representable, so x (y + z) is the smallest positive number, while x y and x z are each too small to represent and round to 0, so x y + x z is 0.
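
The underflow example can be checked by rounding after every operation. The sketch below assumes round-to-nearest with ties going to the even neighbor, which is what sends x y = 1/1024 to 0 rather than up to 1/512; the helper fp8 is hypothetical.

```python
import math

def fp8(v):
    """Round v to the nearest value of the 8-bit format (ties to even)."""
    if v == 0 or math.isinf(v) or math.isnan(v):
        return v
    _, ex = math.frexp(abs(v))             # abs(v) = m * 2**ex with 0.5 <= m < 1
    e = max(ex - 1, -6)                    # leading-bit exponent; -6 when denormalized
    step = 2.0 ** (e - 3)                  # gap between neighbors (3 mantissa bits)
    result = round(abs(v) / step) * step   # Python's round() sends ties to even
    if result > 240:                       # beyond the largest finite value
        result = math.inf
    return math.copysign(result, v)

x = 0.5
y = z = 2.0 ** -9                          # smallest positive value, 1/512

left = fp8(x * fp8(y + z))                 # x (y + z)
right = fp8(fp8(x * y) + fp8(x * z))       # x y + x z

print(left == y, right)                    # True 0.0
```

Under a different tie-breaking rule the products could round up to 1/512 instead, so the outcome depends on that assumption.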