Example 3
Lack of precision
Normalize 747
Format: 1 bit sign, 7 bit exponent, 8 bit significand (16 bits total)
Convert to binary
Repeated division by 2  Subtracting powers of 2
(quotient, remainder)
747                     747 - 1 * 512 = 235   msb
373  1  lsb             235 - 0 * 256 = 235
186  1                  235 - 1 * 128 = 107
93   0                  107 - 1 * 64 = 43
46   1                  43 - 1 * 32 = 11
23   0                  11 - 0 * 16 = 11
11   1                  11 - 1 * 8 = 3
5    1                  3 - 0 * 4 = 3
2    1                  3 - 1 * 2 = 1
1    0                  1 - 1 * 1 = 0   lsb
0    1  msb
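Both conversions worked above (repeated division by 2, and subtracting powers of 2) can be sketched in Python. This is only an illustration; the function names by_division and by_subtraction are made up for this example.

```python
def by_division(n):
    """Repeated division by 2: remainders come out lsb first."""
    bits = ""
    while n > 0:
        n, remainder = divmod(n, 2)
        bits = str(remainder) + bits  # each new remainder is more significant
    return bits or "0"


def by_subtraction(n):
    """Subtract powers of 2, largest first: bits come out msb first."""
    power = 1
    while power * 2 <= n:  # find the largest power of 2 that fits in n
        power *= 2
    bits = ""
    while power >= 1:
        if n >= power:
            bits += "1"
            n -= power
        else:
            bits += "0"
        power //= 2
    return bits


print(by_division(747))     # 1011101011
print(by_subtraction(747))  # 1011101011
```

Both methods give the same bit string; they differ only in which end of the number they produce first.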
Scientific notation: 1011101011 -> 1.0111 0101 1 x 2^9
Bias: 2^(7-1) - 1 = 63
Biased exponent: 9 + 63 = 72
72
36   0  lsb
18   0
9    0
4    1
2    0
1    0
0    1  msb
Biased exponent: 1001000
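The normalization and bias steps can be sketched as follows. This assumes a positive integer input, and the helper name normalize and the constant EXP_BITS are invented for this illustration.

```python
EXP_BITS = 7  # exponent field width used in this example


def normalize(n):
    """Return (exponent, fraction) so that n = 1.fraction x 2^exponent.

    Only handles integers n >= 1 (an assumption of this sketch).
    """
    bits = bin(n)[2:]          # '1011101011' for 747
    exponent = len(bits) - 1   # places the binary point moves left
    fraction = bits[1:]        # everything after the leading 1
    return exponent, fraction


bias = 2 ** (EXP_BITS - 1) - 1               # 63
exponent, fraction = normalize(747)          # 9, '011101011'
biased = exponent + bias                     # 72
exp_field = format(biased, f"0{EXP_BITS}b")  # '1001000'

print(exponent, fraction, bias, biased, exp_field)
```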
Note: the number has 9 bits after the binary point, but the significand
stores only 8, so you must truncate or round.
1.0111 0101 1 -> truncate -> 1.0111 0101
1.0111 0101 1 -> round -> 1.0111 0110
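The two choices can be sketched on the fraction bits as strings. The rounding rule here is assumed to be "round half up" (add 1 when the first dropped bit is 1); the notes do not name a rule, and the function names are made up for this example.

```python
SIG_BITS = 8  # significand field width used in this example


def truncate(fraction, width=SIG_BITS):
    """Drop every bit past the field width."""
    return fraction[:width].ljust(width, "0")


def round_half_up(fraction, width=SIG_BITS):
    """Keep `width` bits, adding 1 if the first dropped bit is 1."""
    kept = int(truncate(fraction, width), 2)
    if len(fraction) > width and fraction[width] == "1":
        kept += 1
    # A carry out of the field (all ones rounding up) is not handled here.
    return format(kept, f"0{width}b")


print(truncate("011101011"))       # 01110101
print(round_half_up("011101011"))  # 01110110
```

For this particular value, round-to-nearest-even would produce the same result, because the kept bits 01110101 end in 1 and the dropped bit is an exact tie.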
Float - truncated
0     1001000  01110101
sign  exp.     sig.
Float - rounded
0     1001000  01110110
sign  exp.     sig.
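Joining the three fields gives the stored 16-bit pattern. A small sketch, with a hypothetical helper named pack:

```python
def pack(sign, exp_field, sig_field):
    """Concatenate sign, exponent and significand fields into 16 bits."""
    word = f"{sign}{exp_field}{sig_field}"
    if len(word) != 16:
        raise ValueError("expected 1 + 7 + 8 bits")
    return word


truncated = pack(0, "1001000", "01110101")
rounded = pack(0, "1001000", "01110110")

print(truncated, hex(int(truncated, 2)))  # 0100100001110101 0x4875
print(rounded, hex(int(rounded, 2)))      # 0100100001110110 0x4876
```

The two stored patterns differ only in the last two bits, yet they decode to different integers.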
Conversion back.
Exponent
1    0    0    1    0    0    0
1    2    4    9    18   36   72   (double the running total, add the next bit)
72 - 63 (bias) = 9 exponent
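The second row above is a running total: moving left to right, double the total and add the next bit. A minimal sketch (the function name doubling is made up here):

```python
def doubling(bits):
    """Return the running totals from the double-and-add method."""
    total = 0
    steps = []
    for bit in bits:
        total = total * 2 + int(bit)
        steps.append(total)
    return steps


steps = doubling("1001000")
print(steps)           # [1, 2, 4, 9, 18, 36, 72]
print(steps[-1] - 63)  # 9, the exponent after removing the bias
```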
Truncated
Significand 01110101 -> 1.01110101 x 2^9 (hidden bit restored)
1011101010
1    0    1    1    1    0    1    0    1    0
1    2    5    11   23   46   93   186  373  746
512  256  128  64   32   16   8    4    2    1
1    0    1    1    1    0    1    0    1    0
512  +0   +128 +64  +32  +0   +8   +0   +2   +0
= 512 + 192 + 32 + 8 + 2
= 704 + 40 + 2
= 744 + 2
= 746
Rounded
Significand 01110110 -> 1.01110110 x 2^9 (hidden bit restored)
1011101100
1    0    1    1    1    0    1    1    0    0
1    2    5    11   23   46   93   187  374  748
512  256  128  64   32   16   8    4    2    1
1    0    1    1    1    0    1    1    0    0
512  +0   +128 +64  +32  +0   +8   +4   +0   +0
= 512 + 192 + 32 + 12 + 0
= 704 + 44 + 0
= 748
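The whole conversion back can be sketched in one function. This assumes the 1/7/8 layout of the example and ignores special encodings (zero, subnormals, infinity, NaN); the name decode is hypothetical.

```python
def decode(word, exp_bits=7, sig_bits=8):
    """Decode a bit string laid out as sign | exponent | significand."""
    sign = int(word[0])
    exp_field = word[1:1 + exp_bits]
    sig_field = word[1 + exp_bits:]
    bias = 2 ** (exp_bits - 1) - 1
    exponent = int(exp_field, 2) - bias
    # Restore the hidden bit; this integer is the significand times 2^sig_bits.
    significand = (1 << sig_bits) | int(sig_field, 2)
    value = significand * 2.0 ** (exponent - sig_bits)
    return -value if sign else value


print(decode("0100100001110101"))  # 746.0 (truncated)
print(decode("0100100001110110"))  # 748.0 (rounded)
```

Neither stored pattern decodes to the original 747.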
A float can store a much larger value than an integer of the same size, but
you risk losing precision even on more modest values: here 747 is stored as
746 or 748.
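One way to see why 747 cannot be stored is to compute the gap between neighbouring representable values. With 8 stored fraction bits, that gap is 2^(exponent - 8). A minimal sketch, assuming values of at least 1 and using a made-up function name:

```python
def spacing(value, sig_bits=8):
    """Gap between neighbouring representable values around `value` (>= 1)."""
    exponent = int(value).bit_length() - 1  # position of the leading 1 bit
    return 2.0 ** (exponent - sig_bits)


print(spacing(300))  # 1.0, every integer in this range is exact
print(spacing(747))  # 2.0, only even integers from 512 up to 1024
```

With a gap of 2 around 747, an odd value has to land on one of its even neighbours, which is exactly the 746 and 748 seen in the example.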