Computer Architecture
ELTN 493
Lab 4
05-21-2001
William Kahan, interested in error analysis, worked on the rounding accuracy of floating-point numbers. In earlier computers such as the IBM 7090 and 1794, calculations done in single and double precision were not completely correct. It was the rounding of the floating-point numbers that caused these errors. Complicated equations used mostly by engineers that resulted in very precise answers is where this was noticed first. The problem was that part of the answer was “cut off” during the rounding process. It was not possible to represent a decimal fraction in binary exactly because the binary equivalent goes on forever or repeats. Kahan was determined to fix this problem and replaced the IBM's ALOG program with his own version for the IBM 7090 and his results were more precise. Kahan then went on to draft a Standards for Binary Floating-Point. The IEEE Standard 754 for Binary Floating-Point was made official in 1985 and closely resembled Kahan's.
|
Floating-Point |
Word
Length(bits) |
Exponent |
Radix |
Bias |
Range |
Precision |
|
IEEE 754 - single
precision |
32 |
8 bits |
2 |
127 |
Min= 2.04 x 10^-39: Max= 1.70 x 10^38 |
7 |
|
IBM 360 - single
precision |
32 |
7 bits |
16 |
64 |
Min= 2.71 x 10^-20: Max= 8.51 x 10^37 |
7 |
|
IEEE 754 - double
precision |
64 |
11 bits |
2 |
1023 |
Min= 2.78 x 10^-309: Max=8.99 x 10^307 |
16 |
|
IBM 360 - double
precision |
64 |
7 bits |
16 |
64 |
Min= 2.71 x 10^-20: Max= 8.51 x 10^37 |
17 |
|
sign |
exponent |
-----------significant------------ |
|
0 |
10000100 |
11001010110011001100110 |
|
1 |
10000000 |
00101010011110101110000 |
|
0 |
01111011 |
11000111000100001100101 |
|
0 |
01111001 |
00000110001001001101110 |
= 57.35
= -2.33
= 0.1111
= 0.016
Some more information on William Kahan:
William Kahan
Information on IEEE 754 standard:
In conclusion, the problems
faced with representing the numbers above in floating point are simple. When
dealing with fractions to represent decimal points, you can never be accurate.
Take representing .33 in binary for example. Eventually a loop developed
(01001111010111000) and then just kept repeating itself forever. Representing the number .1111 on the other
hand did not develop a loop and just kept on going. This is what bothered me the most.