Computer Architecture

ELTN 493

Lab 4

05-21-2001

 

Chris Annis

 

Analysis and Discussion


            William Kahan, interested in error analysis, worked on the rounding accuracy of floating-point numbers. In earlier computers such as the IBM 7090 and 1794, calculations done in single and double precision were not completely correct.  It was the rounding of the floating-point numbers that caused these errors. Complicated equations used mostly by engineers that resulted in very precise answers is where this was noticed first.  The problem was that part of the answer was “cut off” during the rounding process.  It was not possible to represent a decimal fraction in binary exactly because the binary equivalent goes on forever or repeats.  Kahan was determined to fix this problem and replaced the IBM's ALOG program with his own version for the IBM 7090 and his results were more precise.  Kahan then went on to draft a Standards for Binary Floating-Point. The IEEE Standard 754 for Binary Floating-Point was made official in 1985 and closely resembled Kahan's.

 

 

Floating-Point

Word Length(bits)

Exponent

Radix

Bias

Range

Precision

IEEE 754 - single precision

32

8 bits

2

127

Min= 2.04 x 10^-39:

Max= 1.70 x 10^38

7

IBM 360 - single precision

32

7 bits

16

64

Min= 2.71 x 10^-20:

Max= 8.51 x 10^37

7

IEEE 754 - double precision

64

11 bits

2

1023

Min= 2.78 x 10^-309:

Max=8.99 x 10^307

16

IBM 360 - double precision

64

7 bits

16

64

Min= 2.71 x 10^-20:

Max= 8.51 x 10^37

17

 

sign

exponent

-----------significant------------

  0

10000100 

11001010110011001100110

  1

10000000

00101010011110101110000

  0

01111011

11000111000100001100101

  0

01111001

00000110001001001101110

 

 

  = 57.35

 

  = -2.33

 

  = 0.1111

 

  = 0.016

                                                                                                     

Links


Some more information on William Kahan:
William Kahan

Information on IEEE 754 standard:

IEEE 754

 

Conclusion


In conclusion, the problems faced with representing the numbers above in floating point are simple. When dealing with fractions to represent decimal points, you can never be accurate. Take representing .33 in binary for example. Eventually a loop developed (01001111010111000) and then just kept repeating itself forever.  Representing the number .1111 on the other hand did not develop a loop and just kept on going.  This is what bothered me the most.