Notes about Floating-Point Numbers in Assembly Language

In ordinary mathematics, we sometimes express numbers in "scientific notation" such as:

          0.632 * 10^4

         -0.781 * 10^-3

In computing, we do something very much like this but in base 16:

          0.7A0 * 16^3

         -0.4BF * 16^-2

that is, a number is represented as fraction * 16^exponent, where the fraction is between 0.0 and 1.0 or between -1.0 and 0.0.

The same can be done in base 2, and in fact that is what is going on inside the CPU. We shall stick to base 16 for now.

In assembly language, there are at least two standard formats for floating-point numbers: short and long.

Short floating-point (32 bits):

The first bit is the sign bit: 0 for positive and 1 for negative.
The next 7 bits are the exponent: -64 to +63, stored as 0 to 127. To get the actual exponent, subtract 64 from the stored value. This corresponds to exponents approximately -79 to +75 in base 10.
The next 24 bits are the fraction: 6 hex digits (6 or 7 decimal digits of precision).

Long floating-point (64 bits):

The first bit is the sign bit: 0 for positive and 1 for negative.
The next 7 bits are the exponent: -64 to +63, stored as 0 to 127. To get the actual exponent, subtract 64 from the stored value. This corresponds to exponents approximately -79 to +75 in base 10.
The next 56 bits are the fraction: 14 hex digits (16 or 17 decimal digits of precision).

Examples

C3478100
Sign bit: 1
Exponent: 67 - 64 = 3 (in base 10)
Fraction: 478100 (in base 16)
Value: -0.478100 * 16^3

43478100
Sign bit: 0
Exponent: 67 - 64 = 3 (in base 10)
Fraction: 478100 (in base 16)
Value: 0.478100 * 16^3

85130101
Sign bit: 1
Exponent: 5 - 64 = -59 (in base 10)
Fraction: 130101 (in base 16)
Value: -0.130101 * 16^-59 (the 59 is in base 10)

Converting these entirely to base 10 would take some work.
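
The field-by-field breakdown used in the examples above can be checked mechanically. The following is a minimal sketch in Python (not part of the assembler itself); the function name decode_short_hfp is hypothetical and chosen only for illustration. It assumes the short layout described earlier: 1 sign bit, 7 exponent bits stored with 64 added, and a 24-bit fraction.

```python
from fractions import Fraction

def decode_short_hfp(word_hex):
    """Split a 32-bit short floating-point word (8 hex digits) into fields.

    Returns (sign bit, true exponent, fraction as an integer, exact value).
    """
    word = int(word_hex, 16)
    sign = word >> 31                      # first bit
    exponent = ((word >> 24) & 0x7F) - 64  # next 7 bits, minus 64
    fraction = word & 0xFFFFFF             # last 24 bits (6 hex digits)
    # value = 0.fraction (base 16) * 16^exponent, computed exactly
    value = Fraction(fraction, 16 ** 6) * Fraction(16) ** exponent
    if sign:
        value = -value
    return sign, exponent, fraction, value

sign, exponent, fraction, value = decode_short_hfp("C3478100")
print(sign, exponent, format(fraction, "06X"), float(value))
```

Using exact fractions rather than binary floats avoids any rounding in the conversion itself; the result is turned into a float only for printing.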

In assembly language, we can declare variables of this type:

FNUM1   DC   E'6'             Result:  41600000 (short format)

FNUM2   DC   D'3.1416'        Result:  413243FE5C91D14E

FNUM3   DC   E'-1234.567E5'   Result:  C775BCCC

Here in FNUM3, the E5 is an exponent in base 10:

     -1234.567 * 10^5
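
To see where such constants come from, here is a rough sketch in Python of the encoding step. The function name encode_hfp and its frac_digits parameter are hypothetical, and the sketch assumes the last fraction digit is rounded (half up), which is an assumption about the assembler's behavior rather than something stated in these notes.

```python
from fractions import Fraction

def encode_hfp(value, frac_digits=6):
    """Encode a number as sign + 7-bit exponent + hex fraction.

    frac_digits=6 gives the short format, frac_digits=14 the long format.
    Assumption: the last fraction digit is rounded half up.
    """
    value = Fraction(value)
    if value == 0:
        return "0" * (2 + frac_digits)
    sign = 0x80 if value < 0 else 0x00
    mag = abs(value)
    exponent = 0
    while mag >= 1:                 # bring the magnitude below 1.0
        mag /= 16
        exponent += 1
    while mag < Fraction(1, 16):    # make the first hex digit nonzero
        mag *= 16
        exponent -= 1
    scale = 16 ** frac_digits
    fraction = int(mag * scale + Fraction(1, 2))
    if fraction == scale:           # rounding carried out of the top digit
        fraction //= 16
        exponent += 1
    stored = exponent + 64
    if not 0 <= stored <= 127:
        raise OverflowError("exponent out of range")
    return f"{sign | stored:02X}{fraction:0{frac_digits}X}"

print(encode_hfp(6))                                 # 41600000
print(encode_hfp(Fraction("3.1416"), 14))            # 413243FE5C91D14E
print(encode_hfp(Fraction("-1234.567") * 10 ** 5))   # C775BCCC
```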

There could be multiple ways to represent a number using different exponents. In base 10, for instance:

      4 = 0.4 * 10^1 = 0.04 * 10^2

To avoid confusion, a floating-point number is called "normalized" if the first digit in its fraction is not 0. (If it is 0, move the radix point one digit to the right and decrease the exponent by 1, repeating until the first digit is nonzero.) If the fraction is exactly 0, this cannot be done.
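
The normalization rule above can be sketched in a few lines of Python. This is an illustration only; the function name normalize and the convention of holding the fraction as an integer of `digits` hex digits are assumptions made for the example.

```python
def normalize(fraction, exponent, digits=6):
    """Shift the fraction left one hex digit at a time until its first
    digit is nonzero, decreasing the exponent by 1 for each shift.

    fraction is an integer holding `digits` hex digits; a zero fraction
    cannot be normalized and is returned unchanged.
    """
    if fraction == 0:
        return fraction, exponent
    top = 16 ** (digits - 1)   # smallest fraction whose first digit is nonzero
    while fraction < top:
        fraction *= 16
        exponent -= 1
    return fraction, exponent

# 0.004000 * 16^3 and 0.400000 * 16^1 are both the number 4
frac, exp = normalize(0x004000, 3)
print(format(frac, "06X"), exp)   # 400000 1
```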

We have 4 floating-point registers, each 64 bits long, numbered 0, 2, 4 and 6. If we are using the short format, only the first 32 bits of each FP register are used.

We can work with short floating-point values using operations such as:

LE   Load Short

     LE   R,D(X,B)

LER  Load Short Register

     LER  R1,R2

STE  Store Short

     STE  R,D(X,B)

CE   Compare Short

     CE   R,D(X,B)

CER  Compare Short Register

     CER  R1,R2

LTER Load and Test Register Short

     LTER R1,R2  (sets the Condition Code)

AE   Add Short

     AE   R,D(X,B)

AER  Add Short Register

     AER  R1,R2

and likewise we have a set of instructions for long floating-point values.

The results of arithmetic are normalized afterward (if possible).

How can we convert a FP number to a decimal number?

Express the FP number as: M * 16^N (M and N are integers).
Convert the M to a decimal integer.
If N > 0, multiply by 16^N; otherwise divide by 16^-N.

Example

Start with 0.4ABC * 16^0.

This is 4ABC * 16^-4. Thus M = 4ABC and N = -4.

Convert M (base 16) to 19132 (base 10).

Divide 19132 by 16^4 = 65536 to get 0.2919 (base 10).
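
The three conversion steps can be written out as a short Python sketch. The function name hfp_to_decimal is hypothetical; it takes M as a string of hex digits and N as an integer, following the M * 16^N form used above.

```python
def hfp_to_decimal(m_hex, n):
    """Convert M * 16^N to a decimal value, where M is given in hex."""
    m = int(m_hex, 16)            # step 2: convert M to a decimal integer
    if n > 0:
        return m * 16 ** n        # step 3: multiply by 16^N
    return m / 16 ** (-n)         # step 3: otherwise divide by 16^-N

# 0.4ABC * 16^0 rewritten as 4ABC * 16^-4
print(int("4ABC", 16))
print(round(hfp_to_decimal("4ABC", -4), 4))
```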

What can go wrong with FP operations?

We could have any of these (at least):

addressing exception (a bad address outside of legal range)
protection exception (attempt to write to someone else's memory)
exponent overflow (the result of arithmetic has exponent too large)
exponent underflow (the result of arithmetic has exponent too small)
significance (the result of Add or Subtract has fraction = 0)
floating-point divide (attempt to divide by 0)
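
The three arithmetic conditions in this list depend only on the fraction and exponent of a result, so they can be illustrated with a small Python sketch. The helper name classify_result is hypothetical, and the sketch is a simplification: it assumes the result has already been normalized and that the operation was an Add or Subtract, and it does not model how the machine actually reports or masks these exceptions.

```python
def classify_result(fraction, exponent):
    """Name the condition, if any, raised by an Add/Subtract result.

    fraction is the result fraction as an integer; exponent is the true
    (unbiased) exponent, which must fit in the range -64 to +63.
    """
    if fraction == 0:
        return "significance"         # result fraction is 0
    if exponent > 63:
        return "exponent overflow"    # exponent too large to store
    if exponent < -64:
        return "exponent underflow"   # exponent too small to store
    return "ok"

print(classify_result(0x478100, 3))    # ok
print(classify_result(0x478100, 64))   # exponent overflow
print(classify_result(0, 0))           # significance
```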