*The following text is
a replica of chapter 7.2 of the Intel Architecture Software
Developer's Manual Volume 1: Basic Architecture. If you are
interested in the complete manual you can download it from http://www.intel.com,
Order Number 243190.*

1. Real Numbers and Floating-Point Formats

1.1 Real Number System

1.2 Floating-Point Format

1.3 Normalized Numbers

1.4 Biased Exponent

1.5 Real Number and Non-number Encodings

1.6 Signed Zeros

1.7 Normalized and Denormalized Finite Numbers

1.8 Signed Infinities

1.9 NaNs

1.10 Indefinite

This section describes how real numbers are represented in floating-point
format in the IA FPU. It also introduces terms such as normalized numbers,
denormalized numbers, biased exponents, signed zeros, and NaNs.

As shown in Figure 1, the real-number system comprises the continuum of real numbers from minus infinity (-∞) to plus infinity (+∞).

*Figure 1: Binary Real Number System*

Because the size and number of registers that any computer can have are limited, only a subset of the real-number continuum can be used in real-number calculations. As shown at the bottom of Figure 1, the subset of real numbers that a particular FPU supports represents an approximation of the real number system. The range and precision of this real-number subset are determined by the format that the FPU uses to represent real numbers.

To increase the speed and efficiency of real-number computations,
computers or FPUs typically represent real numbers in a binary floating-point
format. In this format, a real number has three parts: a **sign,** a **significand,** and
an **exponent.** Figure 2 shows the binary
floating-point format that the IA FPU uses. This format conforms to
the IEEE standard.

*Figure 2: Binary Floating-Point Format*

The sign is a binary value that indicates whether the number is positive (0) or negative (1). The significand has two parts: a 1-bit binary integer (also referred to as the J-bit) and a binary fraction. The J-bit is often not represented, but instead is an implied value. The exponent is a binary integer that represents the power of 2 by which the significand is multiplied.

Table 1 shows how the real number 178.125 (in
ordinary decimal format) is stored in floating-point format. The table
lists a progression of real number notations that leads to the single-real,
32-bit floating-point format (which is one of the floating-point formats
that the FPU supports). In this format, the significand is normalized
(refer to Section 1.3, Normalized Numbers)
and the exponent is biased (refer to Section 1.4, Biased
Exponent). For the single-real format, the biasing constant is
+127₁₀.

*Table 1. Real Number Notation*
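The progression for 178.125 can be retraced with a short Python sketch. It uses the standard `struct` module to obtain the single-real bit pattern and then splits it into the three fields described above. The helper name `decompose_single` is hypothetical and chosen only for this illustration.

```python
import struct

def decompose_single(x):
    """Split x (rounded to single-real format) into sign, biased exponent, fraction."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits
    fraction = bits & 0x7FFFFF             # 23 bits, J-bit is implied
    return sign, biased_exponent, fraction

sign, biased_exponent, fraction = decompose_single(178.125)
print(sign)                            # 0
print(format(biased_exponent, "08b"))  # 10000110
print(format(fraction, "023b"))        # 01100100010000000000000
print(biased_exponent - 127)           # 7 (the unbiased exponent)
```

Read back, the fields give 1.0110010001 (binary) multiplied by 2 to the power 7, which is 10110010.001 (binary), or 178.125.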

In most cases, the FPU represents real numbers in normalized form. This means that except for zero, the significand is always made up of an integer of 1 and the following fraction:

1.fff...ff

For values less than 1, leading zeros are eliminated. (For each leading zero eliminated, the exponent is decremented by one.)

Representing numbers in normalized form maximizes the number of significant digits that can be accommodated in a significand of a given width. To summarize, a normalized real number consists of a normalized significand that represents a real number between 1 and 2 and an exponent that specifies the position of the number's binary point.
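As a rough illustration of normalization, the Python sketch below rewrites a positive value as a significand in the range 1 to 2 together with a base-2 exponent. It relies on `math.frexp`, which returns a mantissa between 0.5 and 1, and rescales the result. The function name `normalize` is hypothetical.

```python
import math

def normalize(x):
    """Return (significand, exponent) with 1 <= significand < 2
    and x == significand * 2**exponent, for finite x > 0."""
    if not (0 < x < math.inf):
        raise ValueError("expected a finite positive number")
    m, e = math.frexp(x)   # x == m * 2**e with 0.5 <= m < 1
    return m * 2.0, e - 1  # rescale so the integer bit is 1

print(normalize(0.15625))  # (1.25, -3)
print(normalize(178.125))  # (1.3916015625, 7)
```

The first call shows the rule for values less than 1: 0.15625 is 0.00101 in binary, and removing the three leading zeros lowers the exponent to -3.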

The FPU represents exponents in a biased form. This means that a
constant is added to the actual exponent so that the biased exponent
is always a positive number. The value of the biasing constant depends
on the number of bits available for representing exponents in the floating-point
format being used. The biasing constant is chosen so that the smallest
normalized number can be reciprocated without overflow. For 32-bit
real numbers the bias of the exponent is +127₁₀.
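The sketch below checks the reciprocal property for the single-real format. The formula in `bias` (2 to the power of the field width minus 1, then minus 1) is the usual IEEE convention and is an assumption here, since the text only states the resulting value of 127 for an 8-bit exponent field. All names are hypothetical.

```python
def bias(exponent_bits):
    """Biasing constant for an exponent field of the given width (assumed IEEE rule)."""
    return 2 ** (exponent_bits - 1) - 1

BIAS = bias(8)  # 127 for the single-real format

# Smallest normalized number: biased exponent 1, significand 1.0
smallest_normal = 2.0 ** (1 - BIAS)
# Largest normalized number: biased exponent 254, significand 1.11...1 (23 fraction bits)
largest_normal = (2 - 2.0 ** -23) * 2.0 ** (254 - BIAS)

reciprocal = 1.0 / smallest_normal
print(BIAS)                          # 127
print(reciprocal == 2.0 ** 126)      # True
print(reciprocal <= largest_normal)  # True, so no overflow
```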

A variety of real numbers and special values can be encoded in the FPU's floating-point format. These numbers and values are generally divided into the following classes:

- Signed zeros.
- Denormalized finite numbers.
- Normalized finite numbers.
- Signed infinities.
- NaNs.
- Indefinite numbers.

(The term NaN stands for "Not a Number.")

Figure 3 shows how the encodings for these
numbers and non-numbers fit into the real number continuum. The encodings
shown here are for the IEEE single-precision (32-bit) format, where
the term **"S"** indicates the sign bit, **"E"** the
biased exponent, and **"F"** the fraction. (The exponent
values are given in decimal.) The FPU can operate on and/or return
any of these values, depending on the type of computation being performed.
The following sections describe these number and non-number classes.

*Figure 3: Real Numbers and NaNs*
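The encoding rules described in the following sections can be summarized as a small classifier over 32-bit single-real bit patterns. This is a sketch only: the function name `classify_single` and its return strings are hypothetical, and the indefinite value is not singled out because it is one particular NaN encoding.

```python
def classify_single(bits):
    """Classify a 32-bit single-real bit pattern by its S, E and F fields."""
    sign = "-" if (bits >> 31) & 1 else "+"
    e = (bits >> 23) & 0xFF   # biased exponent
    f = bits & 0x7FFFFF       # fraction
    if e == 0:
        if f == 0:
            return sign + "zero"
        return sign + "denormal"
    if e == 255:
        if f == 0:
            return sign + "infinity"
        return "NaN"          # the sign bit is ignored for NaNs
    return sign + "normal"    # biased exponent 1 to 254

for pattern in (0x00000000, 0x80000000, 0x00000001,
                0x00800000, 0x7F7FFFFF, 0xFF800000, 0x7FC00000):
    print(format(pattern, "08X"), classify_single(pattern))
```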

Zero can be represented as a +0 or a -0 depending on the sign bit. Both encodings are equal in value. The sign of a zero result depends on the operation being performed and the rounding mode being used. Signed zeros have been provided to aid in implementing interval arithmetic. The sign of a zero may indicate the direction from which underflow occurred, or it may indicate the sign of an infinity (∞) that has been reciprocated.
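The behavior of signed zeros can be observed from Python, with the caveat that Python floats are double-real values computed in the default round-to-nearest mode, so this only approximates what the FPU does in other formats or rounding modes. The helper `sign_of` is hypothetical.

```python
import math

def sign_of(x):
    """Return '+' or '-' according to the sign bit of x, including for zeros."""
    return "-" if math.copysign(1.0, x) < 0 else "+"

print(0.0 == -0.0)                  # True: both encodings are equal in value
print(sign_of(0.0), sign_of(-0.0))  # + -
print(sign_of(-1e-300 * 1e-300))    # -  (underflow from the negative side)
print(sign_of(1.0 / -math.inf))     # -  (reciprocal of negative infinity)
print(sign_of(1.0 / math.inf))      # +  (reciprocal of positive infinity)
```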

Non-zero, finite numbers are divided into two classes: normalized
and denormalized. The normalized finite numbers comprise all the non-zero
finite values that can be encoded in a normalized real number format
between zero and infinity (∞).
In the single-real format shown in Figure 3,
this group of numbers includes all the numbers with biased exponents
ranging from 1 to 254₁₀ (unbiased, the exponent range is
from -126₁₀ to +127₁₀).
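The two ends of this range can be checked by building the corresponding bit patterns and reading them back as numbers. This sketch uses Python's `struct` module; the helper `single_from_bits` is a hypothetical name.

```python
import struct

def single_from_bits(bits):
    """Interpret a 32-bit integer as a single-real value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Biased exponent 1, fraction all zeros: smallest positive normalized number
smallest = single_from_bits(0x00800000)
# Biased exponent 254, fraction all ones: largest normalized number
largest = single_from_bits(0x7F7FFFFF)

print(smallest == 2.0 ** -126)                   # True
print(largest == (2 - 2.0 ** -23) * 2.0 ** 127)  # True
```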

When real numbers become very close to zero, the normalized-number format can no longer be used to represent the numbers. This is because the range of the exponent is not large enough to compensate for shifting the binary point to the right to eliminate leading zeros.

When the biased exponent is zero, smaller numbers can only be represented
by making the integer bit (and perhaps other leading bits) of the significand
zero. The numbers in this range are called **denormalized** (or **tiny**)
numbers. The use of leading zeros with denormalized numbers allows
smaller numbers to be represented. However, this denormalization causes
a loss of precision (the number of significant bits in the fraction
is reduced by the leading zeros).

When performing normalized floating-point computations, an FPU normally
operates on normalized numbers and produces normalized numbers as results.
Denormalized numbers represent an **underflow** condition.

A denormalized number is computed through a technique called gradual
underflow. Table 2 gives an example of gradual
underflow in the denormalization process. Here the single-real format
is being used, so the minimum exponent (unbiased) is -126₁₀.
The true result in this example requires an exponent of -129₁₀ in
order to have a normalized number. Since -129₁₀ is beyond
the allowable exponent range, the result is denormalized by inserting
leading zeros until the minimum exponent of -126₁₀ is reached.

| Operation | Sign | Exponent* | Significand |
|---|---|---|---|
| True Result | 0 | -129 | 1.01011100000...00 |
| Denormalize | 0 | -128 | 0.10101110000...00 |
| Denormalize | 0 | -127 | 0.01010111000...00 |
| Denormalize | 0 | -126 | 0.00101011100...00 |
| Denormal Result | 0 | -126 | 0.00101011100...00 |

\* Exponents are unbiased and given in decimal.

*Table 2: Denormalization Process*
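The steps in Table 2 can be reproduced with a small Python sketch that holds the 24-bit significand (J-bit plus 23 fraction bits) as an integer and shifts it right until the minimum exponent is reached. The names `denormalize` and `show` are hypothetical, and the sketch simply drops bits shifted out on the right, whereas real hardware rounds according to the current rounding mode.

```python
import struct

MIN_EXPONENT = -126  # minimum unbiased exponent for the single-real format

def denormalize(significand, exponent):
    """Shift a 24-bit significand right until exponent reaches MIN_EXPONENT.
    Returns the list of (exponent, significand) steps, starting with the input."""
    steps = [(exponent, significand)]
    while exponent < MIN_EXPONENT:
        significand >>= 1  # insert one leading zero (low bit is truncated)
        exponent += 1
        steps.append((exponent, significand))
    return steps

def show(significand):
    """Format a 24-bit significand as 'J.fraction' in binary."""
    s = format(significand, "024b")
    return s[0] + "." + s[1:]

true_significand = 0b1010_1110 << 16  # 1.0101110...0
steps = denormalize(true_significand, -129)
for exponent, sig in steps:
    print(exponent, show(sig))

# Cross-check: packing the true result as single-real yields the same fraction
(bits,) = struct.unpack(">I", struct.pack(">f", 1.359375 * 2.0 ** -129))
print(bits == steps[-1][1])  # True (biased exponent field is zero)
```

Starting from a much smaller exponent, for example -150, shifts every significant bit out and leaves a significand of zero.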

In the extreme case, all the significant bits are shifted out to the right by leading zeros, creating a zero result.

The FPU deals with denormal values in the following ways:

- It avoids creating denormals by normalizing numbers whenever possible.

- It provides the floating-point underflow exception to permit programmers
to detect cases when denormals are created.

- It provides the floating-point denormal-operand exception to permit procedures or programs to detect when denormals are being used as source operands for computations.

When a denormal number in single- or double-real format is used as
a source operand and the denormal exception is masked, the FPU automatically **normalizes** the
number when it is converted to extended-real format.
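The effect of widening a denormal can be sketched in Python, with one important assumption: Python has no extended-real type, so its double-real float stands in for the wider format here. The point illustrated is only that a value too small to be normalized in single-real format becomes a normalized number once more exponent range is available. The helper names are hypothetical.

```python
import math
import struct

def single_from_bits(bits):
    """Interpret a 32-bit integer as a single-real value (widened to a Python float)."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def double_biased_exponent(x):
    """Return the 11-bit biased exponent field of a double-real value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF

x = single_from_bits(0x00000001)  # smallest positive single-real denormal
m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= m < 1
print(m * 2.0, e - 1)             # 1.0 -149: integer bit is 1 in the wider format
print(double_biased_exponent(x))  # 874: non-zero, so the double is normalized
```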

The two infinities, +∞ and
-∞, represent the
maximum positive and negative real numbers, respectively, that can
be represented in the floating-point format. Infinity is always represented
by a zero significand (fraction and integer bit) and the maximum biased
exponent allowed in the specified format (for example, 255₁₀ for
the single-real format).

The signs of infinities are observed, and comparisons are possible. Infinities are always interpreted in the affine sense; that is, -∞ is less than any finite number and +∞ is greater than any finite number. Arithmetic on infinities is always exact. Exceptions are generated only when the use of an infinity as a source operand constitutes an invalid operation.

Whereas denormalized numbers represent an underflow condition, the two infinity numbers represent the result of an overflow condition. Here, the normalized result of a computation has a biased exponent greater than the largest allowable exponent for the selected result format.
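The infinity encodings and their affine ordering can be observed with the following Python sketch. Python floats are double-real and Python does not expose the FPU exception flags, so an invalid operation such as subtracting infinity from infinity simply returns a NaN here. The helper `single_from_bits` is a hypothetical name.

```python
import math
import struct

def single_from_bits(bits):
    """Interpret a 32-bit integer as a single-real value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_inf = single_from_bits(0x7F800000)  # S=0, E=255, F=0
neg_inf = single_from_bits(0xFF800000)  # S=1, E=255, F=0

print(pos_inf == math.inf, neg_inf == -math.inf)  # True True
print(neg_inf < -3.0e38 < 3.0e38 < pos_inf)       # True: affine ordering
print(pos_inf + 1.0 == pos_inf)                   # True: arithmetic is exact
print(math.isnan(pos_inf - pos_inf))              # True: invalid operation
print(1e308 * 10.0 == math.inf)                   # True: overflow gives infinity
```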

Since NaNs are non-numbers, they are not part of the real number line. In Figure 3, the encoding space for NaNs in the FPU floating-point formats is shown above the ends of the real number line. This space includes any value with the maximum allowable biased exponent and a non-zero fraction. (The sign bit is ignored for NaNs.)

The IEEE standard defines two classes of NaN: quiet NaNs (QNaNs) and signaling NaNs (SNaNs). A QNaN is a NaN with the most significant fraction bit set; an SNaN is a NaN with the most significant fraction bit clear. QNaNs are allowed to propagate through most arithmetic operations without signaling an exception. SNaNs generally signal an invalid-operation exception whenever they appear as operands in arithmetic operations.
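The distinction between the two NaN classes comes down to a single bit, which the sketch below tests directly on 32-bit single-real bit patterns. It works on integers rather than Python floats on purpose, since loading an SNaN pattern into a float may quiet it on some platforms. The function name `nan_kind` is hypothetical.

```python
def nan_kind(bits):
    """Return 'QNaN', 'SNaN', or None for a 32-bit single-real bit pattern."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e != 0xFF or f == 0:
        return None                 # not a NaN (finite number or infinity)
    if f & (1 << 22):               # most significant fraction bit
        return "QNaN"
    return "SNaN"

print(nan_kind(0x7FC00000))  # QNaN
print(nan_kind(0x7F800001))  # SNaN
print(nan_kind(0x7F800000))  # None (this is +infinity)
```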

For each FPU data type, one unique encoding is reserved for representing
the special value **indefinite.** For example, when operating on
real values, the real indefinite value is a QNaN. The FPU produces
indefinite values as responses to masked floating-point exceptions.
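This section does not give the bit pattern of the real indefinite value. The sketch below assumes the single-real encoding listed in Intel's encoding tables elsewhere in the manual: sign bit set, maximum biased exponent, and only the most significant fraction bit set (0xFFC00000). Treat the constant and the function name as assumptions for illustration.

```python
# Assumed single-real encoding of the real indefinite value (not stated in this section)
REAL_INDEFINITE_SINGLE = 0xFFC00000

def fields(bits):
    """Split a 32-bit single-real bit pattern into (sign, biased exponent, fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

sign, biased_exponent, fraction = fields(REAL_INDEFINITE_SINGLE)
print(sign)                      # 1
print(biased_exponent)           # 255
print(format(fraction, "023b"))  # 10000000000000000000000
# Most significant fraction bit set, so the value is a QNaN
print(bool(fraction & (1 << 22)))  # True
```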

The *Intel Architecture Software Developer's Manual Volume 1: Basic Architecture* is
COPYRIGHT © INTEL CORPORATION 1999

*THIRD-PARTY BRANDS AND NAMES ARE THE PROPERTY OF THEIR RESPECTIVE OWNERS.