
# How is scientific notation related to the floating point representation used by computers?

Wiki User

2017-06-17 06:37:50

Floating point numbers are stored in scientific notation using base 2 not base 10.

There are a limited number of bits so they are stored to a certain number of significant binary figures.
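To illustrate the limit on significant binary figures, here is a minimal sketch in Python. It assumes the standard `struct` module, whose `f` format stores a value as IEEE single precision; the helper name `to_float32` is made up for this example. Single precision keeps 24 significant bits, so an integer that needs 25 bits cannot survive a round trip.

```python
import struct

def to_float32(x):
    """Round a value to the nearest single-precision number and return it as a Python float."""
    # Packing with "f" forces the value into 4 bytes; unpacking reads it back.
    (y,) = struct.unpack(">f", struct.pack(">f", float(x)))
    return y

exact = 2**24          # 16777216 fits in 24 significant bits
too_long = 2**24 + 1   # 16777217 would need 25 significant bits

print(to_float32(exact) == exact)        # the value survives unchanged
print(to_float32(too_long) == too_long)  # the value is rounded, so this comparison fails
```

The second number is rounded to a neighbouring value that does fit, which is the loss of precision described above.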

The number of bytes (bits) used to store a number varies by format. The bits are split between the mantissa (the significant digits of the number) and the exponent (the power of the base by which the mantissa is multiplied to put the binary/decimal point back where it should be). The base is 2, which is written 10 in binary, just as ten is written 10 in decimal. Examples:

• Single precision (IEEE) uses 4 bytes: 8 bits for the exponent (which can be positive or negative), 1 bit for the sign of the number and 23 bits for the number itself;
• Double precision (IEEE) uses 8 bytes: 11 bits for the exponent, 1 bit for the sign, 52 bits for the number;
• The Commodore PET used 5 bytes: 8 bits for the exponent, 1 bit for the sign and 31 bits for the number;
• The Sinclair QL used 6 bytes: 12 bits for the exponent (stored in 2 bytes, 16 bits, 4 bits of which were unused), 1 bit for the sign and 31 bits for the number.
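The two IEEE layouts in the list can be inspected directly. The following is a rough sketch in Python, assuming the standard `struct` module (its `f` and `d` formats are IEEE single and double precision); the function name `split_fields` and its parameters are invented for illustration. The exponent it returns is the raw stored field, not the true power of 2.

```python
import struct

def split_fields(x, float_fmt, int_fmt, exponent_bits, fraction_bits):
    """Return (sign, stored exponent field, stored mantissa field) for x."""
    # Reinterpret the bytes of the float as one unsigned integer of the same size.
    (bits,) = struct.unpack(int_fmt, struct.pack(float_fmt, x))
    sign = bits >> (exponent_bits + fraction_bits)                    # top bit
    exponent = (bits >> fraction_bits) & ((1 << exponent_bits) - 1)   # middle field
    fraction = bits & ((1 << fraction_bits) - 1)                      # low bits
    return sign, exponent, fraction

# 4 bytes: 1 sign bit, 8 exponent bits, 23 mantissa bits
single = split_fields(-2.0, ">f", ">I", 8, 23)
# 8 bytes: 1 sign bit, 11 exponent bits, 52 mantissa bits
double = split_fields(-2.0, ">d", ">Q", 11, 52)

print(single)
print(double)
```

The Commodore PET and Sinclair QL formats are not covered here, since `struct` has no built-in support for them.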

The numbers are stored normalised:

In decimal scientific notation the digit before the decimal point is non-zero, ie one of {1, 2, ..., 9}.

In binary, the only non-zero digit is 1, so *every* normalised floating point number in binary (except 0) has a 1 before the binary point; thus the initial 1 (before the binary point) is not stored (it is implicit).

The exponent is stored by adding an offset (the bias). In the IEEE formats the bias is 2^(bits of exponent - 1) - 1, eg with 8 bit exponents it is stored by adding 2^7 - 1 = 127 = 0111 1111

Zero is stored by having an exponent field of zero (and mantissa of zero).

Example 10 (decimal):

10 (decimal) = 1010 in binary → 1.010 × 10^11 (all digits binary) which is stored in single precision as:

sign = 0

exponent = 0111 1111 + 0000 0011 = 1000 0010

mantissa = 010 0000 0000 0000 0000 0000 (the 1 before the binary point is implicit).

Example -0.75 (decimal):

-0.75 decimal = -0.11 in binary (0.75 = ½ + ¼) → -1.1 × 10^-1 (all digits binary) → single precision:

sign = 1

exponent = 0111 1111 + (-0000 0001) = 0111 1110

mantissa = 100 0000 0000 0000 0000 0000
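The two worked examples can be checked by assembling the bits by hand and letting the computer decode them. This is a small sketch in Python, assuming the standard `struct` module and the IEEE single-precision exponent bias of 127; `assemble_float32` and `BIAS` are names chosen for this example only.

```python
import struct

BIAS = 127  # IEEE single-precision exponent bias (0111 1111 in binary)

def assemble_float32(sign, exponent, fraction_bits):
    """Build a single-precision value from a sign bit, a true (unbiased)
    exponent, and the stored mantissa bits given as a 23-character string."""
    stored_exponent = exponent + BIAS
    bits = (sign << 31) | (stored_exponent << 23) | int(fraction_bits, 2)
    # Reinterpret the 32-bit pattern as a float.
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

# 10 decimal = 1.010 x 2^3: sign 0, exponent 3, stored mantissa 010 followed by zeros
ten = assemble_float32(0, 3, "010".ljust(23, "0"))
# -0.75 decimal = -1.1 x 2^-1: sign 1, exponent -1, stored mantissa 100 followed by zeros
minus_three_quarters = assemble_float32(1, -1, "100".ljust(23, "0"))

print(ten, minus_three_quarters)
```

Note that the leading 1 is never passed in: only the bits after the binary point are supplied, and the decoder restores the implicit 1.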

Note that 0.1 in decimal is a recurring binary fraction: 0.1 (decimal) = 0.0001100110011... in binary. This is one reason floating point numbers have rounding issues when dealing with decimal fractions.
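The effect is easy to observe. Here is a brief sketch in Python, whose `float` type is IEEE double precision on common platforms (an assumption about the machine running it); it uses only the standard `decimal` module and the built-in `float.hex` method. The variable names are illustrative.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding,
# so it reveals what the computer really holds for the literal 0.1.
stored = Decimal(0.1)
intended = Decimal("0.1")

print(stored)              # a long decimal expansion close to, but not equal to, 0.1
print(stored == intended)  # the stored value is not exactly one tenth

# The hexadecimal form shows the repeating pattern cut off and rounded.
print((0.1).hex())

# A familiar consequence of the rounding:
print(0.1 + 0.2 == 0.3)
```

The repeating hexadecimal digit 9 (binary 1001) in the output corresponds to the recurring 0011 pattern in the binary fraction above.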

Wiki User

2017-06-16 01:28:39

Numbers stored in a computer in floating-point notation are stored in a form of scientific notation, but note that internally they are stored in base 2, not base 10.