Floating point numbers are stored in scientific notation using base 2 not base 10.

There are a limited number of bits so they are stored to a certain number of significant binary figures.

The number of bytes (and so bits) used to store a number varies between formats. The bits are split between the sign, the mantissa (the significant digits of the number) and the exponent (the power of 2 by which the mantissa is multiplied to put the binary point back where it should be; 2 is written 10 in binary, just as ten is written 10 in decimal). Examples:

  • Single precision (IEEE) uses 4 bytes: 8 bits for the exponent (which covers both positive and negative exponents), 1 bit for the sign of the number and 23 bits for the number itself;
  • Double precision (IEEE) uses 8 bytes: 11 bits for the exponent, 1 bit for the sign, 52 bits for the number;
  • The Commodore PET used 5 bytes: 8 bits for the exponent, 1 bit for the sign and 31 bits for the number;
  • The Sinclair QL used 6 bytes: 12 bits for the exponent (stored in 2 bytes, 16 bits, 4 bits of which were unused), 1 bit for the sign and 31 bits for the number.
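As a rough illustration of the IEEE double precision layout listed above, the following Python sketch splits a value into its three fields. The function name `split_double` is just a label chosen here; only the standard `struct` module is used.

```python
import struct

def split_double(x):
    """Return (sign, exponent_field, fraction_field) of an IEEE double."""
    # Reinterpret the 8 bytes of the double as one 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                     # top 1 bit
    exponent = (bits >> 52) & 0x7FF       # next 11 bits (stored with an offset)
    fraction = bits & ((1 << 52) - 1)     # low 52 bits
    return sign, exponent, fraction

print(split_double(1.0))   # (0, 1023, 0)
```

The three widths (1 + 11 + 52) add up to the 64 bits of the 8-byte format.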

The numbers are stored normalised:

In normalised decimal scientific notation the digit before the decimal point is non-zero, ie one of {1, 2, ..., 9}.

In binary numbers, the only non-zero digit is 1, so *every* floating point number in binary (except 0) has a 1 before the binary point; thus the initial 1 (before the binary point) is not stored (it is implicit).
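Normalisation can be tried out in Python. This is only a sketch: `normalise` is a made-up helper name, and it relies on `math.frexp`, which returns a mantissa between 0.5 and 1, so the result is shifted one binary place to put a 1 before the binary point.

```python
import math

def normalise(x):
    """Return (m, e) with x == m * 2**e and 1 <= |m| < 2 (x non-zero and finite)."""
    m, e = math.frexp(x)     # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1      # shift one place so the leading binary digit is 1

print(normalise(10.0))    # (1.25, 3)   1.25 is 1.01 in binary
print(normalise(-0.75))   # (-1.5, -1)  1.5 is 1.1 in binary
```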

The exponent is stored by adding an offset (bias). In the IEEE formats the offset is 2^(bits of exponent - 1) - 1, eg with 8 bit exponents it is stored by adding 2^7 - 1 = 127 = 0111 1111.

Zero is stored by having an exponent field of zero (and mantissa of zero).

Example 10 (decimal):

10 (decimal) = 1010 in binary → 1.010 × 10^11 (all digits binary) which is stored in single precision as:

sign = 0

exponent = 0111 1111 + 0000 0011 = 1000 0010

mantissa = 010 0000 0000 0000 0000 0000 (the 1 before the binary point is implicit, so it is not stored).

Example -0.75 (decimal):

-0.75 decimal = -0.11 in binary (0.75 = ½ + ¼) → -1.1 × 10^-1 (all digits binary) → single precision:

sign = 1

exponent = 0111 1111 + (-0000 0001) = 0111 1110

mantissa = 100 0000 0000 0000 0000 0000
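One way to check single precision bit patterns such as these is to let Python pack the value and print the fields. This is a minimal sketch: `single_fields` is a name invented for the illustration, and it uses the `struct` module's standard-size `f` format, which is IEEE single precision.

```python
import struct

def single_fields(x):
    """Return the sign, exponent and mantissa fields of x as bit strings."""
    # Pack as a 4-byte big-endian float, then read the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(bits, "032b")          # 32 binary digits, zero padded
    return s[0], s[1:9], s[9:]        # 1 + 8 + 23 bits

print(single_fields(10.0))
# ('0', '10000010', '01000000000000000000000')
print(single_fields(-0.75))
# ('1', '01111110', '10000000000000000000000')
```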

Note that 0.1 in decimal is a recurring binary fraction: 0.1 (decimal) = 0.0001100110011... in binary. This is one reason floating point numbers have rounding issues when dealing with decimal fractions.
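The effect is easy to see in Python, whose `float` is normally an IEEE double. In this small sketch `exact_value` is a hypothetical helper name; `Decimal(x)` converts the stored binary value exactly, without rounding.

```python
from decimal import Decimal

def exact_value(x):
    """Return the exact decimal expansion of the binary value stored for x."""
    return Decimal(x)

print(exact_value(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)   # False
```

The stored value is the nearest double to 0.1, not 0.1 itself, so small errors show up when such values are added and compared.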

Wiki User

7y ago

More answers

Numbers held in a computer in floating-point notation are stored in scientific notation, but note that internally they use base 2, not base 10.

Wiki User

7y ago
Q: How is scientific notation related to the floating point representation used by computers?