Scientific notation lets us write very large and very small numbers compactly:
Avogadro's number: $A = 6.023 \times 10^{23} = 602{,}300{,}000{,}000{,}000{,}000{,}000{,}000 = M \times B^{E}$
Planck's constant: $6.626068 \times 10^{-34} = 0.000\,000\,000\,000\,000\,000\,000\,000\,000\,000\,000\,662\,606\,8 = M \times B^{E}$
where:
M = Mantissa
B = Base
E = Exponent
Notice that the representation isn't unique. For example:
$A = 60.23 \times 10^{22}$
When we require exactly one nonzero digit to the left of the decimal point, the representation becomes unique. This is called normalized scientific notation.
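As a rough sketch of normalization, the hypothetical helper below takes a decimal mantissa and a power of ten and shifts the decimal point until one nonzero digit is left of it, adjusting the exponent to compensate. It uses BigDecimal so that no decimal digits are lost. The class and method names are illustrative only.

```java
import java.math.BigDecimal;

// Hypothetical helper: rewrite M x 10^E so that exactly one nonzero digit
// sits to the left of the decimal point (normalized scientific notation).
class NormalizeSci {
    static String normalize(String mantissa, int exponent) {
        // BigDecimal keeps the decimal digits exactly; drop trailing zeros first.
        BigDecimal m = new BigDecimal(mantissa).stripTrailingZeros();
        if (m.signum() == 0) {
            throw new IllegalArgumentException("zero has no normalized form");
        }
        // precision() - scale() - 1 is the power of ten of the leading digit.
        int shift = m.precision() - m.scale() - 1;
        // Move the decimal point 'shift' places left and compensate in the exponent.
        BigDecimal normalized = m.movePointLeft(shift);
        return normalized.toPlainString() + " x 10^" + (exponent + shift);
    }

    public static void main(String[] args) {
        System.out.println(normalize("60.23", 22));  // 6.023 x 10^23
        System.out.println(normalize("602.3", 21));  // 6.023 x 10^23
        System.out.println(normalize("0.6023", 24)); // 6.023 x 10^23
    }
}
```

All three non-unique spellings of A collapse to the same normalized form.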
More generally, any number can be written as a sum of digits times powers of 10, where negative exponents are allowed:
$6.023 = 6 \times 10^{0} + 0 \times 10^{-1} + 2 \times 10^{-2} + 3 \times 10^{-3}$
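The positional expansion can be sketched in a few lines, assuming the input is a plain non-negative decimal string with an optional point. The hypothetical PositionalExpansion class lists each digit with its power of ten and then adds the terms back up exactly to confirm they reproduce the original number.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Sketch: write a decimal string such as "6.023" as digit x 10^k terms,
// where k may be negative. Assumes only digits and at most one '.'.
class PositionalExpansion {
    // Each term is {digit, power of ten}.
    static List<int[]> terms(String decimal) {
        int point = decimal.indexOf('.');
        String digits = decimal.replace(".", "");
        // The first digit's power is (number of digits before the point) - 1.
        int power = (point < 0 ? digits.length() : point) - 1;
        List<int[]> out = new ArrayList<>();
        for (char c : digits.toCharArray()) {
            out.add(new int[] {c - '0', power--});
        }
        return out;
    }

    static String describe(String decimal) {
        StringJoiner sj = new StringJoiner(" + ");
        for (int[] t : terms(decimal)) {
            sj.add(t[0] + " x 10^" + t[1]);
        }
        return sj.toString();
    }

    // Adds the terms back up exactly; scaleByPowerOfTen accepts negative powers.
    static BigDecimal sum(String decimal) {
        BigDecimal total = BigDecimal.ZERO;
        for (int[] t : terms(decimal)) {
            total = total.add(BigDecimal.valueOf(t[0]).scaleByPowerOfTen(t[1]));
        }
        return total;
    }

    public static void main(String[] args) {
        // 6 x 10^0 + 0 x 10^-1 + 2 x 10^-2 + 3 x 10^-3
        System.out.println(describe("6.023"));
        System.out.println(sum("6.023").compareTo(new BigDecimal("6.023")) == 0); // true
    }
}
```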
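Base 2 scientific notation follows the same pattern. In binary, appending a 0 (shifting left) multiplies by 2 and removing a trailing 0 (shifting right) divides by 2. More generally, moving the binary point one place changes the value by a factor of 2, so halving a number leaves its normalized mantissa unchanged and lowers the exponent by 1: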
$[42]_2 = 101010.0000 = 1.0101 \times 2^{5}$
$[21]_2 = 1.0101 \times 2^{4}$
$[10.5]_2 = 1.0101 \times 2^{3}$
$[5.25]_2 = 1.0101 \times 2^{2}$
$[2.625]_2 = 1.0101 \times 2^{1}$
$[1.3125]_2 = 1.0101 \times 2^{0}$
$[0.65625]_2 = 1.0101 \times 2^{-1}$
$32 = 1.00000 \times 2^{5}$
$16 = 1.00000 \times 2^{4}$
$8 = 1.00000 \times 2^{3}$
$4 = 1.00000 \times 2^{2}$
$2 = 1.00000 \times 2^{1}$
$1 = 1.00000 \times 2^{0}$
$0.5 = 1.00000 \times 2^{-1}$
$0.25 = 1.00000 \times 2^{-2}$
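Java's standard library can display this normalized base-2 form directly, which gives a quick way to check the tables above. Double.toHexString prints the bits after "1." in hexadecimal, followed by "p" and the power of two. Binary 1.0101 is hex 1.5 (since 0101 = 5), so 42 = 1.0101 x 2^5 prints as 0x1.5p5. Math.getExponent returns the unbiased exponent. The class name BinaryScientific is only for illustration.

```java
// Sketch: reproduce the halving table with Double.toHexString and Math.getExponent.
class BinaryScientific {
    static String normalizedForm(double x) {
        // Format is 0x1.<hex digits of the mantissa>p<power of two>.
        return Double.toHexString(x);
    }

    public static void main(String[] args) {
        double x = 42.0;
        for (int i = 0; i < 7; i++) {
            // Halving keeps the mantissa 1.0101 (hex 1.5) and lowers the exponent by one.
            System.out.println(x + " = " + normalizedForm(x)
                    + ", exponent " + Math.getExponent(x));
            x /= 2;
        }
        // Math.scalb(m, n) computes m x 2^n exactly: [1.0101]_2 = 1.3125, shifted by 5.
        System.out.println(Math.scalb(1.3125, 5)); // 42.0
    }
}
```

The loop prints 0x1.5p5 for 42.0 down to 0x1.5p-1 for 0.65625, matching the first table. Powers of two such as 0.25 print with an all-zero mantissa (0x1.0p-2).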
The Java virtual machine has two floating-point types: float and double.
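Java floats are represented using the 32-bit (single-precision) format of the IEEE 754-1985 floating-point standard. From the most significant bit down, the 32 bits hold the sign (1 bit), the Exponent (8 bits), and the Mantissa (23 bits), where:
sign = 1 bit = 0 (positive) or 1 (negative)
Exponent = 8-bit biased integer = actual exponent + 127
Mantissa = 23-bit unsigned integer holding the bits that follow the "1." of the normalized number (the leading 1 is implied, not stored)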
So the conversion formula is:
$[F]_2 = (-1)^{\text{sign}} \times 1.\text{Mantissa} \times 2^{\text{Exponent} - 127}$
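The formula can be applied directly to a float's raw bits, which Float.floatToIntBits exposes. The hypothetical decode method below is a sketch that handles only normal numbers. It extracts the three fields with shifts and masks, restores the implicit leading 1, and scales by 2^(Exponent - 127). Exponent fields of 0 and 255 are rejected because they are used for the special cases.

```java
// Sketch of the conversion formula for normal floats:
//   value = (-1)^sign x 1.Mantissa x 2^(Exponent - 127)
// Exponent fields 0 and 255 (reserved for special values) are rejected here.
class FloatDecode {
    static float decode(int bits) {
        int sign = bits >>> 31;               // top bit
        int exponent = (bits >>> 23) & 0xFF;  // next 8 bits, biased by 127
        int mantissa = bits & 0x7FFFFF;       // low 23 bits, the digits after "1."
        if (exponent == 0 || exponent == 0xFF) {
            throw new IllegalArgumentException("not a normal float");
        }
        // Put back the implicit leading 1: 1.Mantissa = 1 + mantissa / 2^23.
        double significand = 1.0 + mantissa / (double) (1 << 23);
        // Math.scalb multiplies by a power of two exactly.
        double magnitude = Math.scalb(significand, exponent - 127);
        return (float) (sign == 1 ? -magnitude : magnitude);
    }

    public static void main(String[] args) {
        for (float f : new float[] {0.25f, -42.0f, 3.14159f}) {
            float back = decode(Float.floatToIntBits(f));
            System.out.println(f + " -> " + back + (back == f ? " (match)" : " (mismatch)"));
        }
    }
}
```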
Java doubles are a 64-bit version of the same pattern: 1 sign bit, an 11-bit Exponent biased by 1023, and a 52-bit Mantissa.
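The double layout can be checked the same way using Double.doubleToLongBits. The hypothetical fields method returns the sign, biased Exponent, and Mantissa as a small array. For -42.0, the Exponent field should be 5 + 1023 = 1028, and the Mantissa should start with the same bits 0101 as in the float case.

```java
// Sketch: split a 64-bit double into its fields
// (1 sign bit, 11-bit Exponent biased by 1023, 52-bit Mantissa).
class DoubleFields {
    static long[] fields(double d) {
        long bits = Double.doubleToLongBits(d);
        long sign = bits >>> 63;
        long exponent = (bits >>> 52) & 0x7FFL;     // 11 bits
        long mantissa = bits & ((1L << 52) - 1);    // low 52 bits
        return new long[] {sign, exponent, mantissa};
    }

    public static void main(String[] args) {
        long[] f = fields(-42.0);
        System.out.println("sign = " + f[0]);                             // 1
        System.out.println("Exponent = " + f[1]
                + ", actual exponent = " + (f[1] - 1023));                // 1028, 5
        System.out.println("Mantissa = 0x" + Long.toHexString(f[2]));     // 0x5000000000000
    }
}
```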
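For example, here are two floats worked through the formula:
$[0.25]_2 = (-1)^{0} \times 1.00000000000000000000000 \times 2^{125 - 127}$ = 0,01111101,00000000000000000000000
$[-42.0]_2 = (-1)^{1} \times 1.01010000000000000000000 \times 2^{132 - 127}$ = 1,10000100,01010000000000000000000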
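To check worked examples like these, a small sketch can print any float in the same sign,Exponent,Mantissa comma layout. The helper name layout is hypothetical. It pads Integer.toBinaryString to 32 characters because that method drops leading zero bits.

```java
// Sketch: print a float's bits in the sign,Exponent,Mantissa layout used above.
class FloatBits {
    static String layout(float f) {
        // toBinaryString drops leading zeros, so left-pad to all 32 bits.
        String b = String.format("%32s", Integer.toBinaryString(Float.floatToIntBits(f)))
                .replace(' ', '0');
        return b.substring(0, 1) + "," + b.substring(1, 9) + "," + b.substring(9);
    }

    public static void main(String[] args) {
        System.out.println(layout(0.25f));   // 0,01111101,00000000000000000000000
        System.out.println(layout(-42.0f));  // 1,10000100,01010000000000000000000
    }
}
```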
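There are a number of special cases. Zero cannot be written as 1.Mantissa x 2^(Exponent - 127), so the all-zero bit pattern is reserved for it. An Exponent field of all 1s (255) is reserved for the infinities and for NaN ("not a number"):
$[0.0]_2$ = 0,00000000,00000000000000000000000
Float.POSITIVE_INFINITY = 0,11111111,00000000000000000000000
Float.NEGATIVE_INFINITY = 1,11111111,00000000000000000000000
Float.NaN = Exponent 11111111 with a nonzero Mantissa (Java's canonical NaN is 0,11111111,10000000000000000000000)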
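A short sketch can show these reserved patterns as hex and demonstrate that they also arise from ordinary arithmetic. The helper name hex is hypothetical. Note that Float.floatToIntBits maps every NaN to the single canonical pattern 0x7FC00000.

```java
import java.util.Locale;

// Sketch: show the reserved bit patterns behind the special cases as hex.
class FloatSpecials {
    static String hex(float f) {
        // floatToIntBits collapses every NaN to the canonical pattern 0x7FC00000.
        return String.format(Locale.ROOT, "0x%08X", Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        System.out.println("0.0f:      " + hex(0.0f));                     // 0x00000000
        System.out.println("+infinity: " + hex(Float.POSITIVE_INFINITY));  // 0x7F800000
        System.out.println("-infinity: " + hex(Float.NEGATIVE_INFINITY));  // 0xFF800000
        System.out.println("NaN:       " + hex(Float.NaN));                // 0x7FC00000
        // These values come out of ordinary arithmetic, not only from the constants.
        System.out.println(1.0f / 0.0f == Float.POSITIVE_INFINITY);        // true
        System.out.println(Float.isNaN(0.0f / 0.0f));                      // true
        // NaN compares unequal to everything, including itself.
        System.out.println(Float.NaN == Float.NaN);                        // false
    }
}
```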
Here's a nice conversion tool:
http://www.h-schmidt.net/FloatApplet/IEEE754.html
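$[1/3]_2 = \frac{1}{4} + [\frac{1}{3} - \frac{1}{4}]_2 = \frac{1}{4} + [\frac{1}{12}]_2 = \frac{1}{4} + \frac{1}{16} + [\frac{1}{12} - \frac{1}{16}]_2 = \frac{1}{4} + \frac{1}{16} + [\frac{1}{48}]_2 = \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots$
$= 0.010101010101\ldots$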
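The same digits can be generated mechanically by repeated doubling. Doubling the remaining fraction shifts the binary point one place. If the result reaches 1, the next bit is 1 and we keep the remainder. This sketch assumes the fraction is given as integers p/q with 0 <= p < q, and the name expand is hypothetical. Working in exact integer arithmetic avoids introducing any rounding.

```java
// Sketch: base-2 digits of p/q (0 <= p < q) by repeated doubling,
// the same idea as peeling off 1/4, 1/16, 1/64, ... by hand.
class BinaryFraction {
    static String expand(long p, long q, int digits) {
        StringBuilder sb = new StringBuilder("0.");
        for (int i = 0; i < digits; i++) {
            p *= 2;              // shift the binary point one place right
            if (p >= q) {        // the next bit is 1; keep the remainder
                sb.append('1');
                p -= q;
            } else {             // the next bit is 0
                sb.append('0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand(1, 3, 12)); // 0.010101010101
        System.out.println(expand(1, 4, 12)); // 0.010000000000
    }
}
```

For 1/3 the remainder cycles between 1 and 2 forever, which is why the expansion never terminates.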
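This base 2 expansion goes on forever, so it has to be rounded to 23 Mantissa bits. IEEE 754's default rounding mode is round-to-nearest. The discarded bits (1010...) are worth more than half of the last kept bit, so that bit rounds up from 0 to 1:
$[1/3]_2 \approx (-1)^{0} \times 1.01010101010101010101011 \times 2^{125 - 127}$
= 0,01111101,01010101010101010101011, which Java prints as 0.33333334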
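One way to confirm the rounding is to inspect the bits of 1.0f / 3.0f. This sketch (the class name OneThird is illustrative) prints the raw pattern and the 23 Mantissa bits. It also checks that the rounded-up float lies closer to 1/3 than the truncated pattern 0x3EAAAAAA would.

```java
// Sketch: check how 1/3 is actually stored as a float.
class OneThird {
    static int thirdBits() {
        return Float.floatToIntBits(1.0f / 3.0f);
    }

    public static void main(String[] args) {
        int bits = thirdBits();
        System.out.println(Integer.toHexString(bits));               // 3eaaaaab
        // Low 23 bits, left-padded with zeros: last bit rounded up to 1.
        String mantissa = String.format("%23s",
                Integer.toBinaryString(bits & 0x7FFFFF)).replace(' ', '0');
        System.out.println(mantissa);                                // 01010101010101010101011
        System.out.println(1.0f / 3.0f);                             // 0.33333334
        // Compare against truncation (last bit left at 0).
        double exact = 1.0 / 3.0;
        double roundedErr = Math.abs((1.0f / 3.0f) - exact);
        double truncatedErr = Math.abs(Float.intBitsToFloat(0x3EAAAAAA) - exact);
        System.out.println(roundedErr < truncatedErr);               // true
    }
}
```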
These round-off errors can accumulate in a lengthy calculation.
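A quick way to see accumulation is to add 0.1 many times. Since 0.1 has no finite base-2 expansion, each addition rounds. This sketch (class and method names are illustrative) compares float and double running sums. Even ten additions in double miss 1.0, and a million additions in float drift far from the exact 100000.

```java
// Sketch: repeated addition of 0.1, which is not exactly representable in base 2.
class RoundOffAccumulation {
    static float sumFloat(int n) {
        float s = 0.0f;
        for (int i = 0; i < n; i++) {
            s += 0.1f;   // each step rounds to 24 significant bits
        }
        return s;
    }

    static double sumDouble(int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) {
            s += 0.1;    // each step rounds to 53 significant bits
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sumDouble(10));           // 0.9999999999999999
        System.out.println(sumFloat(1_000_000));     // far from 100000
        System.out.println(sumDouble(1_000_000));    // much closer to 100000
    }
}
```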
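Multiplying and dividing floats is comparatively simple: multiply or divide the mantissas and add or subtract the exponents. Adding and subtracting are harder. The exponents must first be aligned by shifting the smaller operand's mantissa to the right, which can push some or all of its bits off the end. Subtracting nearly equal values also cancels the leading bits, leaving mostly rounding error.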
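The two addition pitfalls can be sketched with a few float expressions. The add and sub wrappers are hypothetical and exist only so the results can be tested. A float has 24 significant bits (23 stored plus the implicit 1). Near 1e8, neighbouring floats are therefore 8 apart, and adding 1 changes nothing. Likewise, (1 + 1e-7) - 1 returns the rounding step near 1 rather than 1e-7.

```java
// Sketch: why float addition and subtraction are the delicate operations.
class AddSubPitfalls {
    static float add(float a, float b) { return a + b; }
    static float sub(float a, float b) { return a - b; }

    public static void main(String[] args) {
        // Multiplication: mantissas multiply and exponents add; 1.5 x 2^5 = 48 exactly.
        System.out.println(1.5f * 32.0f);                      // 48.0
        // Addition aligns exponents first. Floats near 1e8 are spaced 8 apart,
        // so the 1 is shifted entirely out of the 24-bit significand.
        System.out.println(Math.ulp(1.0e8f));                  // 8.0
        System.out.println(add(1.0e8f, 1.0f) == 1.0e8f);       // true
        // Subtracting nearly equal values cancels the leading bits, so the
        // result is dominated by the rounding done in the earlier addition.
        System.out.println(sub(add(1.0f, 1.0e-7f), 1.0f));    // 1.1920929E-7, not 1.0E-7
    }
}
```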