Gustavo's IEEE-754 Brain Teaser

Aug 5, 2013


Back in June, Gustavo Niemeyer posted the following question on his Labix.org blog:

Assume uf is an unsigned integer with 64 bits that holds the IEEE-754 representation for a binary floating point number of that size.

How can you tell if uf represents an integer number?

I can’t speak for you, but I write business applications. I just don’t have the background to quickly knock out an answer to a question like this. Fortunately, Gustavo posted the logic behind the answer. I thought it would be fun to better understand the question and break down his answer. To keep everything easier, I am going to work with a 32 bit number.

How does the IEEE-754 standard store a floating point number in a binary format?  To start I found these two pages:

http://steve.hollasch.net/cgindex/coding/ieeefloat.html
http://class.ece.iastate.edu/arun/CprE281_F05/ieee754/ie5.html

The IEEE-754 specification represents a floating point number in base 2 scientific notation using a special binary format. If you don’t know what I mean by base 2 then look at my post on Understanding Type In Go (http://www.goinggo.net/2013/07/understanding-type-in-go.html).

Scientific notation is an efficient way of writing very large or small numbers. It works by using a decimal format with a multiplier. Here are some examples:

Base 10 Number   Scientific Notation   Calculation      Coefficient   Base   Exponent   Mantissa
700              7e+2                  7 * 10^2         7             10     2          0
4,900,000,000    4.9e+9                4.9 * 10^9       4.9           10     9          .9
5362.63          5.36263e+3            5.36263 * 10^3   5.36263      10     3          .36263
-0.00345         -3.45e-3              -3.45 * 10^-3    -3.45        10     -3         .45
0.085            1.36e-4               1.36 * 2^-4      1.36         2      -4         .36


In normal scientific notation form there is always just one digit on the left side of the decimal point. For base 10 numbers that digit must be between 1 and 9, and for base 2 numbers that digit can only be 1.

The leading number in the notation is called the Coefficient. The Mantissa is all the digits to the right of the decimal point. These terms are important, so take the time to study and understand the chart above.

How far we move the decimal point to reach that first position determines the value of the Exponent. If we have to move the decimal point to the left, the Exponent is a positive number; if we move it to the right, the Exponent is a negative number. Look at the chart above and note the Exponent value for each example.

The Base and the Exponent work together in the notation. The exponent determines the "Power Of" calculation we need to perform on the base. In the first example the number 7 is multiplied by 10 (The Base) to the power of 2 (The Exponent) to get back to the original base 10 number 700. We moved the decimal point to the left two places to convert 700 to 7.00, which made the Exponent +2 and created the notation of 7e+2.

The IEEE-754 standard does not store base 10 scientific notation numbers but base 2 scientific notation numbers. The last example in the chart above represents the base 10 number 0.085 in base 2 scientific notation. Let’s learn how that notation is calculated.

Base 10 Number   Scientific Notation   Calculation   Coefficient   Base   Exponent   Mantissa
0.085            1.36e-4               1.36 * 2^-4   1.36          2      -4         .36

We need to divide the base 10 number (0.085) by some power of two to get a 1 + Fraction value. What do I mean by a 1 + Fraction value? We need a number that looks like the Coefficient in the example, 1 + .36. The IEEE-754 standard requires a "1." in the Coefficient. This allows us to store only the Mantissa and gives us an extra bit of precision.

Using brute force, we can divide by powers of two until we find the 1 + Fraction value for 0.085:

0.085 / 2^-1 = 0.17
0.085 / 2^-2 = 0.34
0.085 / 2^-3 = 0.68
0.085 / 2^-4 = 1.36   ** We found the 1 + Fraction

An exponent of -4 gives us the 1 + Fraction value we need. Now we have everything we need to store the base 10 number 0.085 in IEEE-754 format.
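The brute-force search above can be sketched as a small Go function. This is only an illustration (it assumes a positive, non-zero input), not how the conversion is actually implemented:

```go
package main

import "fmt"

// normalize scales x by powers of two until the result lands in the
// range [1, 2), returning the coefficient and the exponent. This
// mirrors the brute-force table above.
func normalize(x float64) (coefficient float64, exponent int) {
	coefficient = x
	for coefficient < 1 {
		coefficient *= 2 // dividing by 2^-1 is multiplying by 2
		exponent--
	}
	for coefficient >= 2 {
		coefficient /= 2
		exponent++
	}
	return coefficient, exponent
}

func main() {
	c, e := normalize(0.085)
	fmt.Printf("%.2f * 2^%d\n", c, e) // 1.36 * 2^-4
}
```

The standard library's math.Frexp performs the same normalization, except it returns a fraction in the range [0.5, 1); doubling that fraction and decrementing the returned exponent gives the 1 + Fraction form.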

Let’s look at how the bits are laid out in the IEEE-754 format.

Precision          Sign     Exponent     Fraction Bits   Bias
Single (32 Bits)   1 [31]   8 [30-23]    23 [22-00]      127
Double (64 Bits)   1 [63]   11 [62-52]   52 [51-00]      1023

The bits are broken into three parts: a bit reserved for the Sign, bits for the Exponent, and bits called the Fraction bits. The Fraction bits are where we store the Mantissa as a binary fraction.

If we store the value of 0.085 using Single Precision (a 32 bit number), the bit pattern in IEEE-754 would look like this:

Sign   Exponent (123)   Fraction Bits (.36)
0      0111 1011        010 1110 0001 0100 0111 1011

The Sign bit, the leftmost bit, determines if the number is positive or negative. If the bit is set to 1 then the number is negative else it is positive.

The next 8 bits represent the Exponent. In our example, the base 10 number 0.085 is converted to 1.36 * 2^-4 using base 2 scientific notation, so the value of the exponent is -4. To be able to represent negative exponents, the format uses a Bias value. The Bias for our 32 bit representation is 127. To represent -4, we need to find the number that, when the Bias is subtracted from it, gives us -4. In our case that number is 123. If you look at the bit pattern for the Exponent, you will see it represents the number 123 in binary.

The remaining 23 bits are the Fraction bits. To calculate the bit pattern for the fraction bits, we need to calculate and sum binary fractions until we get the value of the Mantissa, or a value that is as close as possible. Remember, we only need to store the Mantissa because we always assume that the "1." value exists.

To understand how binary fractions are calculated, look at the following chart. Each bit position from left to right represents a fractional value:

Binary   Fraction   Decimal   Power
0.1      1/2        0.5       2^-1
0.01     1/4        0.25      2^-2
0.001    1/8        0.125     2^-3

We need to set the fraction bits whose values add up to the Mantissa, or get us as close as possible. This is why we can sometimes lose precision.

010 1110 0001 0100 0111 1011 = (0.36)

Bit   Value     Fraction    Decimal            Total
2     4         1/4         0.25               0.25
4     16        1/16        0.0625             0.3125
5     32        1/32        0.03125            0.34375
6     64        1/64        0.015625           0.359375
11    2048      1/2048      0.00048828125      0.35986328125
13    8192      1/8192      0.0001220703125    0.3599853515625
17    131072    1/131072    0.00000762939453   0.35999298095703
18    262144    1/262144    0.00000381469727   0.3599967956543
19    524288    1/524288    0.00000190734863   0.35999870300293
20    1048576   1/1048576   0.00000095367432   0.35999965667725
22    4194304   1/4194304   0.00000023841858   0.35999989509583
23    8388608   1/8388608   0.00000011920929   0.36000001430512

You can see that setting these 12 bits gets us to the value of 0.36 plus some extra fraction.
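The summation in the table can be reproduced with a short loop. The string below is the 23 fraction bits from our 0.085 example; each set bit at (1-based) position n contributes 1/2^n:

```go
package main

import "fmt"

func main() {
	// The 23 fraction bits for 0.085: 010 1110 0001 0100 0111 1011.
	fractionBits := "01011100001010001111011"

	var mantissa float64
	divisor := 1.0
	for _, bit := range fractionBits {
		divisor *= 2 // 1/2, 1/4, 1/8, ...
		if bit == '1' {
			mantissa += 1 / divisor
		}
	}

	fmt.Printf("%.8f\n", mantissa) // 0.36000001
}
```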

Let’s sum up what we now know about the IEEE-754 format:

1. Any base 10 number to be stored is converted to base 2 scientific notation.
2. The base 2 scientific notation value we use must follow the 1 + Fraction format.
3. There are three distinct sections in the format.
4. The Sign bit determines if the number is positive or negative.
5. The Exponent bits represent a number from which the Bias must be subtracted.
6. The Fraction bits represent the Mantissa using binary fraction summation.

Let’s prove that our analysis of the IEEE-754 format is correct. We should be able to store the number 0.085 and display bit patterns and values for each section that match everything we have seen.

The following code stores the number 0.085 and displays the IEEE-754 binary representation:

package main

import (
    "fmt"
    "math"
)

func main() {
    var number float32 = 0.085

    fmt.Printf("Starting Number: %f\n\n", number)

    // Float32bits returns the IEEE 754 binary representation
    bits := math.Float32bits(number)

    binary := fmt.Sprintf("%.32b", bits)

    fmt.Printf("Bit Pattern: %s | %s %s | %s %s %s %s %s %s\n\n",
        binary[0:1],
        binary[1:5], binary[5:9],
        binary[9:12], binary[12:16], binary[16:20],
        binary[20:24], binary[24:28], binary[28:32])

    bias := 127
    sign := bits & (1 << 31)
    exponentRaw := int(bits >> 23)
    exponent := exponentRaw - bias

    var mantissa float64
    for index, bit := range binary[9:32] {
        if bit == '1' {
            position := index + 1
            bitValue := math.Pow(2, float64(position))
            fractional := 1 / bitValue

            mantissa = mantissa + fractional
        }
    }

    value := (1 + mantissa) * math.Pow(2, float64(exponent))

    fmt.Printf("Sign: %d Exponent: %d (%d) Mantissa: %f Value: %f\n\n",
        sign,
        exponentRaw,
        exponent,
        mantissa,
        value)
}

When we run the program we get the following output:

Starting Number: 0.085000

Bit Pattern: 0 | 0111 1011 | 010 1110 0001 0100 0111 1011

Sign: 0 Exponent: 123 (-4) Mantissa: 0.360000 Value: 0.085000

If you compare the displayed bit pattern with our example above, you will see that it matches. Everything we learned about IEEE-754 is true.

Now we should be able to answer Gustavo’s question. How can we tell if the value being stored is an integer? Here is a function, thanks to Gustavo’s code, that tests if the IEEE-754 stored value is an integer:

func IsInt(bits uint32, bias int) {
    exponent := int(bits >> 23) - bias - 23
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
    intTest := coefficient & ((1 << uint32(-exponent)) - 1)

    fmt.Printf("\nExponent: %d Coefficient: %d IntTest: %d\n",
        exponent,
        coefficient,
        intTest)

    if exponent < -23 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    if exponent < 0 && intTest != 0 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    fmt.Printf("INTEGER\n")
}

So how does this function work?

Let’s start with the first condition, which tests if the Exponent is less than -23. If we use the number 1 as our test number, the stored Exponent will be 127, which is the same as the Bias. When we subtract the Bias from the Exponent, we get zero.

Starting Number: 1.000000

Bit Pattern: 0 | 0111 1111 | 000 0000 0000 0000 0000 0000

Sign: 0 Exponent: 127 (0) Mantissa: 0.000000 Value: 1.000000


Exponent: -23 Coefficient: 8388608 IntTest: 0
INTEGER

The test function adds an extra subtraction of 23, which is the starting bit position of the Exponent in the IEEE-754 format (and the number of Fraction bits). That is why you see -23 as the Exponent value coming from the test function.

Precision          Sign     Exponent    Fraction Bits   Bias
Single (32 Bits)   1 [31]   8 [30-23]   23 [22-00]      127

This subtraction is required for the second test. Any value whose adjusted exponent is less than -23 must be less than one (1) and therefore can not be an integer.
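To see this cutoff in action, here is a small helper (my own, not part of Gustavo's code) that computes the adjusted exponent the way the test function does:

```go
package main

import (
	"fmt"
	"math"
)

// adjustedExponent returns the stored Exponent with the Bias (127)
// and the Fraction bit count (23) subtracted, exactly as the test
// function computes it.
func adjustedExponent(f float32) int {
	bits := math.Float32bits(f)
	return (int(bits>>23) & 0xFF) - 127 - 23
}

func main() {
	fmt.Println(adjustedExponent(1))   // -23: the smallest value that can still be an integer
	fmt.Println(adjustedExponent(0.5)) // -24: below the cutoff, so it cannot be an integer
}
```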

To understand how the second test works, let’s use an integer value. This time we will set the number to 234523 in the code and run the program again.

Starting Number: 234523.000000

Bit Pattern: 0 | 1001 0000 | 110 0101 0000 0110 1100 0000

Sign: 0 Exponent: 144 (17) Mantissa: 0.789268 Value: 234523.000000


Exponent: -6 Coefficient: 15009472 IntTest: 0
INTEGER

The second test looks for two conditions to identify if the number is not an integer. This requires the use of bitwise mathematics. Let’s look at the math we are performing in the function:

    exponent := int(bits >> 23) - bias - 23
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
    intTest := coefficient & ((1 << uint32(-exponent)) - 1)

The coefficient calculation adds the "1 +" to the Mantissa so we have the base 2 Coefficient value.

When we look at the first part of the coefficient calculation we see the following bit patterns:

coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)

Bits:                   01001000011001010000011011000000
(1 << 23) - 1:          00000000011111111111111111111111
bits & ((1 << 23) - 1): 00000000011001010000011011000000

The first part of the coefficient calculation removes the bits for the Sign and Exponent from the entire IEEE-754 bit pattern.

The second part of the coefficient calculation adds the "1 +" into the binary bit pattern:

coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)

bits & ((1 << 23) - 1): 00000000011001010000011011000000
(1 << 23):              00000000100000000000000000000000
coefficient:            00000000111001010000011011000000

Now that the coefficient bit pattern is set, we can calculate the intTest value:

exponent := int(bits >> 23) - bias - 23
intTest := (coefficient & ((1 << uint32(-exponent)) - 1))

exponent:                     (144 - 127 - 23) = -6
1 << uint32(-exponent):       1000000
(1 << uint32(-exponent)) - 1:  111111

coefficient:                  00000000111001010000011011000000
(1 << uint32(-exponent)) - 1: 00000000000000000000000000111111
intTest:                      00000000000000000000000000000000

The value of the exponent we calculate in the test function determines the number of bits we compare against the Coefficient. In this case the exponent value is -6, calculated by subtracting the Bias (127) and the starting bit position of the Exponent (23) from the stored Exponent value (144). This gives us a bit pattern of 6 ones (1’s). The final operation ANDs those 6 bits against the rightmost 6 bits of the Coefficient to get the intTest value.

The second test is looking for an exponent value that is less than zero (0) and an intTest value that is NOT zero (0). This would indicate the number being stored is not an integer. In our example with 234523, the Exponent is less than zero (0), but the value of intTest is zero (0). We have an integer.
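To tie both tests together, here is the same logic restated as a boolean predicate. This is my paraphrase of Gustavo's approach, not his code, and it does not treat the special all-ones exponent patterns (Inf and NaN) separately:

```go
package main

import (
	"fmt"
	"math"
)

// isInt reports whether the value stored in a float32 is a whole
// number, using the same two tests as the IsInt function above.
func isInt(f float32) bool {
	bits := math.Float32bits(f)
	exponent := (int(bits>>23) & 0xFF) - 127 - 23

	// First test: an adjusted exponent below -23 means the value
	// is less than one, so it cannot be an integer.
	if exponent < -23 {
		return false
	}

	// Second test: restore the implicit "1 +" to get the full
	// Coefficient, then check that every bit sitting below the
	// decimal point is zero.
	if exponent < 0 {
		coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
		return coefficient&((1<<uint32(-exponent))-1) == 0
	}
	return true
}

func main() {
	fmt.Println(isInt(1))        // true
	fmt.Println(isInt(234523))   // true
	fmt.Println(isInt(234523.5)) // false
	fmt.Println(isInt(0.085))    // false
}
```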

I have included the sample code in the Go playground so you can play with it.

http://play.golang.org/p/3xraud43pi

If it wasn’t for Gustavo’s code I could never have identified the solution. Here is a link to his solution:

http://bazaar.launchpad.net/~niemeyer/strepr/trunk/view/6/strepr.go#L160

Here is a copy of the code that you can copy and paste:

package main

import (
    "fmt"
    "math"
)

func main() {
    var number float32 = 234523

    fmt.Printf("Starting Number: %f\n\n", number)

    // Float32bits returns the IEEE 754 binary representation
    bits := math.Float32bits(number)

    binary := fmt.Sprintf("%.32b", bits)

    fmt.Printf("Bit Pattern: %s | %s %s | %s %s %s %s %s %s\n\n",
        binary[0:1],
        binary[1:5], binary[5:9],
        binary[9:12], binary[12:16], binary[16:20],
        binary[20:24], binary[24:28], binary[28:32])

    bias := 127
    sign := bits & (1 << 31)
    exponentRaw := int(bits >> 23)
    exponent := exponentRaw - bias

    var mantissa float64
    for index, bit := range binary[9:32] {
        if bit == '1' {
            position := index + 1
            bitValue := math.Pow(2, float64(position))
            fractional := 1 / bitValue

            mantissa = mantissa + fractional
        }
    }

    value := (1 + mantissa) * math.Pow(2, float64(exponent))

    fmt.Printf("Sign: %d Exponent: %d (%d) Mantissa: %f Value: %f\n\n",
        sign,
        exponentRaw,
        exponent,
        mantissa,
        value)

    IsInt(bits, bias)
}

func IsInt(bits uint32, bias int) {
    exponent := int(bits>>23) - bias - 23
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
    intTest := coefficient & ((1 << uint32(-exponent)) - 1)

    ShowBits(bits, bias, exponent)

    fmt.Printf("\nExponent: %d Coefficient: %d IntTest: %d\n",
        exponent,
        coefficient,
        intTest)

    if exponent < -23 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    if exponent < 0 && intTest != 0 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    fmt.Printf("INTEGER\n")
}

func ShowBits(bits uint32, bias int, exponent int) {
    value := (1 << 23) - 1
    value2 := (bits & ((1 << 23) - 1))
    value3 := (1 << 23)
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)

    fmt.Printf("Bits:\t\t\t%.32b\n", bits)
    fmt.Printf("(1 << 23) - 1:\t\t%.32b\n", value)
    fmt.Printf("bits & ((1 << 23) - 1):\t\t%.32b\n\n", value2)

    fmt.Printf("bits & ((1 << 23) - 1):\t\t%.32b\n", value2)
    fmt.Printf("(1 << 23):\t\t\t%.32b\n", value3)
    fmt.Printf("coefficient:\t\t\t%.32b\n\n", coefficient)

    value5 := 1 << uint32(-exponent)
    value6 := (1 << uint32(-exponent)) - 1
    intTest := coefficient & ((1 << uint32(-exponent)) - 1)

    fmt.Printf("1 << uint32(-exponent):\t\t%.32b\n", value5)
    fmt.Printf("(1 << uint32(-exponent)) - 1:\t%.32b\n\n", value6)

    fmt.Printf("coefficient:\t\t\t%.32b\n", coefficient)
    fmt.Printf("(1 << uint32(-exponent)) - 1:\t%.32b\n", value6)
    fmt.Printf("intTest:\t\t\t%.32b\n", intTest)
}

I want to thank Gustavo for posting the question and giving me something to really fight through to understand.


William Kennedy
ArdanLabs (www.ardanlabs.com)
bill@ardanlabs.com