CSC161 2010F, Class 13: IEEE Floating Point Representation

Overview:
* C's bitwise operations.
* Religious wars: Big-endian vs. little-endian representation.
* The problem of real numbers.
* One approach: Rationals.
* Another approach: Fixed-precision.
* Detour: Scientific notation.
* The IEEE floating point standard.

Admin:
* Assignment 4 is now ready.
* What did you think about the experience of doing assignment 3?
* Danger!  Class 13 falls on Friday!

How do I build a tarball from a directory?

        tar cvf sam-hw4.tar sam-hw4

That is

        tar cvf TARFILE DIRECTORY


C's bitwise operations.
* &, |, ~
  * & is "and", in a bitwise fashion
    * 0 and 0 => 0
    * 0 and 1 => 0
    * 1 and 0 => 0
    * 1 and 1 => 1
  * Can you ever get different truth values by using & vs &&?
    If we && two true values, we always get true
    Are there two true values we can & and get false (0)?
    Certainly, 2 & 1 is 0, since 2 is binary 10 and 1 is binary 01, so they share no 1 bits
  * | is bitwise "or"
  * ~ is bitwise "not"
* Why would we ever want these operations?
  * In ASCII, converting from lowercase to uppercase involves changing one bit (0x20)
  * Concision
  * You can fit a lot of information in an integer, e.g., one yes/no flag per bit

Other cool bitwise operations
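* << - shift the bits to the left
  x = x << 3  // Multiply x by 8 the fast way (if the result fits)
* >> - shift the bits to the right
  x = x >> 3  // Divide x by 8 the fast way (for non-negative x; the remainder is dropped)
* And compound-assignment variants: <<=, >>=, &=, |=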

Religious wars: Big-endian vs. little-endian representation.

The problem of real numbers.
What might you ask about a proposed representation?
* What range of numbers is possible?
* How does it work?
  * Is it easy for humans to understand?
  * Is it easy to write circuits to process?
  * Are those circuits fast and small?
* How confusing is it to represent 0?  (E.g., signed magnitude had
  two zeros)
* How efficient is it?
* How does it deal with approximation?

One approach: Rationals.
* We use one integer as the numerator and one as the denominator
  * Pretty easy to understand
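  * With N bits split evenly between numerator and denominator, the largest
    value is roughly 2^(N/2) (the biggest numerator over a denominator of 1)
  * The smallest positive value is roughly 2^(-N/2) (1 over the biggest denominator)
  * If we put a zero in the denominator, it clearly means "NaN" (or
    maybe infinity)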
  * Incredibly wasteful: Is 0.5 1/2 or 2/4 or 4/8 or 8/16 or 3/6 ...
  * Probably easy to compute with: Just use something like the
    existing circuitry

Another approach: Fixed-precision.
* Choose a denominator, and you just represent the numerator
  * No duplication
  * Relatively understandable
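  * The absolute accuracy (the gap between neighboring values) is the same
    no matter how large the number is; the relative accuracy is much worse
    for small numbers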

Detour: Scientific notation.
* But that's not what we normally do when we do math
* We usually write the significant digits (the significand) and a
  corresponding exponent, along with a sign, e.g., 6.02 x 10^23
* We can try to match this representation in the computer
  * 1 bit for the sign
  * M bits for the significand: #.######
  * N-(M+1) bits for the exponent: 2^X
* Cool IEEE factoid: the leftmost bit of the significand is not stored.
  For normalized numbers it is always 1, so storing it would waste a bit;
  zero and the tiny "denormalized" numbers are the exceptions, where it is 0.

The IEEE floating point standard.