CSC161 2010F, Class 13: IEEE Floating Point Representation
Overview:
* C's bitwise operations.
* Religious wars: Big-endian vs. little-endian representation.
* The problem of real numbers.
* One approach: Rationals.
* Another approach: Fixed-precision.
* Detour: Scientific notation.
* The IEEE floating point standard.
Admin:
* Assignment 4 is now ready.
* What did you think about the experience of doing assignment 3?
* Danger! Class 13 falls on Friday!
How do I build a tarball from a directory?
tar cvf sam-hw4.tar sam-hw4
That is
tar cvf TARFILE DIRECTORY
C's bitwise operations.
* &, |, ~
* & is "and", in a bitwise fashion
* 0 and 0 => 0
* 0 and 1 => 0
* 1 and 0 => 0
* 1 and 1 => 1
* Can you ever get different truth values by using & vs &&?
If we && two true values, we always get true
Are there two true values we can & and get false (0)?
Certainly, 2 & 1 is 0
* | is bitwise "or"
* ~ is bitwise "not"
* Why would we ever want these operations?
* Converting from lowercase to uppercase involves flipping one bit
* Concision
* You can fit a lot of information in an integer.
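A minimal sketch of the points above, assuming ASCII characters. The function names are made up for illustration; they are not from any library.

```c
/* Bitwise and: combines the corresponding bits of a and b. */
int bitwise_and(int a, int b)
{
  return a & b;
}

/* Logical and: 1 if both a and b are nonzero (true), 0 otherwise. */
int logical_and(int a, int b)
{
  return a && b;
}

/* In ASCII, a lowercase letter differs from its uppercase form only in
   the bit with value 0x20. Clearing that bit gives the uppercase letter.
   Characters outside 'a'..'z' are returned unchanged. */
char to_upper_bits(char c)
{
  if (c >= 'a' && c <= 'z')
    return (char) (c & ~0x20);
  return c;
}

/* Setting the 0x20 bit turns an uppercase ASCII letter into lowercase. */
char to_lower_bits(char c)
{
  if (c >= 'A' && c <= 'Z')
    return (char) (c | 0x20);
  return c;
}
```

With these, bitwise_and(2, 1) is 0 even though logical_and(2, 1) is 1, which is the & vs && difference asked about above.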
Other cool bitwise operations
* << - shift the bits to the left
x = x << 3 // Multiply x by 8 the fast way
* >> - shift the bits to the right
x = x >> 3 // Divide x by 8 the fast way (for non-negative x)
* And variants
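A small sketch of the shift operators, using unsigned values so that the results are well defined. The helper names are hypothetical.

```c
/* Shifting left by 3 multiplies by 2^3 = 8 (as long as no bits fall
   off the left end). */
unsigned int times_eight(unsigned int x)
{
  return x << 3;
}

/* Shifting right by 3 divides by 8, discarding the remainder. */
unsigned int divide_by_eight(unsigned int x)
{
  return x >> 3;
}

/* Shifts combine with & to pull one bit out of an integer:
   move bit n down to position 0, then mask off everything else. */
unsigned int get_bit(unsigned int x, unsigned int n)
{
  return (x >> n) & 1u;
}
```

For example, 10 is 1010 in binary, so get_bit(10, 1) is 1 and get_bit(10, 2) is 0.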
Religious wars: Big-endian vs. little-endian representation.
The problem of real numbers.
What might you ask about a proposed representation?
* What range of numbers is possible?
* How does it work?
* Is it easy for humans to understand?
* Is it easy to write circuits to process?
* Are those circuits fast and small?
* How confusing is it to represent 0? (E.g., signed magnitude had
two zeros)
* How efficient is it?
* How does it deal with approximation?
One approach: Rationals.
* We use one integer as the numerator and one as the denominator
* Pretty easy to understand
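* Largest is about 2^(N/2) (the biggest numerator over a denominator of 1),
assuming we split the N bits evenly
* Smallest positive is about 2^(-N/2) (1 over the biggest denominator)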
* If we put a zero in the denominator, it clearly means "NaN" (or
maybe infinity)
* Incredibly wasteful: Is 0.5 represented as 1/2 or 2/4 or 4/8 or 8/16 or 3/6 ...?
* Probably easy to compute with: Just use something like the
existing circuitry
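One way the rational approach might look in C. This is a sketch only: the type and function names are invented for illustration, and the equality test assumes nonzero denominators.

```c
#include <stdbool.h>

/* A rational number stored as two integers. */
typedef struct
{
  int numerator;
  int denominator;
} rational;

/* Build a rational from its two parts (hypothetical helper). */
rational make_rational(int numerator, int denominator)
{
  rational r;
  r.numerator = numerator;
  r.denominator = denominator;
  return r;
}

/* A zero denominator is treated as "not a number". */
bool rational_is_nan(rational r)
{
  return r.denominator == 0;
}

/* Two rationals with nonzero denominators are equal when their cross
   products match. long long is used to reduce the risk of overflow.
   This is why the representation is wasteful: many different bit
   patterns (1/2, 2/4, 3/6, ...) name the same value. */
bool rational_equal(rational a, rational b)
{
  return (long long) a.numerator * b.denominator
         == (long long) b.numerator * a.denominator;
}
```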
Another approach: Fixed-precision.
* Choose a denominator, and you just represent the numerator
* No duplication
* Relatively understandable
* The absolute accuracy (the gap between neighboring values) is independent
of the size of the number
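A sketch of the fixed-precision idea, assuming a denominator of 100 (so values are stored as hundredths). The denominator and all names here are chosen only for illustration.

```c
/* Every value is an integer count of hundredths: 375 means 3.75. */
#define FIXED_DENOMINATOR 100

typedef int fixed;

/* Build a fixed-precision value from a whole part and hundredths. */
fixed fixed_from_parts(int whole, int hundredths)
{
  return whole * FIXED_DENOMINATOR + hundredths;
}

/* Addition is plain integer addition, since the denominators agree. */
fixed fixed_add(fixed a, fixed b)
{
  return a + b;
}

/* Multiplying two numerators gives a result over 100*100, so we divide
   by the denominator once to get back to hundredths. Anything smaller
   than a hundredth is lost. */
fixed fixed_multiply(fixed a, fixed b)
{
  return (a * b) / FIXED_DENOMINATOR;
}

/* Convert to double for printing. */
double fixed_to_double(fixed f)
{
  return (double) f / FIXED_DENOMINATOR;
}
```

Note that the gap between neighboring values is always 0.01, whether the number is near 0 or near a million.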
Detour: Scientific notation.
* But that's not what we normally do when we do math
* We usually have the significant digits (the significand) and a
corresponding exponent, along with a sign
* We can try to match this representation in the computer
1 bit for sign
M bits for the significand #.######
N-(M+1) bits for the exponent 2^X
* Cool IEEE factoid
the leftmost digit in the significand is not represented,
and is almost always assumed to be 1.
The IEEE floating point standard.
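A sketch of how one might look at the pieces of a float in C. It assumes that float is a 32-bit IEEE 754 single-precision value (1 sign bit, 8 exponent bits, 23 fraction bits), which is true on most current machines but is not guaranteed by C. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a float into a 32-bit integer so that we can use
   bitwise operations on them. Assumes sizeof(float) == 4. */
uint32_t float_bits(float f)
{
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* The top bit is the sign: 0 for positive, 1 for negative. */
unsigned int float_sign(float f)
{
  return (unsigned int) (float_bits(f) >> 31);
}

/* The next 8 bits hold the exponent, stored with a bias of 127,
   so a stored value of 127 means 2^0. */
unsigned int float_exponent_field(float f)
{
  return (unsigned int) ((float_bits(f) >> 23) & 0xFF);
}

/* The low 23 bits hold the significand without its leading 1,
   which is implied rather than stored. */
uint32_t float_fraction_field(float f)
{
  return float_bits(f) & 0x7FFFFF;
}
```

For example, 1.5 is 1.1 in binary times 2^0, so its exponent field is 127 and its fraction field has only the top bit set (0x400000). The leading 1 does not appear anywhere in the stored bits.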