CSC152 2005F, Class 47: Dictionaries (3): Hash Tables
Admin:
* Yes, we have class tomorrow.
* Yes, I will distribute exam 3 tomorrow.
Overview:
* Better dictionaries - How?
* Hash tables and hash functions
* Sample hash functions
* Collision policies
* Removing values from hash tables
Context: Implement Dictionaries
* Philosophy
* Stores key and values ; keys provide the way to differentiate
between values. That is, you look up (and store) values based
on a "key".
* Applications
* Methods
* get(Key key)
* put(Key key, Value value)
* Implementation(s)
Implementation One: Randomly arrange key/value pairs in a vector
* get is O(n) -- look through the vector until we find the key, then return the corresponding value
* put is O(n) -- also have to look to make sure it's not there already
Implementation Two: Put the key/value pairs in order by key
* get is O(log_2(n)) - use binary search
* put is O(n) - might have to shift elements to insert something while keeping the vector in order
Implementation Three: Use a binary search tree
* get is O(depth)
* put is O(depth)
* If we're lucky, depth is O(log_2n)
* If we're unlucky, depth is O(n)
Even if we're lucky, the best implementation is O(log_2(n))
In arrays and vectors, get and put are O(1) [ignoring expansion]
Goal: Can we do better? (What do we have to sacrifice to do better?)
* Using strings as indicies is slow
* Using numbers as indices is fast
* What do we do?
Key idea: Use "hashing" to convert each string to a number
* More explicitly, write a function/method that converts keys
to numbers.
* The method should be unlikely to convert two different keys to the same number
* If we are lucky, two keys the user chooses never have the
same number
* Implement put with
Compute h, the hash value of the key
Store the key/value pair at index h of a Vector
* Implement get with
Compute h, the hash value of the key
Get the key/value pair at index h of a Vector
Return the value part
* Unfortunately, we're rarely lucky. The hash values of keys
*collide*. Need a collision policy.
* We also need good hash functions
When we're inserting a value, and we find that there's already a key/value pair in the "slot" in the vector, what should we do?
* We cannot change the way we compute the hash value.
* Strategy one (bucket): Put more than one thing in the same slot (make a vetor of vectors) -- need to search the vector as we did in implementation one
* Strategy two (linear probe): Look to the right until you find a free space
* For strategy two to work, you need more space in the vector than things in the vector (typical strategy: Have room for 2*n things)
* How do you get values using this strategy?
* Compute h, the hash value of the key
* Look in position h
* If the hash value matches the key, we're done
* If not, continue looking in subsequent spaces until
* Find matching key - Return corresponding value
* Find an empty space - Throw an exception
* Java uses linear probing
* Sam prefers buckets, because removal is easier
Hashing is so fundamental to Java, that Java not only comes with
a Hashtable class, but also an expectation that you write hash
functions.