CSC152 2005F, Class 47: Dictionaries (3): Hash Tables Admin: * Yes, we have class tomorrow. * Yes, I will distribute exam 3 tomorrow. Overview: * Better dictionaries - How? * Hash tables and hash functions * Sample hash functions * Collision policies * Removing values from hash tables Context: Implement Dictionaries * Philosophy * Stores key and values ; keys provide the way to differentiate between values. That is, you look up (and store) values based on a "key". * Applications * Methods * get(Key key) * put(Key key, Value value) * Implementation(s) Implementation One: Randomly arrange key/value pairs in a vector * get is O(n) -- look through the vector until we find the key, then return the corresponding value * put is O(n) -- also have to look to make sure it's not there already Implementation Two: Put the key/value pairs in order by key * get is O(log_2(n)) - use binary search * put is O(n) - might have to shift elements to insert something while keeping the vector in order Implementation Three: Use a binary search tree * get is O(depth) * put is O(depth) * If we're lucky, depth is O(log_2n) * If we're unlucky, depth is O(n) Even if we're lucky, the best implementation is O(log_2(n)) In arrays and vectors, get and put are O(1) [ignoring expansion] Goal: Can we do better? (What do we have to sacrifice to do better?) * Using strings as indicies is slow * Using numbers as indices is fast * What do we do? Key idea: Use "hashing" to convert each string to a number * More explicitly, write a function/method that converts keys to numbers. * The method should be unlikely to convert two different keys to the same number * If we are lucky, two keys the user chooses never have the same number * Implement put with Compute h, the hash value of the key Store the key/value pair at index h of a Vector * Implement get with Compute h, the hash value of the key Get the key/value pair at index h of a Vector Return the value part * Unfortunately, we're rarely lucky. The hash values of keys *collide*. Need a collision policy. * We also need good hash functions When we're inserting a value, and we find that there's already a key/value pair in the "slot" in the vector, what should we do? * We cannot change the way we compute the hash value. * Strategy one (bucket): Put more than one thing in the same slot (make a vetor of vectors) -- need to search the vector as we did in implementation one * Strategy two (linear probe): Look to the right until you find a free space * For strategy two to work, you need more space in the vector than things in the vector (typical strategy: Have room for 2*n things) * How do you get values using this strategy? * Compute h, the hash value of the key * Look in position h * If the hash value matches the key, we're done * If not, continue looking in subsequent spaces until * Find matching key - Return corresponding value * Find an empty space - Throw an exception * Java uses linear probing * Sam prefers buckets, because removal is easier Hashing is so fundamental to Java, that Java not only comes with a Hashtable class, but also an expectation that you write hash functions.