CSC152 2005S, Class 40: Dictionaries (4): Hash Tables Admin: * New disgusting candy: Gummy Crabby Patties * Exams due NOW! * Forthcoming extra credit * Saturday: 5K * Registration 8:30-9:30 in Merrill park, 12th and Park * Thursday: "Convo" (Woods Hole) * Thursday Week: "Convo" (Darwin) * Fun with Prospies! * Chaya from Washington, D.C. * What are the effects of knowing Elizabeth for several years? * Therapy * Where else do you plan to apply? * Goucher * William and Mary * Mary Washington * If you come to Grinnell, you'll lose weight because the food is not as good as at Mary Washington * Chaya should come to Grinnell b/c it's further from home Overview: * Analyzing Dictionary Implementations * An Alternate: Hash tables * Hash functions * Java details Review: * What is a dictionary? * Purspose/Philosophy: Like an array: It stores things using indices that are strings (or other objects) * New idea: The indices are traditionally called "keys" * Methods * put(Object key, Object value) * get(Object key) * remove(Object key) // Not in Sam's original, but in the lab * Applications * Implementations * Pair of parallel arrays (keys in first, values in second) * Works even if we don't have comparable keys (only need equals) * put takes O(1) // Just put it at the end of the array WRONG (but it was a good guess, nonetheless) * put takes O(n) // Need to check to see if it's already there * get takes O(n) // Check each key in turn * Pair of parallel arrays, sorted by key in increasing order * Requires comparable keys * get takes O(log_2(n)) // Use binary search * put takes O(n) // You can find the place in O(log_2(n)), but it takes O(n) to shift stuff over to make room for the new thing Key: Aaron Chaipan Chaya Monica Saugar Value: Bernstein Khannabha FromDC Ugwi Sainju * Binary search trees * Requires comparable keys * get takes O(depth) --Just traverse the tree * put takes O(depth) -- And I'm not sure why, either * Incorrect claim: the depth of the tree is in O(log_2(n)) * Correct claim: the depth of the tree is in O(n) * It would be nice if we could make the binary search tree nearly complete * If you check the CS literature: No one has done it, yet * However, you can still get O(log_2(n)) depth for "mostly balanced" using some clever tricks that you SHOULD learn in CSC301 (unless you're Cassie) Problems with these implementations: * Not much better than O(n) * A few O(log_2(n)) * Often require comparable objects The clever solution: Hash tables Basic idea: Since arrays are so efficient, convert your key to an integer and use arrays *as arrays* Put the key/value pair (key,value) in the array hashtable at position hashtable[hashFunction(key)] The hard part: finding the hash function that converts arbitrary objects to numbers Example (using students as keys): * Hash function: Length of first name * Hash function: Student id number Example (using names (strings) as keys): * Hash function: Length of first name * Hash function: assign each letter a numerical value and sum the values in the name 0: 1: 2: 3: 4: 5: (Aaron,Bernstein),(David,D'Angelo) 6: (Monica,Ugwi),(Saugar,Sainju),(Cassie,Schmitz) 7: 8: (C,Khannabha) 9: 10: 11: 12: 13: Problem: A bad hash function puts lots of things in the same place in the array Note: A good function is unlikely to put two things in the same place Larger arrays decrease the chance of conflicts When we put two things in the same place, we use a sequence of nodes to store them To put something into the hash table (when we're lucky and have no conflicts) O(hashfunction) + O(putting into an array) O(1) + O(1) To get something from the hash table O(hashfunction) + O(getting from an array) O(1) + O(1)