CSC152 2005S, Class 40: Dictionaries (4): Hash Tables
Admin:
* New disgusting candy: Gummy Crabby Patties
* Exams due NOW!
* Forthcoming extra credit
* Saturday: 5K
* Registration 8:30-9:30 in Merrill park, 12th and Park
* Thursday: "Convo" (Woods Hole)
* Thursday Week: "Convo" (Darwin)
* Fun with Prospies!
* Chaya from Washington, D.C.
* What are the effects of knowing Elizabeth for several years?
* Therapy
* Where else do you plan to apply?
* Goucher
* William and Mary
* Mary Washington
* If you come to Grinnell, you'll lose weight because the food is not as good as at Mary Washington
* Chaya should come to Grinnell b/c it's further from home
Overview:
* Analyzing Dictionary Implementations
* An Alternate: Hash tables
* Hash functions
* Java details
Review:
* What is a dictionary?
* Purspose/Philosophy: Like an array: It stores things using indices that are strings (or other objects)
* New idea: The indices are traditionally called "keys"
* Methods
* put(Object key, Object value)
* get(Object key)
* remove(Object key) // Not in Sam's original, but in the lab
* Applications
* Implementations
* Pair of parallel arrays (keys in first, values in second)
* Works even if we don't have comparable keys (only need equals)
* put takes O(1) // Just put it at the end of the array WRONG (but it was a good guess, nonetheless)
* put takes O(n) // Need to check to see if it's already there
* get takes O(n) // Check each key in turn
* Pair of parallel arrays, sorted by key in increasing order
* Requires comparable keys
* get takes O(log_2(n)) // Use binary search
* put takes O(n) // You can find the place in O(log_2(n)), but it takes O(n) to shift stuff over to make room for the new thing
Key: Aaron Chaipan Chaya Monica Saugar
Value: Bernstein Khannabha FromDC Ugwi Sainju
* Binary search trees
* Requires comparable keys
* get takes O(depth) --Just traverse the tree
* put takes O(depth) -- And I'm not sure why, either
* Incorrect claim: the depth of the tree is in O(log_2(n))
* Correct claim: the depth of the tree is in O(n)
* It would be nice if we could make the binary search tree nearly complete
* If you check the CS literature: No one has done it, yet
* However, you can still get O(log_2(n)) depth for "mostly balanced" using some clever tricks that you SHOULD learn in CSC301 (unless you're Cassie)
Problems with these implementations:
* Not much better than O(n)
* A few O(log_2(n))
* Often require comparable objects
The clever solution: Hash tables
Basic idea: Since arrays are so efficient, convert your key to an integer and use arrays *as arrays*
Put the key/value pair (key,value) in the array hashtable at position
hashtable[hashFunction(key)]
The hard part: finding the hash function that converts arbitrary objects to numbers
Example (using students as keys):
* Hash function: Length of first name
* Hash function: Student id number
Example (using names (strings) as keys):
* Hash function: Length of first name
* Hash function: assign each letter a numerical value and sum the values in the name
0:
1:
2:
3:
4:
5: (Aaron,Bernstein),(David,D'Angelo)
6: (Monica,Ugwi),(Saugar,Sainju),(Cassie,Schmitz)
7:
8: (C,Khannabha)
9:
10:
11:
12:
13:
Problem: A bad hash function puts lots of things in the same
place in the array
Note: A good function is unlikely to put two things in the same place
Larger arrays decrease the chance of conflicts
When we put two things in the same place, we use a sequence of nodes to store them
To put something into the hash table (when we're lucky and have no conflicts)
O(hashfunction) + O(putting into an array)
O(1) + O(1)
To get something from the hash table
O(hashfunction) + O(getting from an array)
O(1) + O(1)