CSC152 2005S, Class 40: Dictionaries (4): Hash Tables

Admin:
* New disgusting candy: Gummy Crabby Patties
* Exams due NOW!
* Forthcoming extra credit
  * Saturday: 5K
    * Registration 8:30-9:30 in Merrill park, 12th and Park
  * Thursday: "Convo" (Woods Hole)
  * Thursday Week: "Convo" (Darwin)
* Fun with Prospies!
  * Chaya from Washington, D.C.
    * What are the effects of knowing Elizabeth for several years?
      * Therapy
    * Where else do you plan to apply?
      * Goucher
      * William and Mary
      * Mary Washington
        * If you come to Grinnell, you'll lose weight because the food is not as good as at Mary Washington
    * Chaya should come to Grinnell b/c it's further from home

Overview:
* Analyzing Dictionary Implementations
* An Alternate: Hash tables
* Hash functions
* Java details

Review:
* What is a dictionary?
* Purspose/Philosophy: Like an array: It stores things using indices that are strings (or other objects)
  * New idea: The indices are traditionally called "keys"
* Methods
  * put(Object key, Object value)
  * get(Object key)
  * remove(Object key) // Not in Sam's original, but in the lab
* Applications
* Implementations
  * Pair of parallel arrays (keys in first, values in second)
    * Works even if we don't have comparable keys (only need equals)
    * put takes O(1) // Just put it at the end of the array WRONG (but it was a good guess, nonetheless)
    * put takes O(n) // Need to check to see if it's already there
    * get takes O(n) // Check each key in turn
  * Pair of parallel arrays, sorted by key in increasing order
    * Requires comparable keys
    * get takes O(log_2(n)) // Use binary search
    * put takes O(n) // You can find the place in O(log_2(n)), but it takes O(n) to shift stuff over to make room for the new thing
    Key:   Aaron     Chaipan   Chaya  Monica Saugar
    Value: Bernstein Khannabha FromDC Ugwi   Sainju
  * Binary search trees
    * Requires comparable keys
    * get takes O(depth) --Just traverse the tree
    * put takes O(depth) -- And I'm not sure why, either
    * Incorrect claim: the depth of the tree is in O(log_2(n))
    * Correct claim: the depth of the tree is in O(n)
    * It would be nice if we could make the binary search tree nearly complete
      * If you check the CS literature: No one has done it, yet
      * However, you can still get O(log_2(n)) depth for "mostly balanced" using some clever tricks that you SHOULD learn in CSC301 (unless you're Cassie)

Problems with these implementations:
* Not much better than O(n)
* A few O(log_2(n))
* Often require comparable objects

The clever solution: Hash tables
Basic idea: Since arrays are so efficient, convert your key to an integer and use arrays *as arrays*

Put the key/value pair (key,value) in the array hashtable at position
  hashtable[hashFunction(key)]

The hard part: finding the hash function that converts arbitrary objects to numbers

Example (using students as keys):
* Hash function: Length of first name
* Hash function: Student id number

Example (using names (strings) as keys):
* Hash function: Length of first name
* Hash function: assign each letter a numerical value and sum the values in the name

0:
1:
2:
3:
4:
5: (Aaron,Bernstein),(David,D'Angelo)
6: (Monica,Ugwi),(Saugar,Sainju),(Cassie,Schmitz)
7:
8: (C,Khannabha)
9:
10:
11:
12:
13:

Problem: A bad hash function puts lots of things in the same
place in the array

Note: A good function is unlikely to put two things in the same place

Larger arrays decrease the chance of conflicts

When we put two things in the same place, we use a sequence of nodes to store them

To put something into the hash table (when we're lucky and have no conflicts)
O(hashfunction) + O(putting into an array)
  O(1) + O(1)
To get something from the hash table
O(hashfunction) + O(getting from an array)
  O(1) + O(1)