CSC152, Class 40: Dictionaries (3): Hash Tables

Admin:
* REMINDER: Attendance is expected, particularly on Tuesdays
* Women's soccer at 1
* Cool CS talk at 4:30; Food at 4:15
* Convo on incarceration tomorrow at 11:00
* Homework available this evening (described in class)

Overview:
* Rethinking our dictionary implementation
* A solution: Hash tables
* Hash functions
  * An exercise
* Hash tables in java
* Handling collisions
* Handling removal

/Dictionaries, Revisited/

* Philosophy: 
  * Kyle: "You have keys and values; you put them in in pairs and you can quickly search by key"
  * Gaurav: "A collection of objects (values) indexed by other objects (keys)"
  * Sam: "Like arrays, but indexed by objects rather than integers"
* Applications:
  * Mapping names to objects in World
  * Mapping attributes to values in Things
* Methods
  * set(Object key, Object value)
  * get(Object key) // find(Object key)
  * remove(Object key)
* Implementation
  * Binary search trees with good luck
    * set: O(logn)
    * find: O(logn) ( O(n) for bad luck )
    * remove: O(logn)
  * Sorted array
    * set: O(n)
    * find: O(logn)
    * remove: O(n)
  * Unsorted array
    * set: O(n)
    * find: O(n)
    * remove: O(n)

/Data Structure: Hash Table/

* Observation: If we want to be as quick as arrays, we have to take advantage of the strengths of arrays
* Hence, 
  * convert keys to numbers,
  * use that number as the index in an underlying array
* Problems:
  * How do you convert keys to numbers?
  * Typically, there are more possible keys than integers
  * Typically, there are more integers than valid indices

* Converting keys to numbers in Java: Use the hashCode() method

/Exercise/

* A function to convert names (sequences of letters) to integers:
  * Convert each letter to an integer and sum the values
  * Compute the remainder after dividing by the size of the underlying array (contents) (20)

A: 1   F: 6   K: 11  P: 16  U: 21  Z: 26
B: 2   G: 7   L: 12  Q: 17  V: 22
C: 3   H: 8   M: 13  R: 18  W: 23
D: 4   I: 9   N: 14  S: 19  X: 24
E: 5   J: 10  O: 15  T: 20  Y: 25

E.g. SAMR = 19 + 1 + 13 + 18 = 51; Remainder: 11

contents[00]
contents[01]: (Kevin,Conor)
contents[02]: (Eryn,O'Neil)
contents[03]
contents[04]
contents[05]: (Brett,McMillian), (Brad,IDontRemember)
contents[06]
contents[07]
contents[08]
contents[09]
contents[10]: (Gaurav,Gupta)
contents[11]: (SamR,ebelsky)
contents[12]: (Mars,Lander) [shifted from 11]
contents[13]: (Kyle,G)
contents[14]: (Arms,Race) [shifted from 11]
contents[15]: (Angeline,Namai), (Leigh Ahlgren), (Eric Becking)
contents[16]:
contents[17]: (Louisa,P)
contents[18]: (Mike,C)
contents[19]: 

Unresolved problems:

* Two values with the same index (conflict)
* Designing hashCode
* Removal
* What do we do when we fill the table?

/What should we do when we try to add a second value with the same index?/

* Randomly assign an "unfilled index" and notate that you've done so
  * "Notate" is hard
* Assign the next "unfilled index"
  * Two step search
    * Find the index
    * Look at every subsequent cell until (a) you find a matching key or - return value (b) you find an empty cell - throw NotFound
  * Potentially slows it down "a whole lot"
  * If about half the cells in the table are empty, it is unlikely to slow it down very much
  * Removal is probably a pain
    * Step through every subsequent place until you hit null and, if the element is out of place, shift it into the open place
* Store a list of key/value pairs in each element of the array

YOUR HOMEWORK: Implement the "unfilled index" strategy in what is now rebelsky.dict.HashTable

/Detour: Shallow vs. Deep Copying/

* Example: Tree