CSC152, Class 40: Dictionaries (3): Hash Tables Admin: * REMINDER: Attendance is expected, particularly on Tuesdays * Women's soccer at 1 * Cool CS talk at 4:30; Food at 4:15 * Convo on incarceration tomorrow at 11:00 * Homework available this evening (described in class) Overview: * Rethinking our dictionary implementation * A solution: Hash tables * Hash functions * An exercise * Hash tables in java * Handling collisions * Handling removal /Dictionaries, Revisited/ * Philosophy: * Kyle: "You have keys and values; you put them in in pairs and you can quickly search by key" * Gaurav: "A collection of objects (values) indexed by other objects (keys)" * Sam: "Like arrays, but indexed by objects rather than integers" * Applications: * Mapping names to objects in World * Mapping attributes to values in Things * Methods * set(Object key, Object value) * get(Object key) // find(Object key) * remove(Object key) * Implementation * Binary search trees with good luck * set: O(logn) * find: O(logn) ( O(n) for bad luck ) * remove: O(logn) * Sorted array * set: O(n) * find: O(logn) * remove: O(n) * Unsorted array * set: O(n) * find: O(n) * remove: O(n) /Data Structure: Hash Table/ * Observation: If we want to be as quick as arrays, we have to take advantage of the strengths of arrays * Hence, * convert keys to numbers, * use that number as the index in an underlying array * Problems: * How do you convert keys to numbers? * Typically, there are more possible keys than integers * Typically, there are more integers than valid indices * Converting keys to numbers in Java: Use the hashCode() method /Exercise/ * A function to convert names (sequences of letters) to integers: * Convert each letter to an integer and sum the values * Compute the remainder after dividing by the size of the underlying array (contents) (20) A: 1 F: 6 K: 11 P: 16 U: 21 Z: 26 B: 2 G: 7 L: 12 Q: 17 V: 22 C: 3 H: 8 M: 13 R: 18 W: 23 D: 4 I: 9 N: 14 S: 19 X: 24 E: 5 J: 10 O: 15 T: 20 Y: 25 E.g. SAMR = 19 + 1 + 13 + 18 = 51; Remainder: 11 contents[00] contents[01]: (Kevin,Conor) contents[02]: (Eryn,O'Neil) contents[03] contents[04] contents[05]: (Brett,McMillian), (Brad,IDontRemember) contents[06] contents[07] contents[08] contents[09] contents[10]: (Gaurav,Gupta) contents[11]: (SamR,ebelsky) contents[12]: (Mars,Lander) [shifted from 11] contents[13]: (Kyle,G) contents[14]: (Arms,Race) [shifted from 11] contents[15]: (Angeline,Namai), (Leigh Ahlgren), (Eric Becking) contents[16]: contents[17]: (Louisa,P) contents[18]: (Mike,C) contents[19]: Unresolved problems: * Two values with the same index (conflict) * Designing hashCode * Removal * What do we do when we fill the table? /What should we do when we try to add a second value with the same index?/ * Randomly assign an "unfilled index" and notate that you've done so * "Notate" is hard * Assign the next "unfilled index" * Two step search * Find the index * Look at every subsequent cell until (a) you find a matching key or - return value (b) you find an empty cell - throw NotFound * Potentially slows it down "a whole lot" * If about half the cells in the table are empty, it is unlikely to slow it down very much * Removal is probably a pain * Step through every subsequent place until you hit null and, if the element is out of place, shift it into the open place * Store a list of key/value pairs in each element of the array YOUR HOMEWORK: Implement the "unfilled index" strategy in what is now rebelsky.dict.HashTable /Detour: Shallow vs. Deep Copying/ * Example: Tree