CSC161 2010F, Class 44: Hash Tables Overview: * Questions on HW 8 * ADTs and Data Structures. * The Dictionary ADT. * Implementing Dictionaries with Association Lists. * Hash Tables. Admin: * There is no reading for Friday! * EC for tomorrow's Thursday Extra: Experimental Algorithmics. * EC for some part of the Rosenfield Corn Symposium. * I will be unavailable after 1:45 today. Questions on HW8 * Do we have to sort different types? * No. You're just sorting arrays of strings. * Then what does the -n mean? * It means that you parse the strings when comparing them int numcomp (const char *s1, const char *s2) { float f1 = atof (s1); float f2 = atof (s2); if (f1 < f2) return -1; else if (f1 > f2) return 1; else return 0; } } * How do I build unsort int rancomp (const char *s1, const char *s2) { int thingy = rand () % 5; if (thingy < 2) // 0 1 return -1; else if (thingy < 4) // 2 3 return 1; else // 4 return 0; } * Doesn't that violate the normal policies of a comparator? * Yes, but so what. * Why do you have parameters that you ignore? * Because it needs to have the same form as strcmp * What parameters should I give to my sort routine? * width * lines * comparison function * maybe a selection function (to deal with -k) * which "field" to select (-1 for "whole string") * ... * Can we adapt the code you posted? * Yes, provided your citation practices are better than mine. * Did you email it to us? * Yes. * Can you email it to us again? * I can, but I won't. * Was that supposed to be funny? * Well, some of you laughed. Maybe it's just the cadence of my typing that's funny. * I remember hearing you say something like "It's always a good idea to decompose the problem." How would you decompose this problem? * Oooh! We can do this together in class. * Can use the "handle each parameter" as a decomposition method. That's not quite it. * Four main tasks in the problem * Parse the command line to set up the "context" * What "file" to use (stdin or a file) * What comparator to use * Whether or not to separate things * ... * Do the real work * Read all the lines * Read one line * Copy into array * Do it repeatedly * Sort the lines using the comparator * Print the output * When given a file, should we change that file? * No, unless you support the -o option and they duplcate the file name. * Will we get a HW9? * Of course! I'm already thinking about it. * Will we get a HW10? * No. * Can Exam 3 be due Monday of Hell Week? * yes * Can you distribute it on Wednesday before Thanksgiving so that certain soccer players can try to finish it by Friday of Purgatory Week? * Yes ---- What are abstract data types and data structures? * Ways of organizing information * The way we organize data affects the kinds of algorithms we write * Using a list vs. using an array change how you think of your work Computer Scientists spend a lot of time thinking about ways to organize data * Data structure: Mid-level descriptions of how to organize the data * Mor generally: What do I want to do with the data? ADT Um, Dictionary/Map/Associative Array/Hash/Table, Set, List, Stack, Queue, Binary Tree, Priority Queue Today's focus: The Dictionary: A collection of key/value pairs * Two operations: Add a key/value pair; Find the value for a key * Traditionally, key and value are strings * Like an array, but indexed by strings, rather than by numbers. The key is the index * A great "hammer" How do we implement dictionaries? * We built them in 151: Lists of key/value pairs * Add: Cons on to the front of the list * Find: (define lookup (lambda (dict key) (if (null? dict) boom-crash-and-die (if (equal? key (caar dict)) (car dict) (lookup (cdr dict) key)))))a * SLOW and INCONSISTENT How else do we implement dictionaries? * Trade complexity for speed * Hash tables are popular * Basic idea: Arrays are fast, but they are not indexed by strings. I can ask for a[5] but not a["Joe"] * Strategy: Turn strings into integers that can serve as indices * How? Using a function we design called a "hash function" * Two goals of a hash function: * Identical strings must hash to the same value * Two different strings should hash to different values * A bad hash function for strings: * Sum the ASCII values in the string and then mod by the table size * Permutations hash to the same place * Most N letter words will hash to about the same place