CSC161 2010F, Class 44: Hash Tables

Overview:
* Questions on HW 8
* ADTs and Data Structures.
* The Dictionary ADT.
* Implementing Dictionaries with Association Lists.
* Hash Tables.

Admin:
* There is no reading for Friday!
* EC for tomorrow's Thursday Extra: Experimental Algorithmics.
* EC for some part of the Rosenfield Corn Symposium.
* I will be unavailable after 1:45 today.

Questions on HW8

* Do we have to sort different types?
  * No.  You're just sorting arrays of strings.

* Then what does the -n mean?
  * It means that you parse the strings when comparing them
  int 
  numcomp (const char *s1, const char *s2)
  {
    float f1 = atof (s1);
    float f2 = atof (s2);

    if (f1 < f2)
      return -1;
    else if (f1 > f2)
      return 1;
    else
      return 0;
   }
  }

* How do I build unsort
  int
  rancomp (const char *s1, const char *s2)
  {
    int thingy = rand () % 5;
    if (thingy < 2)             // 0 1
      return -1;
    else if (thingy < 4)        // 2 3
      return 1;
    else                        // 4
      return 0;
  }

  * Doesn't that violate the normal policies of a comparator?
    * Yes, but so what.

  * Why do you have parameters that you ignore?
    * Because it needs to have the same form as strcmp

* What parameters should I give to my sort routine?
  * width
  * lines
  * comparison function
  * maybe a selection function (to deal with -k)
  * which "field" to select (-1 for "whole string")
  * ...

* Can we adapt the code you posted?
  * Yes, provided your citation practices are better than mine.

* Did you email it to us?
  * Yes.

* Can you email it to us again?
  * I can, but I won't.

* Was that supposed to be funny?
  * Well, some of you laughed.  Maybe it's just the cadence of my
    typing that's funny.

* I remember hearing you say something like "It's always a good idea to
  decompose the problem."  How would you decompose this problem?
  * Oooh!  We can do this together in class.
  * Can use the "handle each parameter" as a decomposition method.  That's not quite it.
  * Four main tasks in the problem
    * Parse the command line to set up the "context"
      * What "file" to use (stdin or a file)
      * What comparator to use
      * Whether or not to separate things
      * ...
    * Do the real work
      * Read all the lines
        * Read one line
        * Copy into array
        * Do it repeatedly
      * Sort the lines using the comparator
      * Print the output

* When given a file, should we change that file?
  * No, unless you support the -o option and they duplcate the file name.

* Will we get a HW9?
  * Of course!  I'm already thinking about it.

* Will we get a HW10?
  * No.

* Can Exam 3 be due Monday of Hell Week?
  * yes
 
* Can you distribute it on Wednesday before Thanksgiving so that 
  certain soccer players can try to finish it by Friday of Purgatory
  Week?
  * Yes

----

What are abstract data types and data structures?
* Ways of organizing information
* The way we organize data affects the kinds of algorithms we write
* Using a list vs. using an array change how you think of your work

Computer Scientists spend a lot of time thinking about ways to
organize data
* Data structure: Mid-level descriptions of how to organize the
  data
* Mor generally: What do I want to do with the data?  ADT

Um, Dictionary/Map/Associative Array/Hash/Table, Set, List, Stack,
Queue, Binary Tree, Priority Queue

Today's focus: The Dictionary: A collection of key/value pairs
* Two operations: Add a key/value pair; Find the value for a key
* Traditionally, key and value are strings
* Like an array, but indexed by strings, rather than by numbers.
  The key is the index
* A great "hammer"

How do we implement dictionaries?
* We built them in 151: Lists of key/value pairs
* Add: Cons on to the front of the list
* Find: 
(define lookup 
  (lambda (dict key)
    (if (null? dict)
        boom-crash-and-die
        (if (equal? key (caar dict))
            (car dict)
            (lookup (cdr dict) key)))))a
* SLOW and INCONSISTENT

How else do we implement dictionaries?
* Trade complexity for speed
* Hash tables are popular
  * Basic idea: Arrays are fast, but they are not indexed by
    strings.  I can ask for a[5] but not a["Joe"]
  * Strategy: Turn strings into integers that can serve as indices
    * How?  Using a function we design called a "hash function"
    * Two goals of a hash function:
      * Identical strings must hash to the same value
      * Two different strings should hash to different values
* A bad hash function for strings:
  * Sum the ASCII values in the string and then mod by the table
    size
    * Permutations hash to the same place
    * Most N letter words will hash to about the same place