
CSC 301.01, Class 09: Hash Tables

Overview

  • Preliminaries
    • Notes and news
    • Upcoming work
    • Extra credit
    • Questions
  • Review of hash tables
  • Hash functions, revisited.
  • Other uses of hash tables and hash functions.

News / Etc.

  • Today’s class may be sketchier than normal because I lost thirty minutes of prep/reflection time due to fire alarms.

Upcoming work

  • Assignment 3, due 10:30 p.m. TONIGHT
    • Code via email
    • Printed under door
  • Assignment 4, due 10:30 p.m. next Wednesday
    • Implement hash tables in Scheme.
    • Reflect on how to implement sets.

Extra credit (Academic)

  • CS Extras, Thursday, Klinge Map Group on Cauldron
  • CS Table, Tuesday, ???

Extra credit (Peer)

???

Extra Credit (Misc)

???

Other good things

Questions

  • Loop invariants help you a whole lot in writing partition correctly. What is a loop invariant?
    • It is a way of thinking about the state of the system, usually the state of an array.
    • If the invariant holds at the start of an iteration of the loop, it still holds at the end of that iteration.
    • It provides useful information about what our loop accomplishes.
    • What should we track? How about the array references (indices) the loop uses?
  • Can we assume that n is a power of some base in the inductive proofs?
    • If you must. But you could also try doing the proofs without that assumption.
  • What's the difference between strong and weak induction?
    • Weak: if the claim holds for n-1, it holds for n.
    • Strong: if the claim holds for every value <= n-1, it holds for n.
  • What do you want us to do for part b?
    • Figure out which of the three patterns is at play, explain why, and use that pattern.
  • Will you force us to argue regularity?
    • It would be nice, but no.
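The loop-invariant answer above can be made concrete for partition. This is a minimal C sketch that assumes the Lomuto scheme (last element as pivot), which may not be the partition used in class; the helper invariant_holds is hypothetical and exists only so that assert can check the invariant at the top of every iteration.

```c
#include <assert.h>
#include <stddef.h>

static void swap(int *x, int *y)
{
  int tmp = *x;
  *x = *y;
  *y = tmp;
}

/* Hypothetical debugging helper: checks that a[lo..small-1] are all
   <= pivot and that a[small..j-1] are all > pivot. */
static int invariant_holds(const int a[], size_t lo, size_t small,
                           size_t j, int pivot)
{
  for (size_t i = lo; i < small; i++)
    if (a[i] > pivot)
      return 0;
  for (size_t i = small; i < j; i++)
    if (a[i] <= pivot)
      return 0;
  return 1;
}

/* Lomuto partition of a[lo..hi] (inclusive) around the pivot a[hi].
   Returns the pivot's final index p, with a[lo..p-1] <= pivot < a[p+1..hi].
   Invariant at the top of each iteration:
     a[lo..small-1]  <= pivot   (the "small" region)
     a[small..j-1]   >  pivot   (the "large" region)
     a[j..hi-1]      not yet examined; a[hi] is the pivot */
size_t partition(int a[], size_t lo, size_t hi)
{
  assert(lo <= hi);
  int pivot = a[hi];
  size_t small = lo;
  for (size_t j = lo; j < hi; j++)
    {
      assert(invariant_holds(a, lo, small, j, pivot));
      if (a[j] <= pivot)
        {
          swap(&a[small], &a[j]);   /* grow the small region by one */
          small++;
        }
      /* a[j] now sits in one of the two regions, so the invariant holds for j+1. */
    } // for
  /* The unexamined region is empty; put the pivot between the two regions. */
  swap(&a[small], &a[hi]);
  return small;
} // partition
```

With a = {3, 8, 1, 9, 4, 7, 5}, partition(a, 0, 6) rearranges the array to {3, 1, 4, 5, 8, 7, 9} and returns 3. The invariant check makes each iteration linear, so it is for debugging only.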

Review of hash tables

What are the key ideas of hash tables?

  • We have pairs of keys and values (a dictionary), and we want to look up values by key.
  • We use a hash function that maps each key to an integer, then use that integer to index into an array where we store the key/value pairs.
  • This gives expected constant-time lookup of values by key.
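To see these key ideas in code, here is a minimal C sketch of a fixed-size dictionary from strings to ints. It assumes a hypothetical base-31 polynomial hash and resolves collisions by linear probing (one of the strategies listed under the design decisions); table_put, table_get, find_slot, and CAPACITY are illustrative names, and this table never grows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAPACITY 16   /* hypothetical fixed array size */

/* One cell of the array: a key/value pair plus an "in use" flag. */
struct slot {
  const char *key;   /* not copied: the caller keeps the string alive */
  int value;
  int used;
};

struct table {
  struct slot slots[CAPACITY];
  size_t count;
};

/* Hypothetical string hash: polynomial in base 31, wrapping arithmetic. */
static unsigned long hash_str(const char *s)
{
  unsigned long code = 0;
  for (; *s != '\0'; s++)
    code = code * 31 + (unsigned char) *s;
  return code;
}

/* Return the slot holding key, or the empty slot where it would go.
   Linear probing: on a collision, try the next cell (wrapping around). */
static size_t find_slot(const struct table *t, const char *key)
{
  assert(t->count < CAPACITY);   /* an empty slot exists, so the loop stops */
  size_t i = hash_str(key) % CAPACITY;
  while (t->slots[i].used && strcmp(t->slots[i].key, key) != 0)
    i = (i + 1) % CAPACITY;
  return i;
}

/* Insert or update key. Returns 1 on success, 0 if the table is too full. */
int table_put(struct table *t, const char *key, int value)
{
  size_t i = find_slot(t, key);
  if (!t->slots[i].used)
    {
      /* Keep at least one slot empty so find_slot always terminates. */
      if (t->count + 1 >= CAPACITY)
        return 0;
      t->slots[i].key = key;
      t->slots[i].used = 1;
      t->count++;
    }
  t->slots[i].value = value;
  return 1;
}

/* Returns 1 and stores the value in *out if key is present, else 0. */
int table_get(const struct table *t, const char *key, int *out)
{
  size_t i = find_slot(t, key);
  if (!t->slots[i].used)
    return 0;
  *out = t->slots[i].value;
  return 1;
}
```

A table starts zeroed (struct table t = {0};). Deletion is omitted; with linear probing it needs a "deleted" marker rather than simply clearing a slot, or later probes would stop too early.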

What are some design decisions we make in implementing hash tables?

  • Sometimes two keys end up at the same position in the array (a collision), particularly when we mod by the size of the array.
    • We can put a linked list at that point in the array (chaining/bucketing)
    • We can look in a nearby cell (probing)
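  • To keep the buckets small, we generally grow the underlying array when the number of entries reaches some fraction of its capacity (the load factor).
  • It is important to have a good hash function, one that distributes keys fairly uniformly across the range of possible indices.
  • The size of the underlying array may also matter: when we mod by the size, a power-of-two size uses only the low-order bits of the hash code, which is one reason people often choose a prime size.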
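The chaining and growing decisions above can be sketched together. This minimal C version assumes string keys, int values, a hypothetical base-31 polynomial hash, and a load-factor threshold of 3/4 (also an assumption, not a value from these notes); ct_init, ct_put, ct_get, and ct_grow are illustrative names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry in a bucket's linked list (chaining). */
struct entry {
  const char *key;   /* not copied: the caller keeps the string alive */
  int value;
  struct entry *next;
};

struct chained_table {
  struct entry **buckets;
  size_t size;    /* number of buckets */
  size_t count;   /* number of key/value pairs */
};

/* Hypothetical string hash: polynomial in base 31, wrapping arithmetic. */
static unsigned long hash_str(const char *s)
{
  unsigned long code = 0;
  for (; *s != '\0'; s++)
    code = code * 31 + (unsigned char) *s;
  return code;
}

void ct_init(struct chained_table *t, size_t size)
{
  assert(size > 0);   /* we mod by size, so it must be nonzero */
  t->buckets = calloc(size, sizeof *t->buckets);
  if (t->buckets == NULL)
    abort();
  t->size = size;
  t->count = 0;
}

/* Double the bucket array and relink every entry into its new bucket. */
static void ct_grow(struct chained_table *t)
{
  size_t new_size = 2 * t->size;
  struct entry **nb = calloc(new_size, sizeof *nb);
  if (nb == NULL)
    abort();
  for (size_t i = 0; i < t->size; i++)
    {
      struct entry *e = t->buckets[i];
      while (e != NULL)
        {
          struct entry *next = e->next;
          size_t j = hash_str(e->key) % new_size;
          e->next = nb[j];
          nb[j] = e;
          e = next;
        }
    }
  free(t->buckets);
  t->buckets = nb;
  t->size = new_size;
}

void ct_put(struct chained_table *t, const char *key, int value)
{
  size_t i = hash_str(key) % t->size;
  for (struct entry *e = t->buckets[i]; e != NULL; e = e->next)
    if (strcmp(e->key, key) == 0)
      {
        e->value = value;   /* key already present: update it */
        return;
      }
  struct entry *fresh = malloc(sizeof *fresh);
  if (fresh == NULL)
    abort();
  fresh->key = key;
  fresh->value = value;
  fresh->next = t->buckets[i];   /* push onto the front of the chain */
  t->buckets[i] = fresh;
  t->count++;
  if (4 * t->count > 3 * t->size)   /* load factor above 3/4: grow */
    ct_grow(t);
}

/* Returns 1 and stores the value in *out if key is present, else 0. */
int ct_get(const struct chained_table *t, const char *key, int *out)
{
  size_t i = hash_str(key) % t->size;
  for (const struct entry *e = t->buckets[i]; e != NULL; e = e->next)
    if (strcmp(e->key, key) == 0)
      {
        *out = e->value;
        return 1;
      }
  return 0;
}
```

Growing rehashes every entry, which costs linear time, but because the array doubles it happens rarely enough that insertion stays amortized constant time. Freeing the table is omitted for brevity.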

Hash functions, revisited

What does the following hash function do?

[Borrowed from Skiena, p. 89]

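#include <string.h>

#define alpha SOME_LARGE_PRIME
// Computes s[0]*alpha^(len-1) + s[1]*alpha^(len-2) + ... + s[len-1]*alpha^0.
// Horner's rule gives the same sum without an exponentiation helper, and
// unsigned arithmetic wraps around instead of overflowing.
unsigned long hash(const char *s)
{
  size_t len = strlen(s);
  unsigned long code = 0;
  for (size_t i = 0; i < len; i++)
    {
      code = code * alpha + (unsigned char) s[i];
    } // for
  return code;
} // hash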

Suppose we have a really long string str. What's the difference between hash(substring(str, 0, k)) and hash(substring(str, 1, k+1))? For example, how do we get from hash(substring(str, 0, 6)) to hash(substring(str, 1, 7))?

  • subtract s0*alpha^5
  • multiply by alpha
  • add s6
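The three steps above generalize to any window: for the k characters starting at i, hash(i+1) = (hash(i) - s[i]*alpha^(k-1)) * alpha + s[i+k]. The C sketch below checks that update against a direct computation. It assumes ALPHA = 257 as a hypothetical stand-in for SOME_LARGE_PRIME and lets unsigned arithmetic wrap rather than reducing modulo a prime; window_hash, power, and roll are illustrative names.

```c
#include <assert.h>
#include <string.h>

#define ALPHA 257UL   /* hypothetical stand-in for SOME_LARGE_PRIME */

/* Direct hash of s[i..i+k-1], the same sum as the hash function above:
   s[i]*ALPHA^(k-1) + ... + s[i+k-1]*ALPHA^0, by Horner's rule. */
unsigned long window_hash(const char *s, size_t i, size_t k)
{
  assert(i + k <= strlen(s));
  unsigned long code = 0;
  for (size_t j = i; j < i + k; j++)
    code = code * ALPHA + (unsigned char) s[j];
  return code;
}

/* ALPHA^e with wrapping arithmetic. */
unsigned long power(unsigned long e)
{
  unsigned long r = 1;
  while (e-- > 0)
    r *= ALPHA;
  return r;
}

/* Given h = hash of s[i..i+k-1] and top_power = ALPHA^(k-1), return the
   hash of s[i+1..i+k]: subtract s[i]*ALPHA^(k-1), multiply by ALPHA,
   add s[i+k]. Wrapping arithmetic is consistent, so the result equals
   window_hash(s, i+1, k) exactly. */
unsigned long roll(unsigned long h, const char *s, size_t i, size_t k,
                   unsigned long top_power)
{
  h -= (unsigned char) s[i] * top_power;
  return h * ALPHA + (unsigned char) s[i + k];
}
```

With k = 6 and i = 0, roll performs exactly the subtract/multiply/add sequence listed above, in constant time instead of rehashing all six characters.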

Other uses of hash tables and hash functions

Ideas stolen from Skiena

How could you use hash functions or tables to help you …

  • Detect plagiarism
  • Determine whether string a is a substring of string b?
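One possible answer to the substring question reuses the rolling update from the previous section; this is the Rabin-Karp approach. The C sketch below assumes a base of 257 with wrapping unsigned arithmetic, and find_substring is a hypothetical name. Because different windows can share a hash value, every hash match is confirmed with memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALPHA 257UL   /* hypothetical base; arithmetic wraps modulo 2^N */

/* Returns the index of the first occurrence of a in b, or -1 if none. */
long find_substring(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  size_t k = strlen(a), n = strlen(b);
  if (k > n)
    return -1;
  if (k == 0)
    return 0;

  /* Hash a and the first window of b; top ends up as ALPHA^(k-1). */
  unsigned long target = 0, window = 0, top = 1;
  for (size_t j = 0; j < k; j++)
    {
      target = target * ALPHA + (unsigned char) a[j];
      window = window * ALPHA + (unsigned char) b[j];
      if (j > 0)
        top *= ALPHA;
    }

  for (size_t i = 0; ; i++)
    {
      /* Equal hashes might be a collision, so confirm character by character. */
      if (window == target && memcmp(a, b + i, k) == 0)
        return (long) i;
      if (i + k >= n)
        return -1;   /* no more windows */
      /* Slide the window one character: drop b[i], add b[i+k]. */
      window = (window - (unsigned char) b[i] * top) * ALPHA
               + (unsigned char) b[i + k];
    }
}
```

Each slide costs constant time, so the expected running time is linear in the length of b when collisions are rare. The same window hashes, stored in a hash table, could also be compared across two documents to look for shared passages, which is one way to approach the plagiarism question.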