Held Wednesday, April 19, 2000
Overview
Today we consider another technique for implementing dictionaries:
hash tables.
Notes
Contents
Summary
- The elusive goal: O(1) lookup and retrieval
- Using objects as numeric indices
- Hash tables
- Java's
java.util.Hashtable
class
- Surprisingly, if you're willing to sacrifice some space and increase
your constant, it is possible to build a dictionary with O(1)
get
and put
.
- How? By using an array, and numbering your keys in such a way that
- all numbers are between 0 and array.length-1
- no two keys have the same number (or at least few have the same
number).
- If there are no collisions (keys with the same number), the system is simple
- To put a value, determine the number corresponding to the key
and put it in that place of the array. This is O(1+cost of
computing that number).
- To get a value, determine the number corresponding to the key
and look in the appropriate cell. This is O(1+cost of finding that
number).
- Implementations of dictionaries using this strategy are called
hash tables.
- The function used to convert an object to a number is the
hash function.
- To better understand hash tables, we need to consider
- The hash functions we might develop.
- What to do about collisions.
- The goal in developing a hash function is to come up with a function
that is unlikely to map two objects to the same position.
- Now, this isn't possible (particularly if we have more objects than
positions).
- We'll discuss what to do about two objects mapping to
the same position later.
- Hence, we sometimes accept a situation in which the hash function
distributes the objects more or less uniformly.
- It is worth some experimentation to come up with such a function.
- In addition, we should consider the cost of computing the hash function.
We'd like something that is relatively low cost (not just constant time,
but not too many steps within that constant).
- We'd also like a function that does (or can) give us a relatively
large range of numbers, so that we can get fewer collisions by increasing
the size of the hash table.
- We might want to make the size of the table a parameter to the
hash function.
- We might strive for a hash function that uses the range of positive
integers, and mod it by the size of the table.
- What are some hash functions you might use for strings?
- Sum the ASCII values in the string
- N*first letter + M*second letter
- ...
- Hash tables are so useful that Java includes them as a standard
library class,
java.util.Hashtable
.
- Let's look over
the documentation
- Why are there three constructors?
- What methods are there other than
get
and put
?
- Where's the hash function?
- Our analysis of Hash Tables to date has been based on two simple
operations: get and put.
- What happens if we want to remove elements? This can significantly
complicate matters.
- If we've chosen the ``shift into a blank space'' technique for
resolving collisions, what do we
do when it comes time to remove elements?
- Do we shift everything back? If so, think about how far we may have
to look.
- Do we leave the thing there as a blank? We might then then remove
it later when it's convenient to do so.
- Do we do something totally different?
- Note also that there are different ways of specifying ``remove''. We
might remove the element with a particular key. We might instead remove
elements based on their value. The second is obviously a much slower
operation than the first (unless we've developed a special way to handle
that problem - see if you can think of one).