Held Wednesday, November 29, 2000
Summary
Today we visit yet another implementation of dictionaries: hash
tables. Hash tables are interesting in that they provide
operations that are likely to have O(1) time.
Notes
- Are there any final questions on
exam 3?
- I'd hope to receive many of them today so that I can grade them
while I'm away (ho ho ho).
- Update on search.
- I'm still thinking about the last assignment. Do you really
want another assignment?
- Form of the final (in-class).
- No class Friday. I think most of you need a break.
Overview
- Dictionaries vs. arrays
- An idea: Turn objects into numbers
- Implications of idea: Hash tables
- Writing hash functions
- Hashing in Java
- As you may have noticed, although dictionaries are "a lot like arrays"
[AP] (except that you index dictionaries by object and arrays by number),
the best implementation we've seen so far for dictionaries has
O(log2n) add and get while the corresponding array operations
are O(1).
- Is there a dictionary implementation with O(1)
get
and put
?
- Surprisingly, if you're willing to sacrifice some space and increase
your constant, it is possible to build a dictionary that is likely
to have O(1)
get
and put
.
- How? Well, we know that arrays provide O(1) get and put, so use arrays.
- How do we use an array? We "number" the keys in such a way that
- all numbers are between 0 and array.length-1
- no two keys have the same number (or at least few have the same
number).
- If there are no collisions (keys with the same number), the system is simple
- To put a value, determine the number corresponding to the key
and put it in that place of the array. This is O(1+cost of
computing that number).
- To get a value, determine the number corresponding to the key
and look in the appropriate cell. This is O(1+cost of finding that
number).
- Implementations of dictionaries using this strategy are called
hash tables.
- The function used to convert an object to a number is the
hash function.
- To better understand hash tables, we need to consider
- The hash functions we might develop.
- What to do about collisions.
- The goal in developing a hash function is to come up with a function
that is unlikely to map two objects to the same position.
- Now, this isn't possible (particularly if we have more objects than
positions).
- We'll discuss what to do about two objects mapping to
the same position later.
- Hence, we sometimes accept a situation in which the hash function
distributes the objects more or less uniformly.
- It is worth some experimentation to come up with such a function.
- In addition, we should consider the cost of computing the hash function.
We'd like something that is relatively low cost (not just constant time,
but not too many steps within that constant).
- We'd also like a function that does (or can) give us a relatively
large range of numbers, so that we can get fewer collisions by increasing
the size of the hash table.
- We might want to make the size of the table a parameter to the
hash function.
- We might strive for a hash function that uses the range of positive
integers, and mod it by the size of the table.
- What are some hash functions you might use for strings?
- Sum the ASCII values in the string
- N*first letter + M*second letter
- ...
- Hash tables are so useful that Java includes them as a standard
library class,
java.util.Hashtable
.
- Let's look over
the documentation
- Why are there three constructors?
- What methods are there other than
get
and put
?
- Where's the hash function?
- Our analysis of Hash Tables to date has been based on two simple
operations: get and put.
- What happens if we want to remove elements? This can significantly
complicate matters.
- If we've chosen the ``shift into a blank space'' technique for
resolving collisions, what do we
do when it comes time to remove elements?
- Do we shift everything back? If so, think about how far we may have
to look.
- Do we leave the thing there as a blank? We might then then remove
it later when it's convenient to do so.
- Do we do something totally different?
- Note also that there are different ways of specifying ``remove''. We
might remove the element with a particular key. We might instead remove
elements based on their value. The second is obviously a much slower
operation than the first (unless we've developed a special way to handle
that problem - see if you can think of one).
Wednesday, 23 August 2000
- Created as a blank outline.
Thursday, 24 August 2000
- Slight reorganization to page design.
Tuesday, 28 November 2000
Back to Binary Search Trees.
On to No Class.