[Instructions] [Search] [Current] [News] [Syllabus] [Glance] [Links] [Handouts] [Project] [Outlines] [Labs] [Assignments] [Quizzes] [Exams] [Examples] [EIJ] [JPDS] [Tutorial] [API]

Back to Binary Search Trees. On to Project Discussion.

**Held** Wednesday, April 19, 2000

**Overview**

Today we consider another technique for implementing dictionaries: hash tables.

**Notes**

**Contents**

**Summary**

- The elusive goal: O(1) lookup and retrieval
- Using objects as numeric indices
- Hash tables
- Java's
`java.util.Hashtable`

class

- Surprisingly, if you're willing to sacrifice some space and increase
your constant, it is possible to build a dictionary with O(1)
`get`

and`put`

. - How? By using an array, and numbering your keys in such a way that
- all numbers are between 0 and array.length-1
- no two keys have the same number (or at least few have the same number).

- If there are no collisions (keys with the same number), the system is simple
- To put a value, determine the number corresponding to the key and put it in that place of the array. This is O(1+cost of computing that number).
- To get a value, determine the number corresponding to the key and look in the appropriate cell. This is O(1+cost of finding that number).

- Implementations of dictionaries using this strategy are called
*hash tables*. - The function used to convert an object to a number is the
*hash function*. - To better understand hash tables, we need to consider
- The hash functions we might develop.
- What to do about collisions.

- The goal in developing a hash function is to come up with a function
that is unlikely to map two objects to the same position.
- Now, this isn't possible (particularly if we have more objects than positions).
- We'll discuss what to do about two objects mapping to the same position later.

- Hence, we sometimes accept a situation in which the hash function distributes the objects more or less uniformly.
- It is worth some experimentation to come up with such a function.
- In addition, we should consider the cost of computing the hash function. We'd like something that is relatively low cost (not just constant time, but not too many steps within that constant).
- We'd also like a function that does (or can) give us a relatively
large range of numbers, so that we can get fewer collisions by increasing
the size of the hash table.
- We might want to make the size of the table a parameter to the hash function.
- We might strive for a hash function that uses the range of positive integers, and mod it by the size of the table.

- What are some hash functions you might use for strings?
- Sum the ASCII values in the string
- N*first letter + M*second letter
- ...

- Let's try an exercise. We'll come up with a hash value for everybody's first name. We'll then put things in the hash table.
- We'll use ``sum the values of the letters in the name''.
- We'll use the following table:
A: 1 F: 6 K: 11 P: 16 U: 21 Z: 26 B: 2 G: 7 L: 12 Q: 17 V: 22 C: 3 H: 8 M: 13 R: 18 W: 23 D: 4 I: 9 N: 14 S: 19 X: 24 E: 5 J: 10 O: 15 T: 20 Y: 25

- There are 24 of you in the class. Typically, our hash tables are somewhat
bigger than the size of the collection we're working with, so we'll use a
hash table of size 50.
- Once you've computed your result, mod it by 50.

- For my name (Samuel), the hash value is
19 (S) + 1 (A) + 13 (M) + 21 (U) + 5 (E) + 12 (L) =
**71**. Mod that by 50 to get 21. - For one son's name (William), the hash value is
23 (W) + 9 (I) + 12 (L) + 12 (L) + 9 (I) + 1 (A) + 13 (M) =
**79**. Mod that by 50 to get 29. - For the other son's name (Jonathan), the hash value is
10 (J) + 15 (O) + 14 (N) + 1 (A) + 20 (T) + 8 (H) + 1 (A) + 14 (N) =
**83**. Mod that by 50 to get 33. - What if the hash table has only twelve cells (which would be large if I were storing only three objects)? Then both Jonathan and I hash to cell 11.

- Hash tables are so useful that Java includes them as a standard
library class,
`java.util.Hashtable`

. - Let's look over the documentation
- Why are there three constructors?
- What methods are there other than
`get`

and`put`

? - Where's the hash function?

- Our analysis of Hash Tables to date has been based on two simple operations: get and put.
- What happens if we want to remove elements? This can significantly complicate matters.
- If we've chosen the ``shift into a blank space'' technique for
resolving collisions, what do we
do when it comes time to remove elements?
- Do we shift everything back? If so, think about how far we may have to look.
- Do we leave the thing there as a blank? We might then then remove it later when it's convenient to do so.
- Do we do something totally different?

- Note also that there are different ways of specifying ``remove''. We might remove the element with a particular key. We might instead remove elements based on their value. The second is obviously a much slower operation than the first (unless we've developed a special way to handle that problem - see if you can think of one).

Tuesday, 18 January 2000

- Created as a blank outline.

Wednesday, 19 April 2000

- Filled in some details. Most were taken from outline 42 of CSC152 99F.

Back to Binary Search Trees. On to Project Discussion.

[Instructions] [Search] [Current] [News] [Syllabus] [Glance] [Links] [Handouts] [Project] [Outlines] [Labs] [Assignments] [Quizzes] [Exams] [Examples] [EIJ] [JPDS] [Tutorial] [API]

**Disclaimer** Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

This page may be found at http://www.math.grin.edu/~rebelsky/Courses/CS152/2000S/Outlines/outline.43.html

Source text last modified Fri Apr 21 10:37:39 2000.

This page generated on Fri Apr 21 10:44:13 2000 by Siteweaver. Validate this page's HTML.

Contact our webmaster at rebelsky@grinnell.edu