Lab: Hash tables

Held: Wednesday, 27 February 2019
Writeup due: Friday, 1 March 2019
Summary: In this laboratory, we explore the use of hash tables to store information.

Important syntax and procedures

'#hash(("Key1" . "Value1") ("Key2" . "Value2") ...) — How Racket prints hash tables. Also how you create a new immutable hash table.

(make-hash) — create a new mutable hash table.

(hash-ref hash key) — Look up a value in a hash table.

(hash-set! hash key value) — Change the value associated with a key in a hash table.

(hash-remove! hash key) — Remove a key/value pair from a hash table.

(hash-has-key? hash key) — Determine if a key appears in a hash table.
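
As a quick illustration of the mutable procedures above, here is a minimal sketch. The table capitals and its contents are made up for this example and do not appear elsewhere in the lab.

```racket
#lang racket
;; Hypothetical table for illustration: maps state names to capitals.
(define capitals (make-hash))

;; hash-set! adds a new entry or replaces an existing one.
(hash-set! capitals "Iowa" "Des Moines")
(hash-set! capitals "Ohio" "Cleveland")
(hash-set! capitals "Ohio" "Columbus")   ; replaces "Cleveland"

;; hash-has-key? lets us check for a key before looking it up.
(hash-has-key? capitals "Iowa")          ; #t
(hash-ref capitals "Ohio")               ; "Columbus"

;; hash-remove! deletes the entry for a key.
(hash-remove! capitals "Ohio")
(hash-has-key? capitals "Ohio")          ; #f
```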

Additional procedures introduced in this lab

(hash-set immutable-hash key value) — Like hash-set!, but for immutable hash tables.

(hash-remove immutable-hash key) — Like hash-remove!, but for immutable hash tables.

(hash-keys hash) — Get a list of keys from the hash table.

(for-each proc! lst) — Somewhat like map; apply proc! to each element of the list, throwing away the result of proc!.
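
To make hash-keys and for-each a bit more concrete, here is a small sketch. The table ages and the names in it are invented for this example. Because hash-keys makes no promise about order, the sketch sorts the keys before using them.

```racket
#lang racket
;; Hypothetical immutable table for illustration.
(define ages '#hash(("Ada" . 36) ("Grace" . 85)))

;; hash-keys returns the keys in no particular order, so we sort them.
(define names (sort (hash-keys ages) string<?))

;; for-each calls the procedure on each element for its effect
;; (here, printing a line) and does not build a list of results.
(for-each (lambda (name)
            (displayln (string-append name
                                      ": "
                                      (number->string (hash-ref ages name)))))
          names)
```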

Preparation

a. Start DrRacket.

b. Make sure that you have the latest version of the loudhum package by opening a terminal window and typing /home/rebelsky/bin/csc151/update. (Alternately, select File > Install Package…, enter “https://github.com/grinnell-cs/loudhum.git” and follow the instructions.)

c. Don’t forget to add (require loudhum) to the definitions pane.

d. If we did not review the self checks at the start of class, review the self checks with your partner.

Exercises

Exercise 1: A basic table

In the reading, we created a simple hash table of book authors. Here are the commands we used.

> (define book-authors (make-hash))
> (hash-set! book-authors "The Princess Bride" "William Goldman")
> (hash-set! book-authors "Homegoing" "Yaa Gyasi")
> (hash-set! book-authors "Moo" "Jane Smiley")
> (hash-set! book-authors "Moo, Baa, La La La!" "Sandra Boynton")

a. Transfer those commands to the interactions pane. You should eliminate the prompts.

b. Add a few more book/author pairs.

c. In the interactions pane, confirm that you can get the author of “Moo” and “Homegoing”.

d. What do you expect as the result of the following expression?

> (hash-ref book-authors "homegoing")

e. Check your answer experimentally.

f. In the reading, we claimed that you can use hash-set! to change the author associated with a title. Verify that claim.

g. Although we did not mention it in the reading, there is also a procedure, (hash-remove! hash key), that removes a key/value pair from a hash table. Determine by experiment how this procedure works.

h. What do you think happens if you try to remove a key that is not in the hash table?

i. Check your answer experimentally.

Exercise 2: An immutable hash table

Recall that you define immutable hash tables with a command like the following. (Note the period in between the key and value.)

> (define sidekicks 
    '#hash(("Peabody" . "Sherman")
           ...))

a. Define an immutable hash table, sidekicks, that associates the names of some famous cartoon protagonists (as strings) with the names of their sidekicks (again, as strings).

Protagonist       | Sidekick
Peabody           | Sherman
Yogi              | Booboo
Secret Squirrel   | Morocco Mole
Tennessee Tuxedo  | Chumley
Quick Draw McGraw | Baba Looey
Dick Dastardly    | Muttley
Bullwinkle        | Rocky
Bart Simpson      | Milhouse Van Houten
Asterix           | Obelix
Strong Bad        | The Cheat

b. Verify that you can look up a few characters in the table, such as “Asterix” or “Yogi”.

c. Determine what happens if you try to look something up by sidekick (e.g., “Sherman”) rather than protagonist.

d. We claimed that it is not possible to change an immutable hash table. Verify that claim by trying to set a value in the table (e.g., to make “Homestar Runner” a sidekick of “Strong Bad”) and by trying to remove an entry.

e. We claimed that the order of the key/value pairs in the table might change from the order in which we created the table. Check that claim by looking at the contents of sidekicks.

Exercise 3: More experiments with immutable hash tables

a. What do you expect to happen if we try to put two values with identical keys in an immutable hash table?

(define more-sidekicks
  '#hash(("Scooby Doo" . "Shaggy")
         ("Scooby Doo" . "Scrappy Doo")))

b. Check your answer experimentally.

c. Although we cannot use hash-set! and hash-remove! with immutable hash tables, there are related procedures, called hash-set and hash-remove, that we can use. For example,

> (hash-set sidekicks "Strong Bad" "Homestar Runner")
?
> (hash-remove sidekicks "Strong Bad")
?
> sidekicks
?

What do you expect these two procedures to do? What do you expect the value of sidekicks to be when we’re done?

d. Check your answer experimentally.

e. What do you expect as the final result of the second of the following two expressions? (You should assume that both expressions are evaluated and that sidekicks is defined as in the previous exercise.)

> (hash-set sidekicks "Scooby Doo" "Shaggy")
> (hash-ref sidekicks "Scooby Doo")
???

f. Check your answer experimentally.

g. What do you expect as the result of the following expression?

> (hash-ref (hash-set sidekicks "Scooby Doo" "Shaggy")
            "Scooby Doo")
?

h. Check your answer experimentally.

i. What do you expect as the final result of the second of the following two expressions?

> (hash-remove sidekicks "Strong Bad")
> (hash-ref sidekicks "Strong Bad")

j. Check your answer experimentally.

k. What do you expect as the result of the following expression?

> (hash-ref (hash-remove sidekicks "Strong Bad")
            "Strong Bad")

l. Check your answer experimentally.

Exercise 4: Returning to mutable hash tables

Let’s try repeating some of these experiments with a mutable hash table.

a. Create a mutable hash table, sidekick-protagonists, that associates sidekicks with their protagonists (rather than vice versa). Here are a few lines to get you started.

(define sidekick-protagonists (make-hash))
(hash-set! sidekick-protagonists "Sherman" "Peabody")
(hash-set! sidekick-protagonists "Booboo" "Yogi")

b. What do you expect as the final result of the second of the following two expressions? (You should assume that both expressions are evaluated and that sidekick-protagonists is defined as in part a.)

> (hash-set! sidekick-protagonists "Shaggy" "Scooby Doo")
> (hash-ref sidekick-protagonists "Shaggy")
???

c. Check your answer experimentally.

d. What do you expect as the result of the following expression?

> (hash-ref (hash-set! sidekick-protagonists "Shaggy" "Scooby Doo")
            "Shaggy")
?

e. Check your answer experimentally.

f. What do you expect as the final result of the second of the following two expressions?

> (hash-remove! sidekick-protagonists "The Cheat")
> (hash-ref sidekick-protagonists "The Cheat")

g. Check your answer experimentally.

h. What do you expect as the result of the following expression?

> (hash-ref (hash-remove! sidekick-protagonists "The Cheat")
            "The Cheat")

i. Check your answer experimentally.

Exercise 5: Mutable vs. immutable hash tables

You’ve now experimented a bit with both mutable and immutable hash tables.
Spend a few minutes discussing with your partner what you see as the relative benefits of mutable and immutable hash tables. Be prepared to share your answers with the class. (You should also be prepared to submit your answers as a lab writeup.)

Exercise 6: Counting words

You’ve seen that tally-value lets us count the number of times a particular value appears in a list. What if we want to count each different word in the list? We could go through the list once for each word, tallying each word separately. But that seems inefficient.

Here’s a better solution: We can use a hash table to keep track of the count of each word.

Let’s assume that word-counts is a hash table whose keys are strings and whose values are numbers.

a. Write a procedure (add-word! word-counts word) with the following behavior.

  • If word appears in word-counts, grab the count associated with word, add 1, and store the new value back in word-counts.
  • If word does not appear in word-counts, create a new entry in word-counts whose key is word and whose value is 1.

For example,

> (define word-counts (make-hash))
> word-counts
'#hash()
> (add-word! word-counts "example")
> (add-word! word-counts "snow")
> word-counts
'#hash(("example" . 1) ("snow" . 1))
> (add-word! word-counts "example")
> word-counts
'#hash(("example" . 2) ("snow" . 1))
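
One possible sketch of add-word! follows the two cases listed above directly. It is only one way to write the procedure, and the sample calls simply repeat the ones in the transcript.

```racket
#lang racket
;; A sketch of add-word!: branch on whether the word is already a key.
(define add-word!
  (lambda (word-counts word)
    (if (hash-has-key? word-counts word)
        ;; Seen before: look up the old count and store one more.
        (hash-set! word-counts word (+ 1 (hash-ref word-counts word)))
        ;; New word: start its count at 1.
        (hash-set! word-counts word 1))))

(define word-counts (make-hash))
(add-word! word-counts "example")
(add-word! word-counts "snow")
(add-word! word-counts "example")
(hash-ref word-counts "example")   ; 2
```

Racket's standard library also provides hash-update!, so (hash-update! word-counts word add1 0) would be a shorter alternative with the same effect.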

b. What if we want to add lots of words? That seems to be a task for map, doesn’t it? Give it a try. Use map to add the words in the following list to word-counts.

(list "cat" "and" "hat" "and" "rat")

c. Here’s what we tried.

> (map (section add-word! word-counts <>) (list "cat" "and" "hat" "and" "rat"))
'(#<void> #<void> #<void> #<void> #<void>)
> word-counts
'#hash(("and" . 2) ("cat" . 1) ("example" . 2) ("hat" . 1) ("rat" . 1) ("snow" . 1))

It seems to have worked, but we’ve also ended up with a list of these strange #<void> values. Why? Because add-word!, like hash-set!, returns nothing. For situations like this, in which our primary goal is to change an underlying structure, Racket provides a procedure called for-each.

Determine experimentally what happens when we use for-each rather than map.

d. It seems worthwhile to work with a list that is slightly longer than our five-word list but shorter than the much longer lists that we get from a full novel. In your definitions pane, add a definition for eyre-w-words, which gives all of the words in Jane Eyre that start with the lowercase letter “w”. (Jane Eyre can be found at /home/rebelsky/Desktop/pg1260.txt.) In case you’ve forgotten, filter will help you achieve this goal.

e. Reset word-counts to an empty hash table and then use for-each and add-word! to add all the words in eyre-w-words. That hash table should then be short enough to view in the interactions pane. Can you easily tell which is the most frequent word?

Exercise 7: Strange code

a. What does the following procedure do? For example, what do you expect for (seven-eh word-counts "window")?

(define seven-eh
  (lambda (hash str)
    (list str (hash-ref hash str))))

b. The procedure hash-keys takes a hash table as input and returns a list of all of the keys in that hash table. Use hash-keys and length to determine how many entries there are in word-counts.

c. What do you expect the following expression to produce?

> (map (section seven-eh word-counts <>) (hash-keys word-counts))

d. Check your answer experimentally.

e. What does the following procedure do?

(define seven-ee?
  (lambda (entry1 entry2)
    (>= (list-ref entry1 1) (list-ref entry2 1))))

f. What do you expect as the result of the following?

> (sort (map (section seven-eh word-counts <>) (hash-keys word-counts)) seven-ee?)

For those with extra time

If you find that you have extra time, you might consider one or more of the following exercises.

Extra 1: Counting words, revisited

Write a procedure, (count-words fname), that returns a hash table of the word frequencies in the file named by fname. As you’ve likely figured out from the previous exercises, your procedure should probably do the following:

  • Create a new hash table (most likely, with a let).
  • Read all of the words in the file (most likely, with file->words).
  • Use the hash table to count the words in the file (using for-each and add-word!).
  • Return the hash table.
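
The steps above can be sketched roughly as follows. To keep the example runnable without a text file or the loudhum package, the hypothetical helper count-word-list takes a list of words rather than a file name, and it swaps in Racket's built-in hash-update! where the steps suggest add-word!. Treat it as one possible shape, not the expected solution.

```racket
#lang racket
;; count-word-list is a hypothetical stand-in for count-words.
;; It takes a list of words instead of a file name.
(define count-word-list
  (lambda (words)
    (let ([counts (make-hash)])                 ; create a new hash table
      (for-each (lambda (word)
                  ;; hash-update! (built in) replaces add-word! here:
                  ;; add 1 to the old count, starting from 0 for new words.
                  (hash-update! counts word add1 0))
                words)
      counts)))                                 ; return the hash table

;; With loudhum loaded, (count-word-list (file->words fname)) would
;; follow the same pattern for a file.
(define sample-counts
  (count-word-list (list "cat" "and" "hat" "and" "rat")))
(hash-ref sample-counts "and")   ; 2
```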

Extra 2: Most common words

Write a procedure, (most-common-words fname n), that reads all of the words in the file and returns the n most common words in the file. You will likely want to create a hash table for all of the word counts (see extra 1), turn that into a list of word/count lists, sort that list, and take the first n elements.
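
Here is a sketch of the last three steps (table to list, sort, take), assuming the table of counts has already been built. The helper name top-words and the sample counts are invented for this example. Reading the file is left out.

```racket
#lang racket
;; Hypothetical counts, standing in for the table built in extra 1.
(define counts
  (make-hash '(("the" . 12) ("whale" . 5) ("sea" . 7) ("a" . 9))))

;; top-words is a hypothetical helper: the n most frequent words in a table.
(define top-words
  (lambda (word-counts n)
    (let* ([entries (map (lambda (word)
                           (list word (hash-ref word-counts word)))
                         (hash-keys word-counts))]
           ;; Sort the word/count lists from largest count to smallest.
           [sorted (sort entries
                         (lambda (e1 e2) (> (cadr e1) (cadr e2))))])
      ;; Guard against n being larger than the number of entries.
      (map car (take sorted (min n (length sorted)))))))

(top-words counts 2)   ; '("the" "a")
```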

Acknowledgements

This lab was (mostly) newly written in spring 2019.

The cartoon sidekicks example was drawn from a lab written by Benjamin Gum in the early 2000s. Samuel A. Rebelsky likely added Asterix and Obelix and almost certainly added Strong Bad and his cohort.

Jane Eyre is by Charlotte Brontë.