Association lists
Summary: We consider association lists, a common technique for
storing information, as well as the assoc procedure, which is used
for searching through those lists.
Introduction
Consider the organization of a simple email directory for the campus: a sequence of entries, each consisting of a name and an email address. In Scheme, it will be useful to represent both the name and the email address as strings.
One approach would be to write a procedure that uses a very long
cond expression to give the email addresses.
(define lookup-email
(lambda (name)
(cond
[(equal? name "Samuel A. Rebelsky")
"messy-office@grinnell.edu"]
[(equal? name "Peter-Michael Osera")
"type-theory-snob@grinnell.edu"]
[(equal? name "Anya Vostinar")
"cache@grinnell.edu"]
[(equal? name "Sarah Dahlby Albright")
"cheery-coach@grinnell.edu"]
[else
(error "invalid name" name)])))
However, that could take a long time to write. As importantly, such a
solution is not very adaptable. If we change the notion of equality
(e.g., to string-ci=? or to a more sophisticated approximate matching
strategy), we have a large number of entries to update.
Another approach is to represent the information more systematically, in a form that lets us see what kind of information we have without any associated code. As long as we’re making a list, we can add other information. Each entry will have a last name, a first name, an email address, a phone number, and any additional characteristics we think are useful.
Last First Username Phone More
---- ----- -------- ----- ----
Rebelsky Samuel messy-office 4410 CSC,151prof,DigHum,Professor
Klinge Titus cauldron 3271 CSC,Assistant,151prof
Weinman Jerod map-reader 9812 CSC,Chair,Associate,NSF Grant
Osera PM type-snob 4010 Assistant,NSF Grant,Tutorial,CSC
Curtsinger Charlie systems-guy 3127 CSC,Assistant,Tutorial,ASFAC
Vostinar Anya cache 4306 CSC,Assistant,Interdisciplinary
Dahlby-Albright Sarah cheery-coach 4362 Statistics,CSC,Outreach
Rodrigues Liz vivero 3362 LIB,DigHum,Assistant
Barks Sarah stem-careers 4940 CLS,STEM,Neuroscience
Kington Raynard the-prez 3000 Bigwig
When computer scientists represent information in this way, they tend to refer to the information as a table. Often, we call a group of tables a database. Once information is grouped into a table, we can write procedures to select information, such as all the CSC folks.
Representing database tables
A great deal of research has gone into ways to represent database tables efficiently, so that the queries we make to select elements or create new tables are quick. These structures are beyond the scope of this course. For now, we will choose a simple and understandable representation that builds upon the core Scheme structures.
In particular, to represent each of these individual entries, we can use
a list, such as '("Klinge" "Titus" "cauldron" "3271"). In each
case, the last name is the car of the entry, the first name is the cadr
of the entry, the email address is the caddr, and so on and so forth.
Our entire directory, therefore, would be a list of such entries.
;;; Value:
;;; grinnell-directory
;;; Type:
;;; List of lists.
;;; Note:
;;; * Each sublist is of length at least four and contains a last
;;; name, a first name, a username, a phone number, and an
;;; optional sequence of attributes.
;;; * The
;;; Contents:
;;; A list of people at Grinnell with contact information and some
;;; useful attributes.
(define grinnell-directory
(list
(list "Rebelsky" "Samuel" "messy-office" "4410")
(list "Klinge" "Titus" "cauldron" "3271")
(list "Weinman" "Jerod" "map-reader" "9812")
(list "Osera" "PM" "type-snob" "4010")
(list "Curtsinger" "Charlie" "systems-guy" "3127")
(list "Vostinar" "Anya" "cache" "4306")
(list "Dahlby-Albright" "Sarah" "cheery-coach" "4362")
(list "Rodrigues" "Liz" "vivero" "3362")
(list "Barks" "Sarah" "stem-careers" "4940")
(list "Kington" "Raynard" "the-prez" "3000")))
Note that we are leaving out the other attributes temporarily for clarity.
While this is a simple representation, it is frequently used by Scheme programmers who want to describe tables. There’s even a name for this arrangement: In Scheme, a list of pairs or lists in which we use the value of the car to identify an entry is called an association list or alist.
As the email directory example suggests, a common application of
association lists involves looking for a desired name or first component
of an entry and then returning another part of the entry (or something
computed from other parts of the entry). For example, we are likely to
search the directory for a last name then build an email address by
concatenating the username with "@grinnell.edu".
Because the first entry has a special role, we often refer to it as the
key of the entry. We refer to the rest of the entry as the associated
value. In the directory above, the last names are the keys (e.g.,
"Rebelsky", "Klinge", and "Kington") and the user names and phone
numbers are the associated values.
assoc, Scheme’s built-in lookup procedure
Since applications in which we need to extract data from tables are
very common, Scheme provides procedures to retrieve from an association
list the entry containing a specified key. The most frequently used
procedure of this kind is assoc. Given a key and association list,
assoc returns the first entry with the given key. If the key does not
occur in the association list, then assoc returns #f. For example,
> (assoc "Barks" grinnell-directory)
'("Barks" "Sarah" "stem-careers" "4940")
> (assoc "Reisch" grinnell-directory)
#f
Extracting information
Once we’ve used assoc to find an entry, what do we do next? Most
typically, we then do something with the rest of the entry. To continue
our color table example, we might want to convert the username to an email
address.
So, what do we do? We use assoc to look up an entry with the given
last name. If we find the entry, we then extract the username and turn
it into an email addres. If we don’t find the entry, we return some
default value. In code, we might write something like the following.
;;; Procedure:
;;; lookup-email-by-last-name
;;; Parameters:
;;; last-name, a string
;;; directory, a list of directory entries
;;; Purpose:
;;; Look up an email address in the table.
;;; Produces:
;;; email-address, a string
;;; Preconditions:
;;; * Each entry in directory must be a list.
;;; * Element 0 of each entry must be a string which represents the
;;; last name.
;;; * Element 2 of each entry must be a string which represents the
;;; user name.
;;; * Email addresses have the form "username@grinnell.edu".
;;; Postconditions:
;;; * If an entry for the last name appears once in the table,
;;; email-address represents their email address.
;;; * If multiple entries with a matching last name appear in the table,
;;; email-address represents the email address for one of them.
;;; * If no entries with a matching last name appear in the table,
;;; email-address is the empty string, ""
;;; * Does not affect the table.
(define lookup-email-by-last-name
(lambda (last-name directory)
(if (assoc last-name directory)
(string-append (list-ref (assoc last-name directory) 2)
"@grinnell.edu")
"")))
For example,
> (lookup-email-by-last-name "Barks" grinnell-directory)
"stem-careers@grinnell.edu"
> (lookup-email-by-last-name "Reisch" grinnell-directory)
""
> (equal? "" (lookup-email-by-last-name "Barks" grinnell-directory))
#f
> (equal? "" (lookup-email-by-last-name "Reisch" grinnell-directory))
#t
Note that the result depends on the directory. For example,
> (lookup-email-by-last-name "Rebelsky" grinnell-directory)
"messy-office@grinnell.edu"
> (lookup-email-by-last-name "Rebelsky" null)
""
> (lookup-email-by-last-name "Rebelsky"
(list (list "Rebelsky" "SA" "lucifer" "0666")))
"lucifer@grinnell.edu"
Some of you may recall asking why if might take a value other than
#t or #f as a parameter. This procedure is one example of why.
Looking up an email address, revisited
One problem with the lookup-email-by-last-name procedure given above
is that it calls assoc two times with identical parameters, once to
determine whether the last name is in the list and once to extract the
user name. Rather than calling assoc twice,
we might call it once, name that result, and use it each time.
(define lookup-email-by-last-name
(lambda (last-name directory)
(let ([assoc-result (assoc last-name directory)])
(if assoc-result
(string-append (list-ref assoc-result 2)
"@grinnell.edu")
""))))
Building more complex entries
If you recall our initial plans for the directory, we wanted to store more than just a name, user name, and phone number. For example, it would be useful to store descriptive information, which we might represent as strings.
(define grinnell-directory-annotated
(list
(list "Rebelsky" "Samuel" "messy-office" "4410" "CSC" "151prof" "DigHum" "Professor")
(list "Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")
(list "Weinman" "Jerod" "map-reader" "9812" "CSC" "Chair" "Associate" "NSF Grant")
(list "Osera" "PM" "type-snob" "4010" "Assistant" "NSF Grant" "Tutorial" "CSC")
(list "Curtsinger" "Charlie" "systems-guy" "3127" "CSC" "Assistant" "Tutorial" "ASFAC")
(list "Vostinar" "Anya" "cache" "4306" "CSC" "Assistant" "Interdisciplinary")
(list "Dahlby-Albright" "Sarah" "cheery-coach" "4362" "Statistics" "CSC" "Outreach")
(list "Rodrigues" "Liz" "vivero" "3362" "LIB" "DigHum" "Assistant")
(list "Barks" "Sarah" "stem-careers" "4940" "CLS" "STEM" "Neuroscience")
(list "Kington" "Raynard" "the-prez" "3000" "Bigwig")))
You should note a few things about this revised list. First, we’ve left
the first name, user name, and phone number in the same place. That means
that lookup-email-by-last-name and any other procedures we’ve written
already will continue to work.
> (lookup-email-by-last-name "Barks" grinnell-directory)
"stem-careers@grinnell.edu"
> (lookup-email-by-last-name "Barks" grinnell-directory-annotated)
"stem-careers@grinnell.edu"
> (assoc "Barks" grinnell-directory)
'("Barks" "Sarah" "stem-careers" "4940")
> (assoc "Barks" grinnell-directory-annotated)
'("Barks" "Sarah" "stem-careers" "4940" "CLS" "STEM" "Neuroscience")
Second, we’ve made the association list easier to read by lining up corresponding values, as in a table. We’re taking advantage of the fact that Scheme ignores extra spaces.
Implementing assoc
As is the case with many of the built-in Scheme procedures, assoc is
relatively easy to implement ourselves. In essence, assoc is much like
other algorithms we’ve written to look for a value in a list. That is,
assoc recursively steps through the list until it finds a match or
runs out of elements.
;;; Procedure:
;;; assoc
;;; Parameters:
;;; key, a Scheme value
;;; alist, an association list
;;; Purpose:
;;; Find an entry with key key in alist.
;;; Produces:
;;; entry, a Scheme value
;;; Preconditions:
;;; No additional
;;; Postconditions:
;;; If there is an index, i, such that
;;; (equal? key (car (list-ref alist i)))
;;; then entry is the first such entry
;;; Otherwise, entry is false (#f)
(define assoc
(lambda (key alist)
(cond
; If there are no entries left in the association list,
; there are no entries with the given key.
[(null? alist)
#f]
; If the key we're looking for is the key of the first
; entry, then use that entry.
[(equal? key (car (car alist)))
(car alist)]
; Otherwise, look in the rest of the association list.
[else
(assoc key (cdr alist))])))
“Improving” assoc
As you may have noted, there’s a difficulty with assoc in the context
we are using it. In particular, it only does case-sensitive matching.
> (assoc "klinge" grinnell-directory-annotated)
#f
> (assoc "Klinge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")
Since we’re writing our own version of assoc, we can fix that problem.
However, it may mean that we need to restrict our keys to strings.
;;; Procedure:
;;; assoc-ci
;;; Parameters:
;;; key, a string
;;; alist, an association list
;;; Purpose:
;;; Find an entry with a matching key in alist.
;;; Produces:
;;; entry, a Scheme value
;;; Preconditions:
;;; The car of each entry in alist is a string
;;; Postconditions:
;;; * If there is an index, i, such that
;;; (string-ci=? key (car (list-ref alist i)))
;;; then entry is the first such entry
;;; * Otherwise, entry is false (#f)
(define assoc-ci
(lambda (key alist)
(cond
; If there are no entries left in the association list,
; there are no entries with the given key.
[(null? alist)
#f]
; If the key we're looking for is the key of the first
; entry, then use that entry.
[(string-ci=? key (car (car alist)))
(car alist)]
; Otherwise, look in the rest of the association list.
[else
(assoc-ci key (cdr alist))])))
Let’s see if that works.
> (assoc-ci "klinge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")
> (assoc-ci "Klinge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")
> (assoc-ci "Kringle" grinnell-directory-annotated)
#f
> (assoc-ci "klINge" grinnell-directory-annotated)
'("Klinge" "Titus" "cauldron" "3271" "CSC" "Assistant" "151prof")
It does. But what happens if we use something other than a string as our search key?
> (assoc 42 grinnell-directory-annotated)
#f
> (assoc-ci 42 grinnell-directory-annotated)
. . string-ci=?: contract violation
expected: string?
given: 42
argument position: 1st
other arguments...:
If we insist on handling multiple situations (strings, which we compare
with string-ci=?, and everything else, which we compare with
equal?), we’ll need a slightly more complex approach. In essence,
we need to define a new procedure for comparing two elements that takes
their type into account.
(define assoc-new
(let ([match? (lambda (val1 val2)
(or (and (string? val1)
(string? val2)
(string-ci=? val1 val2))
(equal? val1 val2)))])
(lambda (key alist)
(cond
; If there are no entries left in the association list,
; there are no entries with the given key.
[(null? alist)
#f]
; If the key we're looking for matches the key of the first
; entry, then use that entry.
[(match? key (car (car alist)))
(car alist)]
; Otherwise, look in the rest of the association list.
[else
(assoc-new key (cdr alist))]))))
> (assoc-new "curtSINGER" grinnell-directory-annotated)
'("Curtsinger" "Charlie" "systems-guy" "3127" "CSC" "Assistant" "Tutorial" "ASFAC")
> (assoc-new 42 grinnell-directory-annotated)
#f
> (assoc-new 42 (list (list 42 "the answer")
(list "Iverson" "Allen" "The Answer")))
'(42 "the answer")
> (assoc-new "Iverson" (list (list 42 "the answer")
(list "Iverson" "Allen" "The Answer")))
'("Iverson" "Allen" "The Answer")
> (assoc-new null (list (list 42 "the answer")
(list "Iverson" "Allen" "The Answer")))
#f
Using other keys
The assoc procedure works fine if the key is the first element of a
data item. But what if it’s the second (or third, or fourth, or …). For
example, what if we want to find the last name of the person whose first
name is “PM” or whose email address is "map-reader"? We can’t rely on
assoc, because it only looks at the first element of each list. Instead,
we need to write our own procedure, using the same structure as above.
For example, to find someone by phone number, we might write a procedure like the following.
;;; Procedure:
;;; lookup-person-by-phone
;;; Parameters:
;;; phone, a string
;;; directory, a list of directory entries
;;; Purpose:
;;; Find the name of the person who has a particular phone number.
;;; Produces:
;;; name, a string (or the value #f)
;;; Preconditions:
;;; * phone is a string with four digits.
;;; * Each entry in directory must be a list.
;;; * Element 0 of each entry must be a string which represents the
;;; last name.
;;; * Element 1 of each entry must be a string which represents the
;;; first name.
;;; * Element 3 of each entry must be a string which represents the
;;; four-digit phone number.
;;; * People have both first names and last names.
;;; Postconditions:
;;; * If an entry for the phone number appears once in the table,
;;; name represents their name.
;;; * If multiple entries with a matching phone number appear in the
;;; table, name represents the name for one of them.
;;; * If no entries with a matching phone number appear in the table,
;;; name is #f.
;;; * Does not affect the table.
(define lookup-person-by-phone
(lambda (phone directory)
(cond
[(null? directory)
#f]
[(equal? phone (list-ref (car directory) 3))
(string-append (cadar directory) " " (caar directory))]
[else
(lookup-person-by-phone phone (cdr directory))])))
> (lookup-person-by-phone "1234" grinnell-directory)
#f
> (lookup-person-by-phone "3000" grinnell-directory)
"Raynard Kington"
It might even make sense to generalize this procedure, so that we can look up by any of the four key pieces of information.
;;; Procedure:
;;; lookup-name-by
;;; Parameters:
;;; key, a string
;;; position, an integer between 0 and 3, inclusive
;;; directory, a list of directory entries
;;; Purpose:
;;; Find the full name of a person who has a matching string at
;;; the designated position (0 for last name, 1 for first name,
;;; 2 for username, and 3 for four-digit phone number).
;;; Produces:
;;; name, a string (or the value #f)
;;; Preconditions:
;;; * phone is a string with four digits.
;;; * Each entry in directory must be a list.
;;; * Element 0 of each entry must be a string which represents the
;;; last name.
;;; * Element 1 of each entry must be a string which represents the
;;; first name.
;;; * Element 2 of each entry must be a string which represents the
;;; user name.
;;; * Element 3 of each entry must be a string which represents the
;;; four-digit phone number.
;;; * People have both first names and last names.
;;; Postconditions:
;;; If position is 0 and an entry with the same last name appears
;;; somewhere in the table, name is the name associated with
;;; one such entry.
;;; If position is 1 and an entry with the same first name appears
;;; somewhere in the table, name is the name associated with
;;; one such entry.
;;; If position is 2 and an entry with the same user name appears
;;; somewhere in the table, name is the name associated with
;;; one such entry.
;;; If position is 3 and an entry with the same four-digit phone
;;; appears somewhere in the table, name is the name associated
;;; with one such entry.
;;; If no matching entries appear, cname is #f.
;;; Does not affect the table.
(define lookup-name-by
(lambda (key position directory)
(cond
[(null? directory)
#f]
[(string-ci=? key (list-ref (car directory) position))
(string-append (cadar directory) " " (caar directory))]
[else
(lookup-name-by key position (cdr directory))])))
> (lookup-name-by "Samuel" 0 grinnell-directory)
#f
> (lookup-name-by "Samuel" 1 grinnell-directory)
"Samuel Rebelsky"
> (lookup-name-by "3000" 3 grinnell-directory)
"Raynard Kington"
> (lookup-name-by "cheery-coach" 2 grinnell-directory)
"Sarah Dahlby-Albright"
If we want more specific lookup procedures, we can define them in terms
of lookup-name-by.
(define lookup-name-by-username
(lambda (uname directory)
(lookup-name-by uname 2 directory)))
(define lookup-name-by-first-name
(section lookup-name-by <> 1 <>))
> (lookup-name-by-username "map-reader" grinnell-directory-annotated)
"Jerod Weinman"
> (lookup-name-by-username "nobody" grinnell-directory-annotated)
#f
> (lookup-name-by-first-name "nobody" grinnell-directory-annotated)
#f
> (lookup-name-by-first-name "Anya" grinnell-directory-annotated)
"Anya Vostinar"
What if we want to return all the values with a particular key, or permit entries where the component is “close enough” to the desired key? We’ll explore those issues in lab.
Self checks
Check 1: Defining assoc
a. What is the definition of an association list?
b. Review the sample implementation of assoc above.
c. All of the example association lists are lists of lists. What do you
think that assoc will do if it is given a list in which each element
is a pair, rather than a list? For example, can we use assoc to search
the following list to determine the first name of a faculty member whose
last name you know?
(define math-cs-stats
(list (cons "Walker" "Henry")
(cons "Stone" "John")
(cons "Osera" "Peter-Michael")
(cons "Vostinar" "Anya")
(cons "Weinman" "Jerod")
(cons "Wolf" "Royce")
(cons "Chamberland" "Marc")
(cons "Shuman" "Karen")
(cons "French" "Chris")
(cons "Mileti" "Joseph")
(cons "Paulhus" "Jennifer")
(cons "Klinge" "Titus")
(cons "Kuiper" "Shonda")
(cons "Blanchard" "Jeffrey")
(cons "Jonkman" "Jeffrey")))
d. Confirm or refute your answers by experimentation.
Check 2: Making your own lists
a. Define an association list, sidekicks, that associates the names of
some famous cartoon protagonists (as strings) with the names of their
sidekicks (again, as strings).
Here’s a table containing information for your association list:
| Protagonist | Sidekick |
|---|---|
| Peabody | Sherman |
| Yogi | Booboo |
| Secret Squirrel | Morocco Mole |
| Tennessee Tuxedo | Chumley |
| Quick Draw McGraw | Baba Looey |
| Dick Dastardly | Muttley |
| Bullwinkle | Rocky |
| Bart Simpson | Milhouse Van Houten |
| Asterix | Obelix |
| Strong Bad | The Cheat |
b. Use the assoc procedure to search the sidekicks association list
for someone who is on the list and for someone who is not on the list.
Citations
If we remember correctly, the table of cartoon characters is due to Ben Gum.