Heterogeneous lists
Summary: We delve more deeply into Scheme’s list data type. We consider, in particular, how and why to make lists that contain different kinds of values.
Introduction: Representing data
As we’ve said, computer scientists study both the algorithms we write to manipulate information and the ways in which we represent information. We’ve looked at one way of representing collections so far, lists. As we’ve explored lists, we’ve focused on lists that contain all the same kinds of values, mostly lists of numbers. We call such lists “homogeneous lists”. But lists can contain mixtures of kinds of values. We call lists with mixtures of kinds of values “heterogeneous lists”.
An Example: UFO sightings
Here’s one example of a heterogeneous list. Consider the list of UFO sightings available at http://www.ufocasebook.com/casefiles.html. For each sighting we have a year, a name, a date (or date-like description), a location (mostly a country), and a Yes/No value for each of effect, media, contact, and abduction. (You can look at the page for what each of those means.) We might therefore represent each entry as an eight-element list.
- Element 0 is the year, which we will represent as an integer.
- Element 1 is the name, which we will represent as a string.
- Element 2 is the date, which we will represent as a string. (We generally prefer year, month, day for dates. But those are not available in this case.)
- Element 3 is the location, which we will represent as a string.
- Element 4 indicates whether or not there was a physical effect; we will represent it as a Boolean value (#t or #f).
- Element 5 indicates whether or not there are associated media; once again, we will represent it as a Boolean value.
- Element 6 indicates whether or not contact was made; yet again, we will use a Boolean value.
- Element 7 indicates whether or not someone was abducted; as you might guess, we’ll use a Boolean value.
For example, here’s what we might see for one entry.
'(1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia" #t #t #t #f)
Why did we start with zero, rather than one? Because lists, like strings, are “zero indexed”. The index represents the number of items that come before the element.
How do we extract the different elements from the list, either to
display them or to compare them? The most straightforward approach is to
use list-ref, a two-parameter procedure that takes a list and an index
as inputs and returns the item at that index. In the examples that
follow, we assume that the entry above has been named sverdlovsky
(using define).
> (list-ref sverdlovsky 0)
1969
> (list-ref sverdlovsky 1)
"The Russian Crash - Sverdlovsky"
> (list-ref sverdlovsky 2)
"Mar, 1969"
> (list-ref sverdlovsky 3)
"Russia"
> (list-ref sverdlovsky 4)
#t
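Because positions such as 0 or 4 are easy to mix up, one option is to wrap each list-ref call in a small named procedure. The following is only a sketch: it assumes the sample entry is bound to sverdlovsky, and the names sighting-year, sighting-name, and sighting-effect? are hypothetical helpers of our own, not procedures that Scheme provides.

```racket
#lang racket

; The sample entry from above, given a name so that we can refer to it.
(define sverdlovsky
  '(1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia" #t #t #t #f))

; Hypothetical helper procedures (our own names, not built in).
; Each one hides the position of a field behind a readable name.
(define sighting-year
  (lambda (sighting)
    (list-ref sighting 0)))

(define sighting-name
  (lambda (sighting)
    (list-ref sighting 1)))

(define sighting-effect?
  (lambda (sighting)
    (list-ref sighting 4)))

(sighting-year sverdlovsky)     ; 1969
(sighting-name sverdlovsky)     ; "The Russian Crash - Sverdlovsky"
(sighting-effect? sverdlovsky)  ; #t
```

If the layout of an entry ever changes, only these helpers need to be updated, not every place that reads a field.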
Scheme also provides two other operations to extract values from lists:
car extracts the first element (element zero) and cdr returns a list
containing all but the first element.
> (car sverdlovsky)
1969
> (cdr sverdlovsky)
'("The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia" #t #t #t #f)
Building lists
We’ve been building new lists using list or make-list. But what
if we have an existing list and we want to add an element to the front?
For example, suppose we want to add a unique identifier to each sighting,
such as 'ufo023. Scheme provides an operation called cons that builds
a new list by adding a value to the front of an existing list.
> (cons 'ufo023 sverdlovsky)
'(ufo023 1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia" #t #t #t #f)
> sverdlovsky
'(1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia" #t #t #t #f)
> (define sample (cons 'ufo023 sverdlovsky))
> sample
'(ufo023 1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia" #t #t #t #f)
Why is it called cons instead of list-prepend or something
similar? Well, that’s the name John McCarthy, the designer of Lisp, chose
in the late 1950s. “cons” is short for construct, because
cons constructs lists. (The custom of naming procedures with the basic
type they operate on, a dash, and the key operation did not start until
a few decades later.) The names car and cdr were chosen for very
specific reasons that will not make sense for a few more weeks. For now,
just accept that you’re learning a bit of computer-ese.
Nested lists
But what if we want to work with more than one list, such as when we want to study all of the UFO sightings? We can make a list of lists. We’ll consider that topic in a related reading.
Building lists, revisited
You’ve now seen a variety of ways to build lists. You can use the list
procedure. You can use the make-list procedure. You can use cons
to prepend a value to a list. Suppose you prefer to build lists with
cons. How can you get started, given that cons expects a list as
one of its parameters? You start with the empty list.
Scheme’s name for the empty list is a pair of parentheses with nothing
between them: (). Most implementations of Scheme permit you to refer to
that list as nil or null. You can also create it with (list). All
permit you to describe the empty list by putting a single quote before
the pair of parentheses.
> '()
'()
> nil
'()
> null
'()
> (list)
'()
You will find that we prefer to use a name for that list. If sample code does not work in your version of Scheme, try inserting the following definitions.
(define nil '())
(define null '())
Note that by using cons and null (or nil), we can build up a list of any length by starting with the empty list and repeatedly prepending a value.
> (define singleton (cons "Russia" null))
> singleton
'("Russia")
> (define doubleton (cons "Mar, 1969" singleton))
> doubleton
'("Mar, 1969" "Russia")
> (define tripleton (cons "The Russian Crash - Sverdlovsky" doubleton))
> tripleton
'("The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia")
> (cons "senior" (cons "third-year" (cons "second-year" (cons "freshling" null))))
'("senior" "third-year" "second-year" "freshling")
You may note that lists built in this way seem a bit “backwards”. That
is, the value we add last appears at the front, rather than at the
back. However, that’s simply the way cons works and, as the last
example suggests, in many cases it is a quite sensible thing to do.
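The same approach works for heterogeneous lists. As a minimal sketch, the following builds the first four fields of the sample entry one cons at a time and checks that the result matches what list produces. The name partial-entry is ours, chosen only for illustration, and equal? is a standard procedure that this reading does not otherwise discuss.

```racket
#lang racket

; Build a four-element heterogeneous list, starting from the empty list.
; The value we cons on last (1969) ends up at the front.
(define partial-entry
  (cons 1969
        (cons "The Russian Crash - Sverdlovsky"
              (cons "Mar, 1969"
                    (cons "Russia" null)))))

partial-entry
; '(1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia")

; The list procedure builds an equal list in one step.
(equal? partial-entry
        (list 1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia"))
; #t
```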
List predicates
Scheme provides two basic predicates for checking whether a value is a
list. The null? predicate checks whether a value is the empty list. The
list? predicate checks whether a value is a list (empty or nonempty).
> (null? null)
#t
> (list? null)
#t
> (null? (list 1 2 3))
#f
> (list? (list 1 2 3))
#t
> (null? 5)
#f
> (list? 5)
#f
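One common use of these predicates is to check a value before taking it apart, since car reports an error when it is applied to the empty list or to something that is not a list. The sketch below defines a hypothetical procedure, first-or-default (our name, not a standard procedure), that returns a fallback value in those cases. It relies on if, and, and not, which this reading does not otherwise cover.

```racket
#lang racket

; Hypothetical helper: return the first element of val when val is a
; nonempty list, and return default otherwise.
(define first-or-default
  (lambda (val default)
    (if (and (list? val) (not (null? val)))
        (car val)
        default)))

(first-or-default (list 1969 "Russia" #t) 'none)  ; 1969
(first-or-default null 'none)                      ; 'none
(first-or-default 5 'none)                         ; 'none
```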
Other common list procedures
It turns out that you can build any other list procedure with just
null, cons, car, cdr, null?, and some other programming
techniques. Nonetheless, there are enough common operations that most
programmers want to do with lists that Scheme includes them as basic
operations. (That means you don’t have to define them yourself.) Here
are a few that programmers frequently use. You may have seen some
of these before.
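To make that claim a little more concrete, here is a sketch of how a procedure like length could be written using only null?, cdr, and recursion (a procedure that calls itself), which is one of the “other programming techniques” alluded to above. The name my-length is ours; in practice you would normally use the built-in length.

```racket
#lang racket

; A hand-written version of length, for illustration only.
; The empty list has zero elements; any other list has one more
; element than its cdr.
(define my-length
  (lambda (lst)
    (if (null? lst)
        0
        (+ 1 (my-length (cdr lst))))))

(my-length null)                     ; 0
(my-length (list 1969 "Russia" #t))  ; 3
```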
length
The length procedure takes one parameter, which must be a list, and
computes the number of elements in the list. (An element that happens to
be itself a list nevertheless contributes 1 to the total that length
computes, regardless of how many elements it happens to contain.)
> (length null)
0
> (length (list 1 2 3))
3
> (length (list (list 1 2 3)))
1
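Because length counts elements without regard to their types, it behaves the same way on heterogeneous lists. A brief check, assuming the sample entry is bound to sverdlovsky and reusing the identifier 'ufo023 from earlier:

```racket
#lang racket

(define sverdlovsky
  '(1969 "The Russian Crash - Sverdlovsky" "Mar, 1969" "Russia" #t #t #t #f))

; Eight fields of mixed types count as eight elements.
(length sverdlovsky)                 ; 8

; Prepending an identifier adds one more element.
(length (cons 'ufo023 sverdlovsky))  ; 9

; A list that contains the whole entry as its only element has length 1.
(length (list sverdlovsky))          ; 1
```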
append
The append procedure takes any number of arguments, each of which is a list, and returns a new list formed by stringing together all of the elements of the argument lists, in order, to form one long list.
> (append (list "red" "green") (list "blue" "yellow"))
'("red" "green" "blue" "yellow")
The empty list acts as the identity for append.
> (append null (list "blue" "yellow"))
'("blue" "yellow")
> (append (list "red" "green") null)
'("red" "green")
> (append null null)
'()
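Since cons only adds to the front of a list, one way to add a value at the end is to wrap it in a one-element list and append that list. The following sketch contrasts the two; the name colors is ours.

```racket
#lang racket

(define colors (list "red" "green"))

; cons puts the new value at the front.
(cons "blue" colors)           ; '("blue" "red" "green")

; append with a one-element list puts it at the back.
(append colors (list "blue"))  ; '("red" "green" "blue")

; append can also join lists whose elements have different types.
(append (list 1969 "Russia") (list #t #f))  ; '(1969 "Russia" #t #f)
```

Note that neither call changes colors itself; each one builds a new list.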
cadr and company: Combining car and cdr
To reduce the amount of typing necessary for the programmer, many
implementations of Scheme provide procedures that combine car and cdr
in various ways. These procedures begin with the letter “c”, end with
the letter “r” and have a sequence of “a”’s and “d”’s in
the middle to indicate the sequence of calls to car (for an “a”)
or cdr (for a “d”). For example, cadr computes the car of the
cdr of a list (the second element), cddr computes the cdr of the cdr
of a list (all but the first two elements), and caar computes the car
of the car of a list (applicable only to nested lists).
> (define rainbow (list "red" "orange" "yellow" "green" "blue" "indigo" "violet"))
> (cadr rainbow)
"orange"
> (cddr rainbow)
'("yellow" "green" "blue" "indigo" "violet")
> (caddr rainbow)
"yellow"
> (cdddr rainbow)
'("green" "blue" "indigo" "violet")
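Each of these procedures is shorthand for a chain of car and cdr calls. The following sketch spells out two of those chains and also tries caar, which the examples above do not exercise, on a small nested list. The name nested is ours.

```racket
#lang racket

(define rainbow
  (list "red" "orange" "yellow" "green" "blue" "indigo" "violet"))

; cadr is the car of the cdr.
(equal? (cadr rainbow) (car (cdr rainbow)))         ; #t

; caddr is the car of the cdr of the cdr.
(equal? (caddr rainbow) (car (cdr (cdr rainbow))))  ; #t

; caar needs a list whose first element is itself a list.
(define nested (list (list "red" "orange") (list "yellow" "green")))
(caar nested)  ; "red"
(cadr nested)  ; '("yellow" "green")
```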
Summary of new list procedures
- null: Standard list constant. The empty list.
- (cons value lst): Standard list procedure. Create a new list by prepending value to the front of lst.
- (cdr lst): Standard list procedure. Get a list the same as lst but without the first element.
- (car lst): Standard list procedure. Get the first element of lst.
- (null? lst): Standard list predicate. Checks if lst is the empty list.
- (list-ref lst n): Standard list procedure. Get the nth element of lst. Note that elements are numbered starting at 0.
- (length lst): Standard list procedure. Return the number of elements of list lst.
- (append lst_0 lst_1 ... lst_n): Standard list procedure. Create a new list by concatenating the elements of lst_0, lst_1, … lst_n.
- (caar lst): Standard list procedure. If lst’s first element is a list, gets the first element of that first element, the car of the car of lst. If lst is not a list, or its first element is not a list, reports an error.
- (cadr lst): Standard list procedure. Get the second element of lst, the car of the cdr of lst.
- (cddr lst): Standard list procedure. Get all but the first two elements of lst, the cdr of the cdr of lst.
- (caddr lst): Standard list procedure. Get the third element of lst, the car of the cdr of the cdr of lst.
Self Checks
Check 1: List procedures
Predict the results of evaluating the following expressions.
(cons 2 null)
(cons 1 (cons 2 null))
(cons 5 (list 1 2))
(caddr (iota 7))
(list-ref 2 (iota 7))
(append (iota 2) (iota 2))
(list (iota 2) (iota 2))
(append (iota 2) null)
(list (iota 2) null)
(cons (iota 2) null)
You may verify your predictions using DrRacket, but be sure you understand the results.