In your initial explorations with Scheme you have investigated a variety of basic types of data, including numbers, strings, and symbols. You can work on many kinds of problems with just these types. However, when you want to address more complex problems, particularly problems from data science, you will need to work with collections of data - not just the rating of a movie from one newspaper, but the rating of that movie from many newspapers (or even the ratings of many movies from many newspapers).
In Scheme, the simplest mechanism for dealing with collections of data is the list. Lists are collections of values that you can process one-by-one or en masse. You’ve already explored some of the things we can do with lists, such as finding their length and extracting some elements. In this reading, we will further consider Scheme’s list data type as well as a variety of procedures to build and manipulate lists.
To date, you’ve seen a fairly wide variety of things you can do with lists. You can build them with list and make-list. You can build a list of portions of a string with string-split and regexp-match*. You can extract portions with take, drop, and list-ref. You can combine lists with append. You can make lists of consecutive integers with range. You can change the order with reverse. You can find out where a value appears with index-of or indexes-of (which some of us would prefer to be called indices-of). If you’ve forgotten what these procedures do, you may want to review the prior reading on lists.
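As a quick refresher, here is a small sketch that exercises most of the procedures named above. It assumes plain Racket (`#lang racket`); the sample list and all of the defined names are made up for illustration.

```racket
#lang racket
;; Sample data for this sketch only (not from the reading).
(define letters (list "a" "b" "c" "d" "e"))

(define built        (make-list 3 "x"))            ; '("x" "x" "x")
(define pieces       (string-split "a b c"))       ; '("a" "b" "c")
(define front        (take letters 2))             ; '("a" "b")
(define back         (drop letters 2))             ; '("c" "d" "e")
(define third-letter (list-ref letters 2))         ; "c" (positions start at 0)
(define rejoined     (append front back))          ; the same elements as letters
(define counting     (range 1 4))                  ; '(1 2 3)
(define backwards    (reverse counting))           ; '(3 2 1)
(define position     (index-of letters "d"))       ; 3
(define positions    (indexes-of (list 1 2 1) 1))  ; '(0 2)
```

Note that take and drop split the list at the same position, so appending their results gives back the original elements.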
It turns out that there’s a wealth of other things we can do with lists beyond those basic operations.
The map procedure
So, what can you do with lists once you’ve created them? Build other lists, of course. The first way we’ll build lists from lists is with the (map proc lst) procedure, which creates a new list by applying the unary (one-parameter) procedure proc to each element of the list lst.
For example, if we want to find the lengths of a variety of strings, we can use map with string-length.
> (map string-length (list "Beware" "the" "jabberwock" "my" "son"))
'(6 3 10 2 3)
And, as we’ve already seen, we can use string-split to extract the words from a sentence.
> (map string-length (string-split "The jaws that bite the claws that catch"))
'(3 4 4 4 3 5 4 5)
On the math side, if we want a list of the squares of the first ten positive integers (and we’re too lazy to compute them by hand), we can use map to apply the sqr procedure to each element of the list of the first ten positive integers.
> (define sqr (lambda (x) (* x x))) ; Also already defined.
> (range 1 11)
'(1 2 3 4 5 6 7 8 9 10)
> (map sqr (range 1 11))
'(1 4 9 16 25 36 49 64 81 100)
We can also find out the square roots of those same ten numbers.
> (map sqrt (range 1 11))
'(1 1.4142135623730951 1.7320508075688772
2 2.23606797749979 2.449489742783178
2.6457513110645907 2.8284271247461903 3 3.1622776601683795)
We can check those results by squaring them again.
> (map sqr (map sqrt (list 1 2 3 4 5 6 7 8 9 10)))
'(1 2.0000000000000004 2.9999999999999996
4 5.000000000000001 5.999999999999999
7.000000000000001 8.000000000000002 9
10.000000000000002)
Aren’t approximations wonderful? They get even more interesting when we start rounding.
> (map ceiling (map sqr (map sqrt (list 1 2 3 4 5 6 7 8 9 10))))
'(1 3.0 3.0 4 6.0 6.0 8.0 9.0 9 11.0)
What should you take away from this? First, anything you can do to a single value you can also do to all values in a list by using it with the map procedure. Second, we often want to do a sequence of operations to each value in the list.
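The first point holds for procedures you write yourself, too. Here is a minimal sketch, assuming plain Racket; the ratings and the helper rating->percent are hypothetical and exist only for this example. It shows that mapping a procedure over a list gives the same result as applying it to each element by hand.

```racket
#lang racket
;; Hypothetical data: one movie's ratings from four newspapers, out of 5 stars.
(define ratings (list 4 3 5 2))

;; rating->percent is a made-up helper: convert a 5-star rating to a percentage.
(define rating->percent
  (lambda (stars)
    (* 20 stars)))

;; Applying the procedure to each value by hand ...
(define by-hand
  (list (rating->percent 4)
        (rating->percent 3)
        (rating->percent 5)
        (rating->percent 2)))

;; ... gives the same list as letting map do the work.
(define with-map (map rating->percent ratings))   ; '(80 60 100 40)
```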
As we just noted, we end up writing map a lot when we want to sequence operations. Is there a better strategy? Yes. As you may recall, the loudhum library provides a procedure, o, that allows you to compose functions. What is composition? You may remember it from your algebra class. If f and g are functions, the composition of f and g, written f o g, is also a function, one that applies g to its parameter and then f to g’s result.
In traditional notation, we would write
(f o g)(x) = f(g(x))
In Scheme notation, we write
((o f g) x) = (f (g x))
The cool thing about o is that it can take any number of functions. However, as with traditional composition, it applies them right to left. Hence, the expression we just wrote as
> (map ceiling (map sqr (map sqrt (list 1 2 3 4 5 6 7 8 9 10))))
'(1 3.0 3.0 4 6.0 6.0 8.0 9.0 9 11.0)
we can more easily write as
> (map (o ceiling sqr sqrt) (list 1 2 3 4 5 6 7 8 9 10))
'(1 3.0 3.0 4 6.0 6.0 8.0 9.0 9 11.0)
And that makes it easier for us to make the results exact, too.
> (map (o inexact->exact ceiling sqr sqrt) (list 1 2 3 4 5 6 7 8 9 10))
'(1 3 3 4 6 6 8 9 9 11)
Although experienced programmers usually prefer composition, less experienced programmers may find it more natural to use a lambda expression.
> (map (lambda (x) (inexact->exact (ceiling (sqr (sqrt x)))))
       (list 1 2 3 4 5 6 7 8 9 10))
'(1 3 3 4 6 6 8 9 9 11)
We will ask you to explore both forms.
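If you want to see what composition does under the hood, the following sketch may help. It assumes plain Racket rather than loudhum: compose2 is a hypothetical hand-written helper that handles exactly two functions, and compose is Racket’s built-in procedure, which, like o as described above, applies its functions from right to left.

```racket
#lang racket
;; compose2 is a hypothetical two-function version, written out by hand.
;; It returns a new procedure that applies g first and then f.
(define compose2
  (lambda (f g)
    (lambda (x)
      (f (g x)))))

;; Right to left: add 1 first, then double.
(define add1-then-double
  (compose2 (lambda (x) (* 2 x)) add1))
(define doubled (map add1-then-double (list 1 2 3)))   ; '(4 6 8)

;; Racket's built-in compose accepts any number of procedures
;; and also runs them right to left: sqrt, sqr, ceiling, inexact->exact.
(define rounded
  (map (compose inexact->exact ceiling sqr sqrt) (range 1 11)))
```

With these definitions, rounded should match the result shown above for the version written with o.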
When we began our exploration of numbers, we used a variety of unary (one-parameter) procedures, such as those above. But we also used some binary (two-parameter) operations, such as addition or multiplication. Can we also use those with lists? It seems like we’d want to. For example, if we wanted to compute the mean value in a collection of numbers, we would add up all of the elements in the collection and then divide by the length of the collection.
We’ll start with a simple list of numbers, such as (list 4 1 6 3 2 10 8). We’d like to compute 4 + 1 + 6 + 3 + 2 + 10 + 8. The loudhum library provides a standard procedure, reduce, that does just that. In particular, (reduce FUN LST) converts LST to a single value by repeatedly applying FUN to neighboring pairs of values, replacing the pair with the result of the function.
> (require loudhum)
> (define numbers (list 4 1 6 3 2 10 8))
> (reduce + numbers)
34
Let’s see …
4+1 is 5. 6+3 is 9. 2+10 is 12. 5+9 is 14. 12+8 is 20. 14+20 is 34.
Yup.
Of course, we could also say
4+1 is 5. 5+6 is 11. 11+3 is 14. 14+2 is 16. 16+10 is 26. And 26+8 is 34.
That’s good. If it doesn’t matter what order we do the addition, we can choose whatever order is most efficient. (If we had lots and lots of numbers to add, it might be good to have different computers to add different subsets of the numbers and then add them back together at the end.) You’ll find that the same holds true for multiplication.
> (reduce * numbers)
11520
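One way to convince yourself that the order does not matter for addition and multiplication is to force two different orders and compare. The sketch below assumes plain Racket and uses its built-in foldl and foldr in place of loudhum’s reduce; foldl works through the list from the left, foldr works from the right, and each needs an explicit starting value (0 for sums, 1 for products). The defined names are for illustration only.

```racket
#lang racket
(define numbers (list 4 1 6 3 2 10 8))

;; Summing from the left and from the right gives the same total.
(define left-sum  (foldl + 0 numbers))       ; 34
(define right-sum (foldr + 0 numbers))       ; 34

;; The same is true for products.
(define left-product  (foldl * 1 numbers))   ; 11520
(define right-product (foldr * 1 numbers))   ; 11520
```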
We can, of course, use reduce in many other ways. To find the largest value in the list, we reduce with max. To find the smallest, we reduce with min.
> (reduce max numbers)
10
> (reduce min numbers)
1
We can also use reduce, like map, with values other than numbers.
> (reduce string-append (list "one" "two" "three" "four" "five"))
"onetwothreefourfive"
> (map number->string (range 5))
'("0" "1" "2" "3" "4")
> (reduce string-append (map number->string (range 10)))
"0123456789"
> (string->number (reduce string-append (map number->string (range 10))))
123456789
> (sqrt (string->number (reduce string-append (map number->string (range 10)))))
11111.111060555555
Since reduce requires a binary procedure, we can’t use composition for the function. However, we can use a lambda expression.
> (reduce (lambda (x y) (string-append x " and " y))
(map number->string (range 10)))
"0 and 1 and 2 and 3 and 4 and 5 and 6 and 7 and 8 and 9"
We can also use section.
> (reduce (section string-append <> " and " <>)
(map number->string (range 10)))
"0 and 1 and 2 and 3 and 4 and 5 and 6 and 7 and 8 and 9"
> (reduce (section string-append <> " and " <>)
(string-split "jubjub bird frumious bandersnatch"))
"jubjub and bird and frumious and bandersnatch"
We started this section by asking ourselves about computing the average of a list. We should now have the tools to do so.
Take a moment and think to yourself about how you would compute the average of the list of values in numbers.
Got it?
We were serious. Think about it.
Okay, here’s what we’d write.
> (/ (reduce + numbers) (length numbers))
4 6/7 ; or 34/7
Fairly simple, isn’t it? Computing the geometric mean is only a bit harder. (It’s okay if you don’t know what the geometric mean is; it’s a bit like the mean, except that we multiply the numbers together and then take the nth root of the product, where n is the number of values.)
> (expt (reduce * numbers) (/ 1 (length numbers)))
3.8037108643123165
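Once an expression like these works, it is natural to wrap it in a procedure so that it can be used on any list of numbers. Here is a minimal sketch, assuming plain Racket; the names mean and geometric-mean are hypothetical, and apply stands in for loudhum’s reduce (for + and *, applying the procedure to all of the elements at once gives the same total).

```racket
#lang racket
;; mean and geometric-mean are hypothetical names for this sketch.
;; Both assume lst is a non-empty list of numbers.
(define mean
  (lambda (lst)
    (/ (apply + lst) (length lst))))

(define geometric-mean
  (lambda (lst)
    (expt (apply * lst) (/ 1 (length lst)))))

(define numbers (list 4 1 6 3 2 10 8))
(define numbers-mean (mean numbers))                 ; 34/7
(define numbers-geo-mean (geometric-mean numbers))   ; about 3.8037
```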
You’ve seen some basic uses of reduce with lists. You will certainly discover many other applications of reduce.
Of course, we’re working with computers, which means that some things aren’t as simple as you might expect. Here’s one potential problem. We noted that reduce relies on our ability to combine neighboring pairs in any order. Are there operations in which the order in which you combine neighboring pairs matters? Certainly. Let’s consider subtraction, using the expression (4 - 1 - 6 - 3 - 2 - 10 - 8). Here’s one computation, in which we randomly choose which pair of numbers to use.
4 - 1 - 6 - 3 - 2 - 10 - 8 = 4 - 1 - 3 - 2 - 10 - 8
4 - 1 - 3 - 2 - 10 - 8 = 4 - 1 - 1 - 10 - 8
4 - 1 - 1 - 10 - 8 = 4 - 0 - 10 - 8
4 - 0 - 10 - 8 = 4 - 10 - 8
4 - 10 - 8 = 4 - 2
4 - 2 = 2
But that’s probably not what most of us would expect. Let’s see what the procedure does.
> (reduce - numbers)
20
> (reduce - numbers)
6
> (reduce - numbers)
28
Ooh, that’s not very good, is it? We’d almost certainly prefer consistent results.
We might, perhaps, take a more systematic approach, either doing the subtraction from left to right or from right to left. We’ll start by working from left to right.
4 - 1 - 6 - 3 - 2 - 10 - 8 = 3 - 6 - 3 - 2 - 10 - 8
3 - 6 - 3 - 2 - 10 - 8 = -3 - 3 - 2 - 10 - 8
-3 - 3 - 2 - 10 - 8 = -6 - 2 - 10 - 8
-6 - 2 - 10 - 8 = -8 - 10 - 8
-8 - 10 - 8 = -18 - 8
-18 - 8 = -26
But let’s also try working from right to left.
4 - 1 - 6 - 3 - 2 - 10 - 8 = 4 - 1 - 6 - 3 - 2 - 2
4 - 1 - 6 - 3 - 2 - 2 = 4 - 1 - 6 - 3 - 0
4 - 1 - 6 - 3 - 0 = 4 - 1 - 6 - 3
4 - 1 - 6 - 3 = 4 - 1 - 3
4 - 1 - 3 = 4 - -2
4 - -2 = 6
To support these different situations, we also provide reduce-left and reduce-right.
> (reduce-left - numbers)
-26
> (reduce-right - numbers)
6
While these two procedures achieve the goal of systematically reducing a list of values by applying a binary procedure, they cannot be easily parallelized because we have chosen a particular sequence of operations.
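To check the two hand derivations, we can spell out each order explicitly. This is only a sketch in plain Racket using the built-in foldl and foldr; it is not how loudhum implements reduce-left and reduce-right, and the names left-to-right and right-to-left are made up for this example.

```racket
#lang racket
(define numbers (list 4 1 6 3 2 10 8))

;; Left to right: ((((((4 - 1) - 6) - 3) - 2) - 10) - 8)
;; foldl hands the procedure the next element first and the running
;; result second, so we swap them before subtracting.
(define left-to-right
  (foldl (lambda (next so-far) (- so-far next))
         (first numbers)
         (rest numbers)))

;; Right to left: (4 - (1 - (6 - (3 - (2 - (10 - 8))))))
;; Starting from 0 is harmless here because 8 - 0 is still 8.
(define right-to-left
  (foldr - 0 numbers))
```

These give -26 and 6, the same values as the left-to-right and right-to-left derivations worked by hand.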
map with multiple lists
We’ve seen one way to use binary procedures with lists: We can reduce a list of values to a single value by repeatedly combining pairs of values with a function. But there’s another. Just as we can use map to create a new list of values by applying a unary procedure to each element of a list, we can also use a more generalized version of map that grabs values from multiple lists and combines them into values in a new list. In particular, map can also build a new list by applying the procedure to the corresponding elements of all the lists. For example,
> (map * (list 1 2 3) (list 4 5 6))
'(4 10 18) ; That's 1*4, 2*5, and 3*6
> (map + (list 1 2) (list 3 4) (list 5 6))
'(9 12)
> (map list (range 10) (map increment (range 10)) (map square (range 10)))
'((0 1 0) (1 2 1) (2 3 4) (3 4 9) (4 5 16) (5 6 25) (6 7 36) (7 8 49) (8 9 64) (9 10 81))
> (define first-names (list "Addison" "Bailey" "Casey" "Devon" "Emerson"))
> (define last-names (list "Smith" "Jones" "Smyth" "Johnson" "Doe"))
> (map (section string-append <> " " <>) first-names last-names)
'("Addison Smith" "Bailey Jones" "Casey Smyth" "Devon Johnson" "Emerson Doe")
> (map (section string-append <> ", " <>) last-names first-names)
'("Smith, Addison" "Jones, Bailey" "Smyth, Casey" "Johnson, Devon" "Doe, Emerson")
You may be starting to see some interesting possibilities. If you are not, stay tuned.
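Here is one such possibility, combining a multi-list map with a sum to compute a weighted mean. This is a sketch in plain Racket; the ratings, the weights, and the defined names are all hypothetical, and apply is used where the reading would use reduce.

```racket
#lang racket
;; Hypothetical data: three ratings of a movie, and how much weight
;; we choose to give each newspaper.
(define ratings (list 4 5 3))
(define weights (list 2 1 1))

;; Multiply each rating by its corresponding weight.
(define weighted (map * ratings weights))          ; '(8 5 3)

;; Add the weighted ratings and divide by the total weight.
(define weighted-mean
  (/ (apply + weighted) (apply + weights)))        ; 16/4, which is 4
```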
Racket comes with one more useful procedure, sort, that puts the elements of a list in an order you specify. The difficulty, of course, is how to specify the order. For now, we’ll use four basic orderings.
(sort nums <) sorts a list of real numbers from smallest to largest.
(sort nums >) sorts a list of real numbers from largest to smallest.
(sort strings string-ci<?) sorts a list of strings from alphabetically first to alphabetically last.
(sort strings string-ci>?) sorts a list of strings from alphabetically last to alphabetically first.
For example,
> (sort (list 5 1 4 2 3) <)
'(1 2 3 4 5)
> (sort (list 5 1 4 2 3) >)
'(5 4 3 2 1)
> (sort (list "Computers" "are" "sentient" "and" "malicious") string-ci<?)
'("and" "are" "Computers" "malicious" "sentient")
> (sort (list "Computers" "are" "sentient" "and" "malicious") string-ci>?)
'("sentient" "malicious" "Computers" "are" "and")
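The second parameter to sort can be any procedure that takes two values and reports whether the first should come before the second, so you are not limited to the four orderings above. As a sketch in plain Racket, here is a hypothetical comparison, shorter?, that orders strings by length; the sample words are chosen to have distinct lengths so that ties do not come into play.

```racket
#lang racket
;; shorter? is a made-up comparison for this sketch:
;; does string a have fewer characters than string b?
(define shorter?
  (lambda (a b)
    (< (string-length a) (string-length b))))

(define words (list "jabberwock" "my" "Beware" "the"))

(define by-length (sort words shorter?))
; '("my" "the" "Beware" "jabberwock")

;; Racket's sort also accepts a #:key procedure, which extracts the
;; value to compare; this produces the same ordering.
(define by-length-key (sort words < #:key string-length))
```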
There’s one more important set of list procedures for us to consider as we explore the utility of lists. You’ve seen that the length procedure tells us how many values appear in a list. But what if we only want to count some of the values in a list? We can use the (tally-value lst val) procedure, which counts how many times val appears in lst.
> (tally-value (list "one" "and" "two" "and" "three") "and")
2
> (tally-value (list "one" "and" "two" "and" "three") "three")
1
> (tally-value (list "one" "and" "two" "and" "three") "five")
0
There’s also a procedure, (tally lst pred?), that takes a predicate (a procedure that returns true or false) as its second parameter and counts how many values satisfy that predicate.
> (tally (list 3 1 4 1 5 9 2) odd?)
5
> (tally (list 3 1 4 1 5 9 2) even?)
2
> (tally (list 3 1 "four" "one" 5 9 2) integer?)
5
> (tally (list 3 1 "four" "one" 5 9 2) string?)
2
We’ll return to predicates in a subsequent reading. For now, note that the predicate must be something we can apply to all elements of the list.
> (tally (list 3 1 "four" "one" 5 9 2) odd?)
Error! odd?: contract violation
Error! expected: integer
Error! given: "four"
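One way to avoid that error is to build a predicate that checks the type before it checks for oddness. The sketch below assumes plain Racket and uses its built-in count procedure rather than loudhum’s tally; note that count takes the predicate first and the list second. The name odd-integer? is hypothetical.

```racket
#lang racket
(define mixed (list 3 1 "four" "one" 5 9 2))

;; odd-integer? is a made-up predicate that is safe to apply to any value:
;; and stops at integer? for non-integers, so odd? never sees a string.
(define odd-integer?
  (lambda (val)
    (and (integer? val) (odd? val))))

(define odd-count (count odd-integer? mixed))   ; 4 (3, 1, 5, and 9)

;; Counting occurrences of one particular value, in the spirit of tally-value.
(define and-count
  (count (lambda (val) (equal? val "and"))
         (list "one" "and" "two" "and" "three")))   ; 2
```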
Predict the results of evaluating each of the following expressions.
(list 2 1)
(make-list 1 2)
(make-list -1 2)
(map - (range 2))
(map - (range 2) (list 2 1))
(map range (list 2 1))
You may verify your predictions with DrRacket.
We came up with three different results for the expression (4 - 1 - 6 - 3 - 2 - 10 - 8). Come up with one or two more and show their derivation.
This reading is based closely on an earlier reading on lists from a prior version of CSC 151. Some aspects of that reading have been moved to the prior reading on lists for this semester. This reading also contains a new discussion of tally and tally-value.