Processing homogeneous lists

Due
Monday, 11 February 2019
Summary
We continue our encounters with Scheme’s list data type. Lists collect values. We explore new ways to work with lists, including mechanisms for creating lists, applying procedures to all elements in a list, and combining the elements in a list.
Prerequisites
List basics. Procedures. Compose and section.

Introduction

In your initial explorations with Scheme you have investigated a variety of basic types of data, including numbers, strings, and symbols. You can work on many kinds of problems with just these types. However, when you want to address more complex problems, particularly problems from data science, you will need to work with collections of data - not just the rating of a movie from one newspaper, but the rating of that movie from many newspapers (or even the ratings of many movies from many newspapers).

In Scheme, the simplest mechanism for dealing with collections of data is the list. Lists are collections of values that you can process one by one or en masse. You’ve already explored some of the things we can do with lists, such as finding their length and extracting some elements. In this reading, we will further consider Scheme’s list data type as well as a variety of procedures to build and manipulate lists.

Review: List basics

To date, you’ve seen a fairly wide variety of things you can do with lists. You can build them with list and make-list. You can build a list of portions of a string with string-split and regexp-match*. You can extract portions with take, drop, and list-ref. You can combine lists with append. You can make lists of consecutive integers with range. You can change the order with reverse. You can find out where a value appears with index-of or indexes-of (which some of us would prefer to be called indices-of). If you’ve forgotten what these procedures do, you may want to review the prior reading on lists.

It turns out that there’s a wealth of other things we can do with lists beyond those basic operations.

Building new lists from old: The map procedure

So, what can you do with lists once you’ve created them? Build other lists, of course. The first way we’ll build lists from lists is with the (map proc lst) procedure, which creates a new list by applying the unary (one-parameter) procedure proc to each element of the list lst.

For example, if we want to find the lengths of a variety of strings, we can use map with string-length.

> (map string-length (list "Beware" "the" "jabberwock" "my" "son"))
'(6 3 10 2 3)

And, as we’ve already seen, we can use string-split to extract the words from a sentence.

> (map string-length (string-split "The jaws that bite the claws that catch"))
'(3 4 4 4 3 5 4 5)

On the math side, if we want a list of the squares of the first ten positive integers (and we’re too lazy to compute them by hand), we can use map to apply the sqr procedure to each element of the list of the first ten positive integers.

> (define sqr (lambda (x) (* x x))) ; Also already defined.
> (range 1 11)
'(1 2 3 4 5 6 7 8 9 10)
> (map sqr (range 1 11))
'(1 4 9 16 25 36 49 64 81 100)

We can also find out the square roots of those same ten numbers.

> (map sqrt (range 1 11))
'(1 1.4142135623730951 1.7320508075688772
 2 2.23606797749979 2.449489742783178
 2.6457513110645907 2.8284271247461903 3
 3.1622776601683795)

We can check those results by squaring them again.

> (map sqr (map sqrt (list 1 2 3 4 5 6 7 8 9 10)))
'(1 2.0000000000000004 2.9999999999999996 
  4 5.000000000000001 5.999999999999999 
  7.000000000000001 8.000000000000002 9 
 10.000000000000002)

Aren’t approximations wonderful? They get even more interesting when we start rounding.

> (map ceiling (map sqr (map sqrt (list 1 2 3 4 5 6 7 8 9 10))))
'(1 3.0 3.0 4 6.0 6.0 8.0 9.0 9 11.0)

What should you take away from this? First, anything you can do to a single value you can also do to all values in a list by using it with the map procedure. Second, we often want to do a sequence of operations to each value in the list.
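
To make the first point concrete, here is a small sketch in plain Racket. It suggests that map gives the same list we would get by applying the procedure to each element by hand. The names words, lengths-with-map, and lengths-by-hand are ours, chosen only for illustration.

```racket
#lang racket

;; A short list of strings to work with.
(define words (list "Beware" "the" "jabberwock"))

;; Applying string-length to every element with map.
(define lengths-with-map (map string-length words))

;; Doing the same work by hand, one element at a time.
(define lengths-by-hand
  (list (string-length "Beware")
        (string-length "the")
        (string-length "jabberwock")))

;; Both definitions produce '(6 3 10).
```

The by-hand version grows with the list, while the map version stays one line long no matter how many words we have.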

Sequencing operations

As we just noted, we end up writing map a lot when we want to sequence operations. Is there a better strategy? Yes. As you may recall, the loudhum library provides a procedure, o, that allows you to compose functions. What is composition? You may remember it from your algebra class. If f and g are functions, the composition of f and g, written f o g, is also a function; it applies g to its parameter and then applies f to g’s result.

In traditional notation, we would write

(f o g)(x) = f(g(x))

In Scheme notation, we write

((o f g) x) = (f (g x))

The cool thing about this compose function is that it can take lots of functions. However, as in the case of the traditional compose, it does them right to left. Hence, the expression we just wrote as

> (map ceiling (map sqr (map sqrt (list 1 2 3 4 5 6 7 8 9 10))))
'(1 3.0 3.0 4 6.0 6.0 8.0 9.0 9 11.0)

we can more easily write as

> (map (o ceiling sqr sqrt) (list 1 2 3 4 5 6 7 8 9 10))
'(1 3.0 3.0 4 6.0 6.0 8.0 9.0 9 11.0)

And that makes it easier for us to make the results exact, too.

> (map (o inexact->exact ceiling sqr sqrt) (list 1 2 3 4 5 6 7 8 9 10))
'(1 3 3 4 6 6 8 9 9 11)

Although experienced programmers usually prefer composition, less experienced programmers may find it more natural to use a lambda expression.

> (map (lambda (x) (inexact->exact (ceiling (sqr (sqrt x)))))
       (list 1 2 3 4 5 6 7 8 9 10))
'(1 3 3 4 6 6 8 9 9 11)

We will ask you to explore both forms.
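
If it helps to see what composition does behind the scenes, here is a minimal sketch of a two-procedure version. The name compose2 is hypothetical (it is not part of loudhum), and we only assume that o behaves like it when given two procedures. Racket’s built-in compose also works right to left, so we use it to double-check part of the longer example.

```racket
#lang racket

;; compose2 is a hypothetical helper, not a loudhum procedure.
;; It builds a new procedure that applies g first and then f.
(define compose2
  (lambda (f g)
    (lambda (x) (f (g x)))))

;; Square, then round up: sqr runs first, ceiling second.
(define ceiling-of-square (compose2 ceiling sqr))
(define rounded-up (map ceiling-of-square (list 1.5 2 2.5)))
;; rounded-up is '(3.0 4 7.0)

;; Racket's built-in compose accepts any number of procedures.
(define exact-results
  (map (compose inexact->exact ceiling sqr sqrt) (list 1 2 3 4)))
;; exact-results is '(1 3 3 4)
```

Notice that the procedures are listed in the opposite order from the one in which they run, just as in the lambda version.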

Combining the elements in a list

When we began our exploration of numbers, we used a variety of unary (one-parameter) procedures, such as those above. But we also used some binary (two-parameter) operations, such as addition and multiplication. Can we also use those with lists? It seems like we’d want to. For example, if we wanted to compute the mean value of a collection of numbers, we would add up all of the elements in the collection and then divide by the length of the collection.

We’ll start with a simple list of numbers, such as (list 4 1 6 3 2 10 8). We’d like to compute 4 + 1 + 6 + 3 + 2 + 10 + 8. The loudhum library provides a standard procedure, reduce, that does just that. In particular, (reduce FUN LST) converts LST to a single value by repeatedly applying FUN to neighboring pairs of values, replacing each pair with the result of the function.

> (require loudhum)
> (define numbers (list 4 1 6 3 2 10 8))
> (reduce + numbers)
34

Let’s see …

4+1 is 5. 6+3 is 9. 2+10 is 12. 5+9 is 14. 12+8 is 20. 14+20 is 34.

Yup.

Of course, we could also say

4+1 is 5. 5+6 is 11. 11+3 is 14. 14+2 is 16. 16+10 is 26. And 26+8 is 34.

That’s good. If it doesn’t matter what order we do the addition in, we can choose whatever order is most efficient. (If we had lots and lots of numbers to add, it might be good to have different computers add different subsets of the numbers and then combine those partial sums at the end.) You’ll find that the same holds true for multiplication.

> (reduce * numbers)
11520
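
The two hand computations of the sum above correspond to two different ways of nesting the additions. The sketch below writes both out in plain Racket, under the assumption that reduce is free to pick either grouping. The names pairwise-sum, left-to-right-sum, pairwise-product, and left-to-right-product are ours.

```racket
#lang racket

;; Grouping 1: combine neighboring pairs first, then combine the results.
;; (4+1) + (6+3) is 14; (2+10) + 8 is 20; 14 + 20 is 34.
(define pairwise-sum
  (+ (+ (+ 4 1) (+ 6 3))
     (+ (+ 2 10) 8)))

;; Grouping 2: work strictly from left to right.
(define left-to-right-sum
  (+ (+ (+ (+ (+ (+ 4 1) 6) 3) 2) 10) 8))

;; The same two groupings for multiplication.
(define pairwise-product
  (* (* (* 4 1) (* 6 3))
     (* (* 2 10) 8)))
(define left-to-right-product
  (* (* (* (* (* (* 4 1) 6) 3) 2) 10) 8))

;; Both sums are 34 and both products are 11520.
```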

We can, of course, use reduce in many other ways. To find the largest value in the list, we reduce with max; to find the smallest, we reduce with min.

> (reduce max numbers)
10
> (reduce min numbers)
1

We can also use reduce, like map, with values other than numbers.

> (reduce string-append (list "one" "two" "three" "four" "five"))
"onetwothreefourfive"
> (map number->string (range 5))
'("0" "1" "2" "3" "4")
> (reduce string-append (map number->string (range 10)))
"0123456789"
> (string->number (reduce string-append (map number->string (range 10))))
123456789
> (sqrt (string->number (reduce string-append (map number->string (range 10)))))
11111.111060555555

Since reduce requires a binary procedure, we can’t use composition for the function. However, we can use a lambda expression.

> (reduce (lambda (x y) (string-append x " and " y))
          (map number->string (range 10)))
"0 and 1 and 2 and 3 and 4 and 5 and 6 and 7 and 8 and 9"

We can also use section.

> (reduce (section string-append <> " and " <>)
          (map number->string (range 10)))
"0 and 1 and 2 and 3 and 4 and 5 and 6 and 7 and 8 and 9"
> (reduce (section string-append <> " and " <>)
          (string-split "jubjub bird frumious bandersnatch"))
"jubjub and bird and frumious and bandersnatch"

We started this section by asking ourselves about computing the average of a list. We should now have the tools to do so.

Take a moment and think to yourself about how you would compute the average of the list of values in numbers.

Got it?

We were serious. Think about it.

Okay, here’s what we’d write.

> (/ (reduce + numbers) (length numbers))
4 6/7 ; or 34/7

Fairly simple, isn’t it? Computing the geometric mean is only a bit harder. (It’s okay if you don’t know what the geometric mean is; it’s a bit like the mean, except that we multiply the numbers together and then take the nth root of the product, where n is the number of values.)

> (expt (reduce * numbers) (/ 1 (length numbers)))
3.8037108643123165

You’ve seen some basic uses of reduce with lists. You will certainly discover many other applications of reduce.
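
One such application is to package the two computations above as procedures, so that we can reuse them on other lists. The sketch below is written for plain Racket without loudhum, so it uses apply as a stand-in for reduce; that substitution, and the names mean and geometric-mean, are our assumptions rather than anything loudhum provides.

```racket
#lang racket

(define numbers (list 4 1 6 3 2 10 8))

;; Stand-in for loudhum's reduce (an assumption for this sketch):
;; (apply + lst) adds all the elements, (apply * lst) multiplies them.
(define mean
  (lambda (lst)
    (/ (apply + lst) (length lst))))

;; Multiply everything together, then take the nth root,
;; where n is the number of values in the list.
(define geometric-mean
  (lambda (lst)
    (expt (apply * lst) (/ 1 (length lst)))))

;; (mean numbers) is 34/7
;; (geometric-mean numbers) is approximately 3.8037
```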

Order of operations

Of course, we’re working with computers, which means that some things aren’t as simple as you might expect. Here’s one potential problem. We noted that reduce relies on our ability to combine neighboring pairs in any order. Are there operations in which the order in which you combine neighboring pairs matters? Certainly. Let’s consider subtraction, using the expression (4 - 1 - 6 - 3 - 2 - 10 - 8). Here’s one computation, in which we randomly choose which pair of numbers to use.

4 - 1 - 6 - 3 - 2 - 10 - 8 = 4 - 1 - 3 - 2 - 10 - 8

4 - 1 - 3 - 2 - 10 - 8 = 4 - 1 - 1 - 10 - 8

4 - 1 - 1 - 10 - 8 = 4 - 0 - 10 - 8

4 - 0 - 10 - 8 = 4 - 10 - 8

4 - 10 - 8 = 4 - 2

4 - 2 = 2

But that’s probably not what most of us would expect. Let’s see what the procedure does.

> (reduce - numbers)
20
> (reduce - numbers)
6
> (reduce - numbers)
28

Ooh, that’s not very good, is it. We’d almost certainly prefer consistent results.

We might, perhaps, take a more systematic approach, either doing the subtraction from left to right or from right to left. We’ll start by working from left to right.

4 - 1 - 6 - 3 - 2 - 10 - 8 = 3 - 6 - 3 - 2 - 10 - 8

3 - 6 - 3 - 2 - 10 - 8 = -3 - 3 - 2 - 10 - 8

-3 - 3 - 2 - 10 - 8 = -6 - 2 - 10 - 8

-6 - 2 - 10 - 8 = -8 - 10 - 8

-8 - 10 - 8 = -18 - 8

-18 - 8 = -26

But let’s also try working from right to left.

4 - 1 - 6 - 3 - 2 - 10 - 8 = 4 - 1 - 6 - 3 - 2 - 2

4 - 1 - 6 - 3 - 2 - 2 = 4 - 1 - 6 - 3 - 0

4 - 1 - 6 - 3 - 0 = 4 - 1 - 6 - 3

4 - 1 - 6 - 3 = 4 - 1 - 3

4 - 1 - 3 = 4 - -2

4 - -2 = 6

To support these different situations, we also provide reduce-left and reduce-right.

> (reduce-left - numbers)
-26
> (reduce-right - numbers)
6

While these two procedures achieve the goal of systematically reducing a list of values by applying a binary procedure, they cannot be easily parallelized because we have chosen a particular sequence of operations.
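
For readers curious about how such systematic reductions might be built, here is a rough sketch using Racket’s standard foldl and foldr. The names my-reduce-left and my-reduce-right are hypothetical stand-ins; we are not claiming that loudhum defines reduce-left and reduce-right this way. The sketch assumes the list has at least one element.

```racket
#lang racket

(define numbers (list 4 1 6 3 2 10 8))

;; Hypothetical stand-ins, not loudhum's definitions.
;; Left to right: ((((((4 - 1) - 6) - 3) - 2) - 10) - 8)
;; foldl hands the procedure the next element first and the running
;; result second, so we swap them before calling fun.
(define my-reduce-left
  (lambda (fun lst)
    (foldl (lambda (next so-far) (fun so-far next))
           (first lst)
           (rest lst))))

;; Right to left: (4 - (1 - (6 - (3 - (2 - (10 - 8))))))
(define my-reduce-right
  (lambda (fun lst)
    (foldr fun
           (last lst)
           (drop-right lst 1))))

;; (my-reduce-left - numbers) is -26
;; (my-reduce-right - numbers) is 6
;; (my-reduce-left + numbers) and (my-reduce-right + numbers) are both 34
```

Because each step depends on the result of the step before it, neither version leaves any room to split the work among several computers.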

Using map with multiple lists

We’ve seen one way to use binary procedures with lists: We can reduce a list of values to a single value by repeatedly combining pairs of values with a function. But there’s another. Just as we can use map to create a new list of values by applying a unary procedure to each element of a list, we can also use a more generalized version of map that grabs values from multiple lists and combines them into values in a new list. In particular, map can also build a new list by applying the procedure to the corresponding elements of all the lists. For example,

> (map * (list 1 2 3) (list 4 5 6))
'(4 10 18) ; That's 1*4, 2*5, and 3*6
> (map + (list 1 2) (list 3 4) (list 5 6))
'(9 12)

> (map list (range 10) (map increment (range 10)) (map square (range 10)))
'((0 1 0) (1 2 1) (2 3 4) (3 4 9) (4 5 16) (5 6 25) (6 7 36) (7 8 49) (8 9 64) (9 10 81))

> (define first-names (list "Addison" "Bailey" "Casey" "Devon" "Emerson"))
> (define last-names (list "Smith" "Jones" "Smyth" "Johnson" "Doe"))
> (map (section string-append <> " " <>) first-names last-names)
'("Addison Smith" "Bailey Jones" "Casey Smyth" "Devon Johnson" "Emerson Doe")
> (map (section string-append <> ", " <>) last-names first-names)
'("Smith, Addison" "Jones, Bailey" "Smyth, Casey" "Johnson, Devon" "Doe, Emerson")

You may be starting to see some interesting possibilities. If you are not, stay tuned.
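
Here is one such possibility, sketched in plain Racket: combining a multi-list map with a reduction to compute a weighted mean. The ratings and weights are made-up data, the names are ours, and apply is used as a stand-in for loudhum’s reduce.

```racket
#lang racket

;; Hypothetical data: three ratings and how much weight each one gets.
(define ratings (list 4 3 5))
(define weights (list 2 1 1))

;; Multiply each rating by its weight: '(8 3 5)
(define weighted (map * ratings weights))

;; Add up the weighted ratings and divide by the total weight.
;; apply is used here as a stand-in for loudhum's reduce.
(define weighted-mean
  (/ (apply + weighted) (apply + weights)))

;; weighted-mean is 4, since 16 divided by 4 is 4.
```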

Putting lists in order

Racket comes with one more useful procedure, sort, that puts the elements of a list in an order you specify. The difficulty, of course, is how to specify the order. For now, we’ll use four basic orderings.

  • (sort nums <) sorts a list of real numbers from smallest to largest.
  • (sort nums >) sorts a list of real numbers from largest to smallest.
  • (sort strings string-ci<?) sorts a list of strings from alphabetically first to alphabetically last.
  • (sort strings string-ci>?) sorts a list of strings from alphabetically last to alphabetically first.

For example,

> (sort (list 5 1 4 2 3) <)
'(1 2 3 4 5)
> (sort (list 5 1 4 2 3) >)
'(5 4 3 2 1)
> (sort (list "Computers" "are" "sentient" "and" "malicious") string-ci<?)
'("and" "are" "Computers" "malicious" "sentient")
> (sort (list "Computers" "are" "sentient" "and" "malicious") string-ci>?)
'("sentient" "malicious" "Computers" "are" "and")
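
Why string-ci<? rather than string<? for strings? The sketch below, in plain Racket, suggests an answer: string<? is case sensitive, so capitalized words sort ahead of lowercase ones. It also shows, as a preview, that a lambda can describe an ordering of our own. The names case-sensitive, case-insensitive, and by-length are ours.

```racket
#lang racket

(define words (list "Computers" "are" "sentient" "and" "malicious"))

;; string<? is case sensitive, so the capitalized word comes first.
(define case-sensitive (sort words string<?))
;; '("Computers" "and" "are" "malicious" "sentient")

;; string-ci<? ignores case, giving the usual dictionary order.
(define case-insensitive (sort words string-ci<?))
;; '("and" "are" "Computers" "malicious" "sentient")

;; A lambda can describe other orderings, such as shortest word first.
;; Racket's sort is stable, so words of equal length keep their
;; original relative order.
(define by-length
  (sort words
        (lambda (a b) (< (string-length a) (string-length b)))))
;; '("are" "and" "sentient" "Computers" "malicious")
```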

Counting values

There’s one more important set of list procedures for us to consider as we explore the utility of lists. You’ve seen that the length procedure tells us how many values appear in a list. But what if we only want to count some of the values in a list? We can use the (tally-value lst val) procedure.

> (tally-value (list "one" "and" "two" "and" "three") "and")
2
> (tally-value (list "one" "and" "two" "and" "three") "three")
1
> (tally-value (list "one" "and" "two" "and" "three") "five")
0

There’s also a procedure, (tally lst pred?), that takes a predicate (a procedure that returns true or false) as its second parameter and counts how many values satisfy that predicate.

> (tally (list 3 1 4 1 5 9 2) odd?)
5
> (tally (list 3 1 4 1 5 9 2) even?)
2
> (tally (list 3 1 "four" "one" 5 9 2) integer?)
5
> (tally (list 3 1 "four" "one" 5 9 2) string?)
2

We’ll return to predicates in a subsequent reading. For now, note that the predicate must be something we can apply to all elements of the list.

> (tally (list 3 1 "four" "one" 5 9 2) odd?)
Error! odd?: contract violation
Error!  expected: integer
Error!  given: "four"
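
One way around that error is to write a predicate that checks the type before asking about oddness. The sketch below is plain Racket, so it builds hypothetical stand-ins, my-tally and my-tally-value, on top of Racket’s count procedure; we are assuming, not asserting, that they behave like loudhum’s tally and tally-value on these inputs.

```racket
#lang racket

(define mixed (list 3 1 "four" "one" 5 9 2))

;; Hypothetical stand-in for loudhum's tally, built on Racket's count.
(define my-tally
  (lambda (lst pred?)
    (count pred? lst)))

;; Check integer? first. Because and stops at the first false result,
;; odd? is never applied to a string.
(define odd-integer?
  (lambda (val)
    (and (integer? val) (odd? val))))

;; Hypothetical stand-in for tally-value: count the values equal to val.
(define my-tally-value
  (lambda (lst val)
    (my-tally lst (lambda (x) (equal? x val)))))

;; (my-tally mixed odd-integer?) is 4
;; (my-tally-value (list "one" "and" "two" "and" "three") "and") is 2
```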

Self checks

Check 1: Verifying list procedures

Predict the results of evaluating each of the following expressions.

(list 2 1)
(make-list 1 2)
(make-list -1 2)
(map - (range 2))
(map - (range 2) (list 2 1))
(map range (list 2 1))

You may verify your predictions with DrRacket.

Check 2: Inconsistent subtraction

We came up with three different results for the expression (4 - 1 - 6 - 3 - 2 - 10 - 8). Come up with one or two more and show their derivation.

Acknowledgements

This reading is based closely on an earlier reading on lists from a prior version of CSC 151. Some aspects of that have been moved to the prior reading on lists for this semester. This reading also contains a new discussion of tally and tally-value.