Building Data Structures with Heterogeneous Lists

Summary: We delve more deeply into Scheme's list data type. We consider, in particular, how and why to make lists that contain different kinds of values.

Introduction: Representing Data

As we've said, computer scientists study both the algorithms we write to manipulate information and the ways in which we represent information. Since an emphasis of this course is images, we have been emphasizing kinds of data related to images. You've probably noticed that we have a number of ways to algorithmically describe images. That is, we can describe images in terms of a sequence of GIMP commands, in terms of a sequence of turtle commands, as a drawing built from the unit drawings, or as a function from positions to colors.

So far, we've been relying on the built-in implementation of these various techniques. However, we can, potentially, develop our own ways to describe images. We can then base the implementation of those new techniques on prior techniques. Suppose, for example, that all we had were the basic GIMP tool operations, and we wanted a way to represent collections of simple shapes (akin to the “drawings as values” model).

What values might we store internally for each shape? Certainly, we'd need to store the kind of shape (circle, square, rectangle, triangle, star, whatever). We would also want to store a color for the shape (red, green, blue, mauve, etc.). Unless we want to have all of our shapes the same size and position, we'll also need to store shape and position information. You'll note that we have a variety of kinds of information to store. The type of the shape is likely to be a string or a symbol. The color might be an RGB value or a string (or one of the few other representations of colors we have). The size and positions are likely to be integers (or perhaps real numbers).

In the past, when we've had a group of values we wanted to group together, we used lists. But all of those lists were “homogeneous” - each value in the list had the same type as the other values in the list. Now, we want to consider “heterogeneous” groups of values, groups in which different values may have different types. Fortunately, Scheme's list structure supports both homogeneous and heterogeneous collections. In fact, Scheme programmers regularly represent complex data types with multiple components as lists.

Describing Shapes

Let us return to the problem of describing basic shapes, akin to the basic components in the “drawings as values” model. We've noted that we want the following information for each shape:

the kind of the shape (circle, square, triangle, etc.), which we'll represent using a symbol;
the color of the shape, which we'll represent using a string;
the size of the shape, which we'll represent with a real number; and
the position of the shape, which we'll represent with two real numbers, representing the x and y value of the center.

For some shapes, we might keep additional information, such as the kind of triangle.

Given these basic decisions, we might represent some simple shapes as follows:

; A red circle of radius 10, centered at (5,2)
(circle "red" 10 5 2)
; A blue square of side-length 8, centered at (20,3)
(square "blue" 8 20 3)
; A yellow diamond with side-length 9, centered at (10,10)
(diamond "yellow" 9 10 10)

How do we create these lists? One way is to just use the list procedure.

> (define shape1 (list 'circle "red" 10 5 2))

But that strategy is inelegant and may be difficult to support if we decide to change our representation (e.g., to put the position ahead of the size). Hence, we should write wrapper procedures that take the appropriate inputs and place them into the form we want. For example,

;;; Procedure:
;;;   shape-circle
;;; Parameters:
;;;   color, a color
;;;   radius, a non-negative real number
;;;   x, a real number
;;;   y, a real number
;;; Purpose:
;;;   Creates a representation of the circle which can be rendered
;;;   later.
;;; Produces:
;;;   circle, a shape
;;; Preconditions:
;;;   [No additional]
;;; Postconditions:
;;;   circle, when rendered, has the given color and radius, and is
;;;   centered at (x,y).
(define shape-circle
  (lambda (color radius x y)
    (list 'circle (color->color-name color) radius x y)))

It seems to work okay.

> (shape-circle "red" 10 5 2)
(circle "red" 10 5 2)

Now, suppose we decide that it's better to make the size the last thing in the list (so that, for example, we could use the x and y as the starting coordinates of a line in some other kinds of shapes). We can leave the order of parameters for shape-circle the same, but change the way it constructs the list.

(define shape-circle
  (lambda (color radius x y)
    (list 'circle (color->color-name color) x y radius)))

Let's check to see how this new version works.

> (shape-circle "red" 10 5 2)
(circle "red" 5 2 10)
> (shape-circle (irgb 128 0 128) 100 30 30)
(circle "purple" 30 30 100)
> (shape-circle (irgb 128 128 128) 50 20 10)
(circle "gray" 20 10 50)

If we just look at the list, it is difficult to tell which number is which. Hence, when writing code, we'll have to be very careful to remember which value is in which position. That is: The first element is the symbol circle, the second is the color (as a string), the third is the x value, the fourth the y value, and the fifth the radius.

Rendering Shapes

Once we've created one of these shapes, how do we render it (that is, show it on an image)? We've started to look at that issue in the reading in conditionals. We look at the parts of the list and use them to choose what to select.

But how do we get the parts of a list? Scheme provides two basic operations for getting the parts of a list: car extracts the first element of a list and cdr extracts all but the first element of a list.

> (define rainbow (list "red" "orange" "yellow" "green" "blue" "indigo" "violet"))
> (car rainbow)
"red"
> (cdr rainbow)
("orange" "yellow" "green" "blue" "indigo" "violet")
> (car (cdr rainbow))
"orange"
> (car (cdr (cdr (cdr rainbow))))
"green"

As the last few examples suggest, we can get elements in the middle of a list by using cdr to drop the initial elements and then car to get the first remaining element.

If we're using five elements to represent a circle (the symbol circle, the color, the x coordinate of the center, the y coordinate of the center, and the radius), we can get the color from a shape called shape with (car (cdr shape)), the x coordinate with (car (cdr (cdr shape))), and so on and so forth.

We are now ready to render one kind of shape, the circle:

;;; Procedure:
;;;   image-render-shape!
;;; Parameters:
;;;   image, an image
;;;   shape, a shape
;;; Purpose:
;;;   Renders the shape on image
;;; Produces:
;;;   [Nothing; called for the side effect.]
;;; Preconditions:
;;;   shape was created by one of the shape procedures, such as
;;;   'shape-circle'.
;;; Postconditions:
;;;   image now contains a rendering of shape.
(define image-render-shape!
  (lambda (image shape)
    (cond
      ((equal? (car shape) 'circle)
       (let ([color (car (cdr shape))]
             [x (car (cdr (cdr shape)))]
             [y (car (cdr (cdr (cdr shape))))]
             [radius (car (cdr (cdr (cdr (cdr shape)))))])
         (context-set-fgcolor! color)
         (image-select-ellipse! image REPLACE
                                (- x radius)  ; left
                                (- y radius)  ; top
                                (* 2 radius)  ; width
                                (* 2 radius)) ; height
         (image-fill! image)
         (image-select-nothing! image)))
      (else
       (display "Unknown type of shape: ")
       (display shape)
       (newline)))))

That's not so bad, is it? Well, those nested calls to cdr can be a bit hard to follow. Hence, Scheme provides a list-ref procedure that we can use to get access to elements in the middle of a list. The list-ref procedure takes two arguments, the first of which is a list and the second a non-negative integer less than the length of the list. It recovers an element from the list by skipping over the number of initial elements specified by the second argument (applying cdr that many times) and extracting the next element (by invoking car). So (list-ref sample 0) is the same as (car sample), (list-ref sample 1) is the same as (car (cdr sample)), and so on.

> (define rainbow (list "red" "orange" "yellow" "green" "blue" "indigo" "violet"))
> (list-ref rainbow 1)
"orange"
> (list-ref rainbow 0)
"red"
> (list-ref rainbow 5)
"indigo"
> (list-ref rainbow 7)
list-ref: index 7 too large for list: ("red" "orange" "yellow" "green" "blue" "indigo" "violet")

Note: We tend to prefer to use car and cdr for the first few elements in a list, and list-ref for the subsequent elements.

Using list-ref, we might rewrite image-render-shape as follows.

(define image-render-shape!
  (lambda (image shape)
    (cond
      ((equal? (car shape) 'circle)
       (let ([color (car (cdr shape))]
             [x (list-ref shape 2)]
             [y (list-ref shape 3)]
             [radius (list-ref shape 4)])
         (context-set-fgcolor! color)
         (image-select-ellipse! image REPLACE
                                (- x radius)  ; left
                                (- y radius)  ; top
                                (* 2 radius)  ; width
                                (* 2 radius)) ; height
         (image-fill! image)
         (image-select-nothing! image)))
      (else
       (display "Unknown type of shape: ")
       (display shape)
       (newline)))))

We might even add instructions for rendering a square and diamond.

(define image-render-shape!
  (lambda (image shape)
    (cond
      ((equal? (car shape) 'circle)
       (let ([color (car (cdr shape))]
             [x (list-ref shape 2)]
             [y (list-ref shape 3)]
             [radius (list-ref shape 4)])
         (context-set-fgcolor! color)
         (image-select-ellipse! image REPLACE
                                (- x radius)  ; left
                                (- y radius)  ; top
                                (* 2 radius)  ; width
                                (* 2 radius)) ; height
         (image-fill! image)
         (image-select-nothing! image)))
      ((equal? (car shape) 'square)
       (let* ([color (car (cdr shape))]
              [x (list-ref shape 2)]
              [y (list-ref shape 3)]
              [side (list-ref shape 4)]
              [half-side (/ side 2)])
         (context-set-fgcolor! color)
         (image-select-rectangle! image REPLACE
                                  (- x half-side)  ; left
                                  (- y half-side)  ; top
                                  side             ; width
                                  side)            ; height
         (image-fill! image)
         (image-select-nothing! image)))
      ((equal? (car shape) 'diamond)
       (let* ([color (car (cdr shape))]
              [x (list-ref shape 2)]
              [y (list-ref shape 3)]
              [side (list-ref shape 4)]
              [offset (/ side (sqrt 2))])
         (context-set-fgcolor! color)
         (image-select-polygon! image REPLACE
                                (position-new x (- y offset))
                                (position-new (+ x offset) y)
                                (position-new x (+ y offset))
                                (position-new (- x offset) y))
         (image-fill! image)
         (image-select-nothing! image)))
      (else
       (display "Unknown type of shape: ")
       (display shape)
       (newline)))))

Changing Shapes

We now have techniques for making basic shapes and rendering those shapes. What if we want to scale or shift those shapes? Well, we know about the parts of a shape, and we know how to build new shapes, so we can use those techniques together. For example, to recolor a shape (or at least to recolor a shape that is one of the three basic shape types), we simply build a new list using the new color for the second element (that is, the element in position 1).

;;; Procedure:
;;;   shape-recolor
;;; Parameters:
;;;   shape, a shape
;;;   color, a color
;;; Purpose:
;;;   Change the color of the shape
;;; Produces:
;;;   recolored, a shape
(define shape-recolor
  (lambda (shape color)
    (cond
      ; For the basic shapes, we just need to change the color part
      ((or (equal? (car shape) 'circle)
           (equal? (car shape) 'square)
           (equal? (car shape) 'diamond))
       (list (car shape)
             (color->color-name color)
             (list-ref shape 2)
             (list-ref shape 3)
             (list-ref shape 4)))
      ; For everything else, we'll just leave it as is
      (else
       shape))))

We can use a similar technique to scale or shift a shape.

We can even write a procedure to convert a diamond to a simlarly-sized square by changing the first element of the list.

;;; Procedure:
;;;   diamond-to-square
;;; Parameters:
;;;   diamond, a shape 
;;; Purpose:
;;;   If diamond is a diamond, convert it to a square with the same 
;;;   center and side length.
;;; Produces:
;;;   square, a shape
;;; Postconditions:
;;;   If diamond is not a diamond, returns the original shape, rather
;;;   than a square.
(define diamond-to-square
  (lambda (shape)
    (cond
      ((equal? (car shape) 'diamond)
       (list 'square (car (cdr shape))
             (list-ref shape 2)
             (list-ref shape 3)
             (list-ref shape 4)))
      (else
       shape))))

However, this last procedure seems a bit inelegant, since we're only changing the first element of the list. Since we already know how to get the color, x, y, and size with cdr, is there a way to just tack square on to the front of that list? Yes. The cons procedure builds a new list by tacking a value on to the front of another list.

> (cons 5 (iota 3))
(5 0 1 2)
> (cons 'alpha (list 'beta 'gamma))
(alpha beta gamma)

Note that cons is a pure function. That is, (cons val lst) changes neither val nor lst. Rather, cons builds a new list whose first element is val and whose remaining elements correspond to the elements of lst.

Using cons, we can rewrite diamond-to-square a bit more concisely.

(define diamond-to-square
  (lambda (shape)
    (cond
      ((equal? (car shape) 'diamond)
       (cons 'square (cdr shape)))
      (else
       shape))))

Why is it called cons instead of list-prepend or something similar? Well, that's the name John McCarthy, the designer of Lisp, chose about fifty years ago. “cons” is short for construct, because cons constructs lists. (The custom of naming procedures with the basic type they operate on, a dash, and the key operation did not start until a few decades later.) The names car and cdr were chosen for very specific reasons that will not make sense for a few more weeks. For now, just accept that you're learning a bit of computer-ese.

Nested Lists

But what if you wanted to create compound shapes (just as we created compound drawings)? Well, in this case, rather than a color, center, and size, we need to store the individual shapes. Can we do that? In fact, a list can contain any types of Scheme values. In particular, lists can even contain other lists as values.

So, when we combine drawings, we can place them into a list and use a special symbol to indicate that it's a compound shape.

(define shape-compound
  (lambda (shapes)
    (list 'compound shapes)))

> (shape-compound (list (shape-circle "red" 10 5 5) 
                        (shape-square "blue" 20 6 6)))
(compound ((circle "red" 5 5 10) (square "blue" 6 6 20)))

We will, of course, need to update our other procedures to use this new representation. For example,

(define image-render-shape!
  (lambda (image shape)
    (cond 
      [(equal? (car shape) 'compound)
       (for-each (l-s image-render-shape! image) (car (cdr shape)))]
      ...)))
(define shape-recolor
  (lambda (shape color)
    (cond
      [(equal? (car shape) 'compound)
       (shape-compound (map (r-s shape-recolor) (car (cdr shape))))]
      ...)))

The technique of placing one list inside another is called nesting lists, and is useful in a wide variety of situations.

We are not limited to this simple nesting. We can nest lists within lists within lists within lists, as deep as we desire to go. Not all of our nested values need to be lists. We can, for example, make lists some of whose elements are lists, some of whose elements are strings, some of whose elements are numbers, and so on and so forth.

Building Lists, Revisited

You've now seen a variety of ways to build lists. You can use the list procedure. You can use the make-list procedure. You can use cons to prepend a value to a list. Suppose you prefer to build lists with cons. How can you get started, given that cons expects a list as one of its parameters? You start with the empty list.

Scheme's name for the empty list is a pair of parentheses with nothing between them: (). Most implementations of Scheme permit you to refer to that list as nil or null. A few permit you to type it verbatim. All permit you to describe the empty list by putting a single quote before the pair of parentheses.

> '()
()
> nil
()
> null
()

You will find that we prefer to use a name for that list. If sample code does not work in your version of Scheme, try inserting the following definitions.

(define nil '())
(define null '())

Note that by using cons and nil, we can build up a list of any length by starting with the empty list and repeatedly prepending a value.

> (define singleton (cons "red" null))
> singleton
("red")
> (define doubleton (cons "green" singleton))
> doubleton
("green" "red")
> (define tripleton (cons "yellow" doubleton))
> tripleton
("yellow" "green" "red")
> (cons "senior" (cons "third-year" (cons "second-year" (cons "freshling" null))))
("senior" "third-year" "second-year" "freshling")

You may note that lists built in this way seem a bit “backwards”. That is, the value we add last appears at the front, rather than at the back. However, that's simply the way cons works and, as the last example suggests, in many cases it is a quite sensible thing to do.

List Predicates

Scheme provides two basic predicates for checking whether a value is a list. The null? predicate checks whether a value is the empty list. The list? predicate checks whether a value is a list (empty or nonempty).

> (null? null)
#t
> (list? null)
#t
> (null? (list 1 2 3))
#f
> (list? (list 1 2 3))
#t
> (null? 5)
#f
> (list? 5)
#f

Other Common List Procedures

It turns out that you can build any other list procedure with just null, cons, car, cdr, null?, and some other programming techniques. Nonetheless, there are enough common operations that most programmers want to do with lists that Scheme includes them as basic operations. (That means you don't have to define them yourself.) Here are a few that programmers frequently use.

`length`

The length procedure takes one argument, which must be a list, and computes the number of elements in the list. (An element that happens to be itself a list nevertheless contributes 1 to the total that length computes, regardless of how many elements it happens to contain.)

> (length null)
0
> (length (list 1 2 3))
3
> (length (list (list 1 2 3)))
1

`append`

The append procedure takes any number of arguments, each of which is a list, and returns a new list formed by stringing together all of the elements of the argument lists, in order, to form one long list.

> (append (list "red" "green") (list "blue" "yellow"))
("red" "green" "blue" "yellow")

The empty list acts as the identity for append.

> (append null (list "blue" "yellow"))
("blue" "yellow")
> (append (list "red" "green") null)
("red" "green")
> (append null null)
()

`cadr` and company: Combining `car` and `cdr`

To reduce the amount of typing necessary for the programmer, many implementations of Scheme provide procedures that combine car and cdr in various ways. These procedures begin with the letter “c”, end with the letter “r” and have a sequence of “a”'s and “d”'s in the middle to indicate the sequence of calls to car (for an “a”) or cdr (for a “d”). For example, cadr computes the car of the cdr of a list (the second element), cddr computes the cdr of the cdr of a list (all but the first two elements), and caar computes the car of the car of a list (applicable only to nested lists).

> (define rainbow (list "red" "orange" "yellow" "green" "blue" "indigo" "violet"))
> (cadr rainbow)
"orange"
> (cddr rainbow)
("yellow" "green" "blue" "indigo" "violet")
> (caddr rainbow)
"yellow"
> (cdddr rainbow)
("green" "blue" "indigo" "violet")

Summary of New List Procedures

null: Standard list constant. The empty list.
(cons value lst): Standard List Procedure. Create a new list by prepending value to the front of lst.
(cdr lst): Standard List Procedure. Get a list the same as lst but without the first element.
(car lst): Standard List Procedure. Get the first element of lst.
(null? lst): Standard list predicate. Checks if lst is the empty list.
(list-ref lst n): Standard List Procedure. Get the nth element of lst. Note that elements are numbered starting at 0.
(length lst): Standard List Procedure. Return the number of elements of list lst.
(append lst_0 lst_1 ... lst_n): Standard List Procedure. Create a new list by concatenating the elements of lst_0, lst_1, ... lst_n.
(caar lst): Standard List Procedure. If lst's first element is a list, gets the first element of that first element, the the car of the car of lst. If lst is not a list, or its first element is not a list, reports an error.
(cadr lst): Standard List Procedure. Get the second element of lst, the car of the cdr of lst
(cddr lst): Standard List Procedure. Get all but the first two elements of lst, the cdr of the cdr of lst
(caddr lst): Standard List Procedure. Get the third element of lst, the car of the cdr of the cdr of lst.

Self Checks

Check 1: List Procedures

Predict the results of evaluating the following expressions.

(cons 2 null)
(cons 1 (cons 2 null))
(cons 5 (list 1 2))
(caddr (iota 7))
(list-ref 2 (iota 7))
(append (iota 2) (iota 2))
(list (iota 2) (iota 2))
(append (iota 2) null)
(list (iota 2) null)
(cons (iota 2) null)

You may verify your predictions using DrRacket, but be sure you understand the results.