Mini-Project 4: Pattern matching

Summary
In this mini-project, you will reflect on pattern matching, developing your own simplified pattern matching procedure.
Collaboration
Each student should submit their own set of solutions to this project. You may certainly consult others in developing the project, although you should cite or acknowledge them if you have done so.

For this assignment, store your code in the file pattern-matching.rkt.

Please do not include any of the tests we’ve written, but feel free to include your own. (You should, of course, run any of the tests we’ve written. But elide them afterwards.)

Note: To help clarify things, I’ve updated the patterns to use join and seq rather than cons and list. If you’ve written things using cons and list, that’s fine.

Note: I’ve added additional tests to the end of the instructions. Make sure to run them before submitting your redos.


As you have seen in your explorations of the match keyword, being able to compare (match) one thing to another can prove very helpful in program design. It can also be helpful in analyzing texts and other structures. But how does matching work, particularly when it involves bindings? There are a number of strategies. We will consider some ones in this assignment.

A lot of pattern matching happens in the context of so-called “hierarchical expressions” or “nested expressions”. You’ve already seen a lot of nesting in Racket, most frequently of expressions within expressions (within expressions). It turns out that we often think of texts that way: words are nested in sentences, sentences in paragraphs, paragraphs, in sections, sections in chapters, and so on and so forth.

For convenience, we are going to stick to the world of Racket and the kind of nesting that happens in Racket. We’ll limit our world of Racket to strings, integers, and lists. And our goal will be to match match Racket expressions to patterns that also involve strings, integers, and lists. However, patterns will also include symbols that represent variables.

So let’s get started with some rules that indicate how you decide whether a pattern and a value match.

a. A variable matches any value. We will use symbols to represent patterns. For eample, the variable x matchs the integer 3, the string "three", the list '(1 2 3), and, well, anything else.

b. An integer matches the same integer. The integer 5 matches the integer 5. It does not match the integer 4. It does not match the string "5". It does not match the list ‘(5)`.

c. A string matches the same string. The string "Hello" matches the string "Hello". It does not match the string "hello". It does not match the list '(#\h #\e #\l #\l #\o).

d. The pattern (join P1 P2) matches a list of at least one element if P1 matches the car of the list and P2 matches the cdr of the list. (Sounds like recursion, doesn’t it?) The pattern (join 1 xs) matches nonempty lists that start with 1. The pattern (join x (join y z)) matches lists with at least two elements.

e. The pattern (seq) matches the empty list. This rule is just a special case of the next pattern.

f. The pattern (seq P1 P2 ... Pn) matches lists of length n, but only if P1, P2, … Pn match the corresponding elements of the list. For example, the pattern (seq 0 X 1) matches only lists of length three whose first element is 0 and whose last element is 1.

g. Non-patterns don’t match anything.

Part the first: A match? predicate

Document, write tests for, design, and implement a predicate (match? pattern value) that determines whether a pattern matches a value using the policies above.

You may not use the match keyword in writing match?.

Here’s some code to get you started. I would recommend that you comment out all but one or two failing tests during development, and gradually get your code to pass more and more tests.

Note that you can use the pair? predicate to determine if something is a non-empty list. (Please don’t ask why it’s called pair?. I would not have used that name.)

;;; (match? pattern value) -> ???
;;;    pattern : pattern?
;;;    value : any?
;;; ???
(define match?
  (lambda (pattern value)
    (cond 
      ; Variables match everything
      [(symbol? pattern)
       #t]
      ; Everything else currently fails to match
      [else
       #f])))

; Tests of variable patterns
(check-true (match? 'x 1) 
            "variables: match variable to integer")
(check-true (match? 'x "one") 
            "variables: match variable to string")
(check-true (match? 'x '()) 
            "variables: match variable to empty list")
(check-true (match? 'x '(1)) 
            "variables: match variable to nonempty list")

; Tests of number patterns
(check-true (match? 1 1) 
            "integers: match integer to same integer")
(check-false (match? 1 -1)
             "integers: match integer to other integer")
(check-false (match? 1 "1")
             "integers: match integer to string")
(check-false (match? 1 '())
             "integers: match integer to empty list")
(check-false (match? 1 '(1))
             "integers: match integer to singleton list")

; Common tests of string patterns
(check-true (match? "Hello" "Hello")
            "strings: match string to itself")
(check-false (match? "Hello" "hello")
             "strings: match string to alternate capitalization")
(check-false (match? "Hello" "Goodbye")
             "strings: match string to different string")
(check-false (match? "Hello" "")
             "strings: match string to empty string")
(check-false (match? "Hello" 1)
             "strings: match string to integer")

; Edge cases of string patterns
(check-true (match? "" "")
            "string/edge: match empty string to itself")
(check-false (match? "" "hello")
            "string/edge: match empty string to nonempty string")
(check-false (match? "" '())
            "string/edge: match empty string to empty list")
(check-false (match? "" 0)
            "string/edge: match empty string to zero")
(check-false (match? "" '(""))
            "string/edge: match empty string to singleto list")

; Empty list, v1
(check-true (match? '(seq) '())
            "empty/1: match empty list to empty list")
(check-true (match? '(seq) '())
            "empty/1: match empty list to empty list")
(check-false (match? '(seq) (list 1))
             "empty/1: match empty list to singleton list created with list")
(check-false (match? '(seq) (cons 1 null))
             "empty/1: match empty list to singleton list created with cons")
(check-false (match? '(seq) '(1))
             "empty/1: match empty list to singleton list created with tick")

; Simple pairs
(check-true (match? '(join x xs) (list 1))
            "simple join: match to singleton list")
(check-true (match? '(join x xs) (list 1 2 3))
            "simple join: match to longer list")
(check-true (match? '(join x xs) (cons 1 null))
            "simple join: match to singleton list created with cons")
(check-false (match? '(join x xs) '())
             "simple join: match simple cons to empty list")
(check-false (match? '(join x xs) "(cons x xs)")
             "simple join: match simple cons to string")

; Cons cells with set first element
(check-true (match? '(join 1 xs) (list 1))
            "join/1: match (join 1 xs) and singleton list of 1")
(check-true (match? '(join 1 xs) (list 1 2 3))
            "join/1: match (join 1 xs) and longer list starting with 1")
(check-false (match? '(join 1 xs) '())
             "join/1: match (join 1 xs) and empty list")
(check-false (match? '(join 1 xs) (list 2))
             "join/1: match (join 1 xs) and list starting with 2")
(check-false (match? '(join 1 xs) 1)
             "join/1: match (join 1 xs) and integer")
(check-false (match? '(join 1 xs) "1")
             "join/1: match (join 1 xs) and string")

; Singleton lists created with '(join x (seq))
(check-true (match? '(join x (seq)) (list 1))
            "singleton/1: match (join x (seq)) with (list 1)")
(check-true (match? '(join x (seq)) (list "hello"))
            "singleton/1: match (join x (seq)) with string list")
(check-true (match? '(join x (seq)) (cons 1 (list)))
            "singleton/1: match (join x (seq)) with cons list")
(check-false (match? '(join x (seq)) null)
             "singleton/1: match (join x (seq)) with empty list")
(check-false (match? '(join x (seq)) (list 1 2))
             "singleton/1: match (join x (seq)) with two-element list")
(check-false (match? '(join x (seq)) "hello")
             "singleton/1: match (join x (seq)) with string")

; Singleton lists created with `(seq x)`
(check-true (match? '(seq x) (list 1))
            "singleton/2: match (seq x) with (list 1)")
(check-true (match? '(seq x) (list "hello"))
            "singleton/2: match (seq x) with string list")
(check-true (match? '(seq x) (cons 1 (list)))
            "singleton/2: match (seq x) with cons list")
(check-false (match? '(seq x) null)
             "singleton/2: match (seq x) with empty list")
(check-false (match? '(seq x) (list 1 2))
             "singleton/2: match (seq x) with two-element list")
(check-false (match? '(seq x) "hello")
             "singleton/2: match (seq x) with string")

; Two-element lists created with `(seq x y)`
(check-true (match? '(seq x y) (list 1 2))
            "pair/1:a")
(check-true (match? '(seq x y) (list "hello" "goodbye"))
            "pair/1:b")
(check-false (match? '(seq x y) (list "hello" "and" "goodbye"))
            "pair/1:c")
(check-false (match? '(seq x y) (list "hello"))
            "pair/1:d")
(check-false (match? '(seq x y) (list))
            "pair/1:e")

; Two-element lists created with `(join x (join y (seq)))`
(check-true (match? '(join x (join y (seq))) (list 1 2))
            "pair/2:a")
(check-true (match? '(join x (join y (seq))) (list "hello" "goodbye"))
            "pair/2:b")
(check-false (match? '(join x (join y (seq))) (list "hello" "and" "goodbye"))
            "pair/2:c")
(check-false (match? '(join x (join y (seq))) (list "hello"))
            "pair/2:d")
(check-false (match? '(join x (join y (seq))) (list))
            "pair/2:e")

; Two-element lists created with `(seq "a" y)`
(check-true (match? '(seq "a" y) (list "a" "b"))
            "pair/3:a")
(check-true (match? '(seq "a" y) (list "a" 2))
            "pair/3:b")
(check-false (match? '(seq "a" y) (list "b" "a"))
            "pair/3:c")
(check-false (match? '(seq "a" y) (list "a" "b" "c"))
            "pair/3:d")
(check-false (match? '(seq "a" y) (list "a"))
            "pair/3:d")
(check-false (match? '(seq "a" y) (list))
            "pair/3:e")

; Two-element lists created with `(seq "a" y)`
(check-true (match? '(seq "a" y) (list "a" "b"))
            "pair/3:a")
(check-true (match? '(seq "a" y) (list "a" 2))
            "pair/3:b")
(check-false (match? '(seq "a" y) (list "b" "a"))
            "pair/3:c")
(check-false (match? '(seq "a" y) (list "a" "b" "c"))
            "pair/3:d")
(check-false (match? '(seq "a" y) (list "a"))
            "pair/3:d")
(check-false (match? '(seq "a" y) (list))
            "pair/3:e")

; A few longer lists
(check-true (match? '(seq 1 2 3) (list 1 2 3))
            "len3:a Matching length-three lists")
(check-true (match? '(seq x 2 3) (list 1 2 3))
            "len3:b Matching length-three lists with a variable in the pattern")
(check-true (match? '(seq x y z) (list 1 2 3))
            "len3:c Matching length-three lists with all variables in the pattern")
(check-false (match? '(seq 1 2 3) (list 1 1 1))
             "len3:d Non-matching length-three lists with the same first element")
(check-false (match? '(seq 1 2 3) (list 2 2 3))
             "len3:e Non-matching length-three lists with the different first elements")
(check-false (match? '(seq x y z) (list 1 2 3 4))
             "len3:f Lists of different length")

; Some edgier things, often with deeper nesting
(check-true (match? '(seq (seq (seq (seq x y)))) 
                    (list (list (list (list 1 2)))))
            "edgier/1:a")
(check-false (match? '(seq (seq (seq (seq x y)))) 
                     (list (list (list (list 1 2 3)))))
             "edgier/1:b")
(check-false (match? '(seq (seq (seq (seq x y)))) 
                     (list (list (list (list 3)))))
             "edgier/1:c")
(check-false (match? '(seq (seq (seq (seq x y)))) 
                     (list (list (list 3))))
             "edgier/1:d")
(check-true (match? '(join 2 (join x y)) '(2 4 5))
            "edgier/1:e")

; Additional student-supplied tests
; Forthcoming

Part the second: Finding matches

Document, write tests for, design, and implement a procedure, (find-match pattern value) that looks for and returns something in value that matches patterns. If nothing in value matches pattern, it should return false (#f).

For example,

; A list of two elements whose first element is 1
> (find-match '(seq 1 x)
              '((1 2 3) (2 3) (2 1) (1 4) 5))
'(1 4)
; A list of two elements whose first element is 1
> (find-match '(seq 1 x)
              '(((1 2) 3) (2 3) (2 1) (1 4) 5))
'(1 2)
; A non-empty list whose first element is 2
> (find-match '(join 2 x)
              '(((1 2) 3) (2 3) (2 1) (1 4) 5))
'(2)
; A list of at least two elements whose first element is 2
> (find-match '(join 2 (join x y))
              '(((1 2) 3) (2 3) (2 1) (1 4) 5))
'(2 3)
; A list of at least two elements whose first element is 2
> (find-match '(join 2 (join x y))
              '(((1 2) 3) (8 2 4 5) (2 3) (2 1) (1 4) 5))
'(2 4 5)

Your strategy will be something like the following.

  • If pattern matches value, return value.
  • Otherwise, if value is not a list, return false (#f).
  • Otherwise, if pattern matches something in the car of value, return the thing that matches.
  • Otherwise, if pattern matches something in the cdr of value, return the thing that matches.
  • Otherwise, return false (#f).

You may not use the match keyword in writing find-match. However, you can certainly use the match? predicate you just wrote.

Part the third: Bindings (Optional)

This part is optional. It is not necessary to complete the third part in order to earn an E. But you might find yourself understanding both match and recursion even better if you complete it. Perhaps you’ll even find it fun.

You may recall that Racket’s match keyword not only looks for matches, it also binds values to variables whenever it matches a variables.

Document, write tests for, design, and implement a procedure, (bindings pattern value) that takes a pattern and a matching value and returns a list of (variable val) bindings for the variables in pattern. For example,

> (bindings 'x 1)
'((x 1))

> (bindings 'variable "value")
'((variable "value"))

> (bindings '(seq variable) (list "value"))
'((variable "value"))

> (bindings '(join x y) (list "value"))
'((x "value") (y ()))

> (bindings '(join x y) (list 1 2 3))
'((x 1) (y (2 3)))

> (bindings '(seq x y z) (list "a" "b" "c"))
'((x "a") (y "b") (z "c"))

If the value does not match, your procedure can return whatever you want.

You may not use the match keyword in writing bindings. However, you can certainly use the match? predicate you just wrote. (You may not need to.)

Hint: The structure of bindings will be much like the structure of match, except that you will be returning lists of bindings rather than true or false.

(define bindings
  (lambda (pattern value)
    (cond 
      ; Whoo!  We found a binding
      [(symbol? pattern)
       (list (list pattern value))]
      ; Base case of integers or strings.  No bindings.
      [(or (integer? pattern) (string? pattern))
       '()]
      ; The rest remains to be written
      [else
       '()])))

Frequently Asked Questions

Why do we put a tick mark before patterns?

Because we don’t want DrRacket to evaluate them. The patterns are supposed to remain as is.

Why don’t we put tick marks within the patterns?

The policy in DrRacket is that one tick suffices for the whole thing. In fact, it will misinterpret what you havve if you put ticks inside. You’ll note that DrRacket does the same when it presents lists back to you.

(list (list) (list 1))
'(() (1))

I’m not quite sure how to deal with the join patterns.

Work on the seq patterns first. Once you are able to make those match, you’ll be better able to work on (or at least discuss) the join patterns.

I’m not quite sure how to deal with the seq patterns.

Work on the join patterns first. Once you are able to make those match, you’ll be better able to work on (or at least discuss) the seq patterns.

Why does '(join x (seq)) match (list 1)?

We can think of lists in two ways: As lists, or as things built from cons (which therefore can match join). The two models are interchangeable. So any list can be deconstructed into the first element and the rest with the join pattern.

'(seq) matches the empty list.

So '(join x (seq)) matches a one-element list.

(list 1) is a one-element list.

Since 1 matches x, the list matches the pattern.

Why does '(join x xs) match (list 1 2 3)?

'(join x xs) matches a list whose first element matches x and whose remainder matches xs.

The value 1 matches x. The list '(2 3) matches xs.

How can I match the empty list pattern, '(seq)?

First, you need to make sure that you have the right pattern. The pattern (not the thing we’re matching) is a list with one element, the symbol 'seq. We can check if the thing we’re matching against is a list using null?.

The portion of the code would look something like this.

; Empty lists
[(and (pair? pattern) ; It's a list
      (null? (cdr pattern)) ; Containing one element
      (eq? (car pattern 'seq))) ; Which is the symbol `'seq`
 ; Then it matches only the empty list
 (null? value)]

Any other hints on matching lists created with seq?

You might write a helper that takes a list of patterns and a list of values as parameters, and make sure that each pattern matches the corresponding value.

Any other general hints?

For each pattern, I’d separate the question of “Do I have this pattern?” from that of “Do I match this pattern?”. That is, your guard will be “Is it this kind of pattern?” and your consequent will be code that checks to see whether the value matches that particular pattern.

Is there a way that I can check what’s being matched?

Sure. Try the following.

First, add this procedure.

;;; (describe-match pattern value) -> void?
;;;   pattern : any?
;;;   value : any?
;;; Print a short description of an attempted match.
(define describe-match
  (lambda (pattern value)
    (display "Matching ")
    (write pattern)
    (display " against ")
    (write value)
    (newline)))

Second, add a call to describe-match to the top of match?.

(define match?
  (lambda (pattern value)
    (describe-match pattern value)
    (cond 
      ; Variables match everything
      [(symbol? pattern)
       #t]
      ...)))

Warning! You’ll get a lot of output if you run all of the tests. You may want to comment them out before doing this.

Additional tests

I wrote these while grading.

Additional checks of match?

(check-false (match? '(join 1 2) '(1 3))
             "Make sure we check the second element of joins!")
(check-true (match? '(join 1 (seq a b)) '(1 2 3))
            "Another three-element list")
(check-false (match? '(seq 1 2 3 4) '(1 2 3))
             "almost matching lists; pattern is too long")
(check-false (match? '(seq 1 2 3) '(1 2 3 4))
             "almost matching lists; pattern is too long")
(check-false (match? '(1 2) '(1 2))
             "invalid pattern: list")
(check-false (match? 1/2 0.5)
             "invalid pattern: fraction")
(check-false (match? '(x y z) '(1 2))
             "invalid pattern: list1")
(check-false (match? '(x y z) '(1 2 3))
             "invalid pattern: list2")
(check-false (match? '(join 1) '(1))
             "invalid pattern: join with only one constant parameter")
(check-false (match? '(join x) '(1))
             "invalid pattern: join with only one variable parameter")
(check-false (match? '() '())
             "invalid pattern: empty list")
(check-false (match? '(join 1 2) '(1 2))
             "strange join pattern")
(check-false (match? '(seq 1 2 3) '(seq 1 2 3))
             "seq in the value")
(check-false (match? '(seq 1 2 3 4 5 6 7 8 9) '(1 2 3 4 5 6 7 8 8))
             "long almost-matching sequence")
(check-false (match? '(seq 1 2 3 4 5 6 7 8 9) '(1 2 3 4 6 6 7 8 9))
             "another long almost-matching sequence")
(check-true (match? '(seq 1 2 3 4 5 6 7 8 9) '(1 2 3 4 5 6 7 8 9))
            "long matching sequence")

Additional checks of find-match

(check-equal? (find-match 5 '(((5))))
              5
              "constant in deeply nested list")
(check-equal? (find-match 6 '(2 (3 (6) 7)))
              6
              "constant in deeply nested list")
(check-equal? (find-match 'x '())
              '()
              "Edge case: Variable to null")
(check-equal? (find-match 'x '(1))
              '(1)
              "Normal case: Variable to singleton list")
(check-equal? (find-match '(seq) '())
              '()
              "Edge case: Empty list")
(check-equal? (find-match '(seq x y z) '((1 2) (3 (4 5 6))))
              '(4 5 6)
              "Longer list")
(check-equal? (find-match '(seq x y z) '(3 (4 5 6)))
              '(4 5 6)
              "Part of longer list")
(check-equal? (find-match '(join 1 x) '(((1 2)) 1 3))
              '(1 2)
              "Multiple matches, should find first (v1)")
(check-equal? (find-match '(join 1 x) '(((1 2)) 1))
              '(1 2)
              "Multiple matches, should find first (v2)")
(check-equal? (find-match '(seq x y z) '(((1 2) (3 (4 5 6))) 7))
              '(4 5 6)
              "Extended longer list")
(check-false (find-match 3 '(() ((1 2) (1 (() 1 2 1))) 1))
             "Non-match of 3 in strange nested list")
(check-false (find-match '(seq 1 1) '(((1 2) (1 (1 2 1))) 1))
             "Non-match of (seq 1 1)")
(check-equal? (find-match '(seq 1 x) '((1 2) (2 3) (1 4)))
              '(1 2)
              "match at front")
(check-equal? (find-match '(seq 1 x) '((2 2) (2 3) (1 4)))
              '(1 4)
              "match at end")

Optional tests

Now that we’ve learned that cons need not take a list as a second parameter, we might extend the join pattern to accept pairs rather than lists for the value. Here’s a simple test of that.

(check-true (match? '(join 1 2) (cons 1 2))
            "A simple non-list join/cons")