Functional Problem Solving (CSC 151 2013F) : Readings
Primary: [Front Door] [Schedule] - [Academic Honesty] [Disabilities] [Email] [FAQ] [IRC] [Teaching & Learning] [Grading]
Current: [Assignment] [EBoard] [Lab] [Outline] [Partners] [Reading]
Sections: [Assignments] [EBoards] [Examples] [Handouts] [Labs] [Outlines] [Partners] [Readings]
Reference: [Setup] - [Functions A-Z] [Functions By Topic] - [Racket] [Scheme Report (R5RS)] [R6RS] [TSPL4]
Related Courses: [Davis (2013F)] [Rebelsky (2010F)] [Weinman (2012F)]
Misc: [SamR] [Glimmer Labs] [CS@Grinnell] [Grinnell] [Issue Tracker (Course)]
Summary: As you develop procedures and collections of procedures, you have a responsibility to make sure that they work correctly. One mechanism for checking your procedures is a comprehensive suite of tests. In this reading, we consider the design and use of tests. We also consider RackUnit, the testing framework that comes with DrRacket.
Most computer programmers strive to write clear, efficient, and correct code. It is (usually) easy to determine whether code is clear. With some practice and knowledge of the correct tools, one can determine how efficient code is. However, believe it or not, it is often difficult to determine whether code is correct.
The gold standard of correctness is a formal proof that the procedure or program is correct. However, in order to prove a program or procedure correct, one must develop a rich mathematical toolkit and devote significant effort to writing the proof. Such effort is worth it for life-critical applications, but for many programs, the effort and time required are often more than can be reasonably expected.
There is also a disadvantage of formal proof: Code often changes and the proof must therefore also change. Why does code change? At times, the requirements of the code change (e.g., a procedure that was to do three related things is now expected to do four related things). At other times, with experience, programmers realize that they can improve the code by making a few changes. If we require that all code be proven correct, and if changing code means rewriting the proof, then we discourage programmers from changing their code.
Hence, we need other ways to have some confidence that our code is correct. A typical mechanism is a test suite, a collection of tests that are unlikely to all succeed if the code being tested is erroneous. One nice aspect of a test suite is that when you make changes, you can simply re-run the suite and see if all the tests still succeed. Test suites thus encourage programmers to experiment with improving their code, since a good suite will tell them immediately whether or not their changes have broken anything.
But when and how do you develop tests? These questions are the subject of this reading.
As the introduction suggested, you should write tests when you write
code. But what is a test? Put simply, a test is a bit of code that
reveals something about the correctness of a procedure or a set of
procedures. Most typically, we express tests in terms of expressions
and their expected values. For example, if we've written a procedure,
list-reverse, that reverses lists, we would
expect that
(list-reverse null) is null
(list-reverse (list 3)) is equal to (list 3)
(list-reverse (list 3 7 11)) is equal to (list 11 7 3)

We could express those expectations in a variety of ways. The simplest strategy is to execute each expression, in turn, and see if the result is what we expected. You should have been using this form of testing regularly in your coding.

> (list-reverse null)
()
> (list-reverse (list 3))
(3)
> (list-reverse (list 3 7 11))
(11 7 3)
Of course, one disadvantage of this kind of testing is that you have to look at the results manually to make sure that they are correct. You also have to know what the correct answers should be. And reading isn't always a reliable strategy: there is some evidence that you don't always catch errors when you have to do this comparison, particularly when you have a lot of tests. (I know that I've certainly missed a number of errors this way.) An appendix to this document presents an interesting historical anecdote about the dangers of writing a test suite in which you must manually read all of the results.
Since reading the results is tedious and perhaps even dangerous,
it is often useful
to have the computer do the comparison for you. For example, we might
write a procedure, check,
that checks to make sure that two expressions are equal.
(define check
(lambda (exp1 exp2)
(if (equal? exp1 exp2)
"OK"
"FAILED")))
We can then use this procedure for the tests above, as follows.
> (check (list-reverse null) null)
"OK"
> (check (list-reverse (list 3)) (list 3))
"OK"
> (check (list-reverse (list 3 7 11)) (list 11 7 3))
"OK"
> (check (list-reverse (list 3 7 11)) (list 11 10 3))
"FAILED"
Note that in the last test, the test itself, rather than
list-reverse, was incorrect.
Confirming that our code is correct is now simply a matter of
scanning through the results and seeing if any say "FAILED".
And, as importantly, we've specified the expected result along with each
expression, so we don't need to look it up manually.
Of course, there are still some disadvantages with this strategy. For
example, if we put the tests in a file to execute one by one, it may be
difficult to tell which ones failed. Also, for a large set of tests, it
seems a bit excessive to print OK every time. Finally, we get neither
OK
nor FAILED when there's an error in the original expression.
> (check (list-reverse 5) 5)
car: expects argument of type <pair>; given 5
In fact, if an error occurs in the middle of a group of tests, the whole thing may come to a screeching halt.
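One way to keep a single error from derailing everything is to trap errors inside the check itself. Here is a sketch of a safe-check variant (our own hypothetical helper, not part of the course code) that uses Racket's with-handlers; the expression under test is wrapped in a lambda so that it is not evaluated until safe-check is ready to catch any errors.

```racket
; safe-check: like check, but traps runtime errors instead of
; letting them halt a whole file of tests.  The expression under
; test is passed as a thunk (a zero-argument procedure).
(define safe-check
  (lambda (thunk expected)
    (with-handlers ([exn:fail? (lambda (e) "ERROR")])
      (if (equal? (thunk) expected)
          "OK"
          "FAILED"))))

; Sample uses:
(safe-check (lambda () (reverse (list 3 7 11))) (list 11 7 3)) ; "OK"
(safe-check (lambda () (+ 1 2)) 4)                             ; "FAILED"
(safe-check (lambda () (car null)) 5)                          ; "ERROR", not a halt
```

Testing frameworks do essentially this kind of bookkeeping for us, as we will see next.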
To handle all of these additional issues, many program development environments now include some form of testing framework. And even when they don't, languages often have accompanying testing frameworks. While testing frameworks differ, they tend to share some commonalities.
First, most testing frameworks provide a way for you to
check expected results. That's a lot like
our check procedure above. That is, we have a value
we expect and an expression; we evaluate the expression; and
we compare the result to the expected value. We will refer
to procedures used to check expected results as “checks”.
Second, most testing frameworks provide a way to group
checks into tests. Why would we need more
than one check in a test? Sometimes, it's because we need to
check multiple things about a result. For example, if we want
to make sure that (iota 1) is '(0),
we might want to check that (a) the result has a length of 1
and (b) the car of the result is 0. But other
times, the tester feels it's natural to put a lot of related
checks into a single test - if any of them fail, the whole test
fails. How do you divide checks into tests? In some sense, it's
a matter of taste. Some testers like just a few checks per test.
Others like a lot.
Finally, most testing frameworks provide a way to group individual tests into test suites. Why do we need groups of tests? Because we often have multiple procedures to test, both individually and in groups, and we want to run all of the tests to ensure that everything works together. For example, we might provide a library of a variety of related functions and want to test the whole library en masse.
So, when you first encounter a new testing framework, you should ask yourself three questions: How do you check for expected results? How do you group checks into tests? And how do you group tests into suites? (You should also ask a few related questions, such as how you run the tests.)
The designers of the Racket programming language designed a testing framework to go with the language. They call this framework “RackUnit”. As you might expect, the RackUnit framework is well integrated into DrRacket.
By default, RackUnit is not available to your program. (Why not? Because there are many libraries that Racket programmers might use, and the designers of Racket decided that the default should be to include only those that the programmer deems necessary.) To load RackUnit, add the following to the top of the definitions pane.
(require rackunit)
(require rackunit/text-ui)
RackUnit provides a variety of procedures that we can use to check results. Most of them take an expected result, an input expression, and an optional message.
(check-= expression expected epsilon)
(check-= expression expected epsilon optional-message)

Evaluate expression and expected and then compare them for numeric equality (within epsilon). If they are equal, do nothing. If they are not equal, print an error message. If the optional message is included, also print that message.

(check-equal? expression expected)
(check-equal? expression expected optional-message)

Evaluate expression and expected and then compare them for equality. If they are equal, do nothing. If they are not equal, print an error message. If the optional message is included, also print that message.

(check-not-equal? expression expected)
(check-not-equal? expression expected optional-message)

Evaluate expression and expected and then compare them. If they are not equal, do nothing. If they are equal, print an error message. If the optional message is included, also print that message.
Although we will typically put checks into tests, we can run them on their own. When they succeed, they print no result. When they fail, they print an error message. For example,
> (check-= 4 4 0)
> (check-= 4 (* 2 2) 0 "two times two is four")
> (check-= 2 (* (sqrt 2) (sqrt 2)) 0.00001 "sqrt 2 squared, approximate")
> (check-= 2 (* (sqrt 2) (sqrt 2)) 0 "sqrt 2 squared, exact")
--------------------
FAILURE
name: check-=
location: (stdin #f #f 230 59)
expression: (check-= 2 (* (sqrt 2) (sqrt 2)) 0)
params: (2 2.0000000000000004 0)
message: "sqrt 2 squared, exact"
Check failure
--------------------
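check-not-equal? behaves analogously; here is a quick sketch (the commented-out check would fail, since (* 2 2) is equal to 4):

```racket
(require rackunit)

; Succeeds silently: the two lists are not equal.
(check-not-equal? (list 1 2 3) (list 3 2 1) "order matters")

; This one would fail and print a failure report:
; (check-not-equal? (* 2 2) 4 "intentionally failing check")
```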
We group checks into tests with test-case.
(test-case
description
check-1 ...
check-n)
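For example, the two checks on (iota 1) suggested earlier might be grouped as a single test. (A sketch: it assumes iota is available, e.g., from SRFI 1 or the course library.)

```racket
(require rackunit)
(require srfi/1) ; provides iota in standard Racket

; One test containing two checks; if either check fails, the
; whole "iota 1" test fails.
(test-case
 "iota 1"
 (check-equal? (length (iota 1)) 1 "result has length 1")
 (check-equal? (car (iota 1)) 0 "first element is 0"))
```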
Note that test-case runs the test immediately. Sometimes
that's useful; sometimes we'd like to build up a bunch of tests for
running later. And that's where test suites come into play.
(test-suite
description
check-or-test-1 ...
check-or-test-n)
Note that, unlike test-case, test-suite does
not run the tests. Instead, it builds a suite
that we can later run with run-tests. (Why make the
distinction? Sometimes it ends up being easier to have the tests
grouped so that you can easily redefine them.) So, whenever you make
a test suite, you'll probably have to name it with define.
Let's return to our reverse example and consider how we might put together our simple checks. Here's one approach. We can just type them directly.
> (check-equal? (list-reverse null) null)
> (check-equal? (list-reverse (list 3)) (list 3))
> (check-equal? (list-reverse (list 3 7 11)) (list 11 7 3))
But we're better off grouping them into a suite.
(define reverse-tests
  (test-suite
   "Tests of list-reverse"
   (test-case
    "empty list"
    (check-equal? (list-reverse null) null))
   (test-case
    "singleton list"
    (check-equal? (list-reverse (list 3)) (list 3)))
   (test-case
    "triplet"
    (check-equal? (list-reverse (list 3 7 11)) (list 11 7 3)))))
We can then run the tests.
> (run-tests reverse-tests)
3 success(es) 0 failure(s) 0 error(s) 3 test(s) run
What happens if a test fails? Let's insert a broken test to see.
(test-case "broken" (check-equal? 0 1))
> (run-tests reverse-tests)
--------------------
Tests of list-reverse > broken
broken
FAILURE
name: check-equal?
location: rackunit.rkt:19:4
actual: 0
expected: 1
Check failure
--------------------
3 success(es) 1 failure(s) 0 error(s) 4 test(s) run
To many programmers, testing is much like documentation. That is, it's something you add after you've written the majority of the code. However, testing, like documentation, can make it much easier to write the code in the first place.
As we suggested in the reading on documentation, by writing documentation first, you develop a clearer sense of what you want your procedures to accomplish. Taking the time to write documentation can also help you think through the special cases. For some programmers, writing the formal postconditions can give them an idea of how to solve the problem.
If you design your tests first, you can accomplish similar goals. For example, if you think carefully about what tests might fail, you make sure the special cases are handled. Also, a good set of tests of the form “this expression should have this value” can serve as a form of documentation for the reader, explaining through example what the procedure is to do. There is even a popular style of software engineering, called test-driven development, in which you always write the tests first. (Test-driven development is a key part of Extreme Programming and a variety of so-called “agile software development strategies”.)
Let's consider this test-first strategy as we attempt to write a common procedure, (tally-value val lst), which counts the number of times that val appears in lst. What are some good tests?
- tally-value returns 0 for the empty list.
- tally-value returns 1 for a singleton list in which the singleton element is val.
- tally-value returns 0 for a singleton list in which the singleton element is not val.
- tally-value returns 1 for length-three lists in which val is just the first, just the last, or just the middle element.
- tally-value returns the appropriate count for lists that include multiple copies of val.
- tally-value returns 0 for longer lists that do not include val.
We will create two files, tally-value.rkt,
which will contain the code and documentation
for tally-value, and
tally-value-test.rkt, which will
contain our testing code. After making an empty
tally-value.rkt, we create
tally-value-test.rkt.
;;; File:
;;; tally-value-test.rkt
;;; Summary:
;;; Unit tests for the tally-value procedure.
#lang racket
(require rackunit)
(require rackunit/text-ui)
(require "./tally-value.rkt")
(define tally-tests
(test-suite
"Tests of the tally-value procedure"
(test-case "null list"
(check-equal? (tally-value 5 null) 0))
(test-case "singleton list with value at head"
(check-equal? (tally-value 5 (list 5)) 1))
(test-case "singleton list without value"
(check-equal? (tally-value 5 (list 17)) 0))
(test-case "triplet list, value at front"
(check-equal? (tally-value 5 (list 5 17 6)) 1))
(test-case "triple list, value in middle"
(check-equal? (tally-value 5 (list 17 5 6)) 1))
(test-case "triple list, value at end"
(check-equal? (tally-value 5 (list 6 17 5)) 1))
(test-case "five-by-five"
(check-equal? (tally-value 5 (list 5 5 5 5 5)) 5))
(test-case "three of 5"
(check-equal? (tally-value 5 (list 5 17 5 17 5)) 3))
(test-case "two of 5"
(check-equal? (tally-value 5 (list 17 5 5 17 17)) 2))
(test-case "zero of 5"
(check-equal? (tally-value 5 (list 17 17 17 17 17)) 0))
))
Of course, we haven't defined tally-value yet, so we
don't expect the tests to work, but let's see what happens. As soon
as we try to run the program, we'll get an error.
tally-value-test.rkt:12:15: expand: unbound identifier in module in: tally-value
Yeah, given that we didn't bother to define
tally-value, we'd expect some errors. So, let's
start to define tally-value. Here's an incorrect
definition:
#lang racket
(provide tally-value)
;;; Procedure:
;;; tally-value
;;; Parameters:
;;; val, a Scheme value [Unverified]
;;; lst, a list of values [Unverified]
;;; Purpose:
;;; Counts the number of times val appears in lst.
;;; Produces:
;;; tally, an integer
(define tally-value
(lambda (val lst)
(if (eq? val (car lst))
1
0)))
Amazingly, this version passes several of our tests. When we run the test suite, we get the following output. (Portions elided for brevity.)
> (run-tests tally-tests)
--------------------
Tests of the tally-value procedure > null list
null list
ERROR
car: expects argument of type <pair>; given '()
--------------------
--------------------
Tests of the tally-value procedure > triple list, value in middle
triple list, value in middle
FAILURE
name: check-equal?
location: tally-value-test.rkt:22:14
actual: 0
expected: 1
Check failure
--------------------
...
--------------------
Tests of the tally-value procedure > two of 5
two of 5
FAILURE
name: check-equal?
location: tally-value-test.rkt:30:14
actual: 0
expected: 2
Check failure
--------------------
4 success(es) 5 failure(s) 1 error(s) 10 test(s) run
But the goal is not to pass some tests. The goal is to pass all of the tests. We clearly need to fix the code.
It turns out that thinking about the kinds of errors we got can help us
repair our code.
That first error is important. It indicates that we're taking the
car of an empty list. Hence, we need to test
for that case. What about the others? Most look like a case of us
forgetting to recurse. Let's try again:
(define tally-value
(lambda (val lst)
(if (null? lst)
0
(+ (if (= val (car lst)) 1 0)
(tally-value val (cdr lst))))))
What does our test suite say?
> (run-tests tally-tests)
10 success(es) 0 failure(s) 0 error(s) 10 test(s) run
Wow! That's comforting. We seem to have written it correctly (or at
least correctly enough to pass our tests). However, some people might
find the formulation above confusing or inelegant, so we might try to
rewrite it. Instead of putting the addition within the
if, we'll have
a three-way cond.
(define tally-value
(lambda (val lst)
(cond
((null? lst)
0)
((= val (car lst))
1)
(else
(tally-value val (cdr lst))))))
What do the tests say?
> (run-tests tally-tests)
--------------------
Tests of the tally-value procedure > five-by-five
five-by-five
FAILURE
name: check-equal?
location: tally-value-test.rkt:26:14
actual: 1
expected: 5
Check failure
--------------------
--------------------
Tests of the tally-value procedure > three of 5
three of 5
FAILURE
name: check-equal?
location: tally-value-test.rkt:28:14
actual: 1
expected: 3
Check failure
--------------------
--------------------
Tests of the tally-value procedure > two of 5
two of 5
FAILURE
name: check-equal?
location: tally-value-test.rkt:30:14
actual: 1
expected: 2
Check failure
--------------------
7 success(es) 3 failure(s) 0 error(s) 10 test(s) run
Ah! We've made a common mistake of reorganizing code: We've forgotten to make recursive calls in all the places where they are needed.
(define tally-value
(lambda (val lst)
(cond
((null? lst)
0)
((= val (car lst))
(+ 1 (tally-value val (cdr lst))))
(else
(tally-value val (cdr lst))))))
We cross our fingers and run the tests again.
> (run-tests tally-tests)
10 success(es) 0 failure(s) 0 error(s) 10 test(s) run
Of course, we've left out some tests. What if val
is a string, a drawing, or even a list? We'll add a few more tests to
tally-value-test.rkt to cover these cases. We
don't need as many tests for each, since we already know that the
overall structure works.
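For instance, we might sketch a small supplementary suite like the following (a hypothetical fragment, with the numeric tally-value definition from above repeated so that the example is self-contained). Don't be surprised if running it reports errors rather than successes.

```racket
(require rackunit)
(require rackunit/text-ui)

; The definition we ended with above.
(define tally-value
  (lambda (val lst)
    (cond
      ((null? lst)
       0)
      ((= val (car lst))
       (+ 1 (tally-value val (cdr lst))))
      (else
       (tally-value val (cdr lst))))))

; A few additional cases with non-numeric values.
(define more-tally-tests
  (test-suite
   "Non-numeric values"
   (test-case "string values"
     (check-equal? (tally-value "a" (list "a" "b" "a")) 2))
   (test-case "list values"
     (check-equal? (tally-value (list 1 2) (list (list 1 2) 3)) 1))))

(run-tests more-tally-tests)
```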
As you'll see in the laboratory, these tests may reveal fascinating new errors in the code. The lab will give you the opportunity to correct the errors.
Many of us are reminded of the need for unit testing by the following story by Doug McIlroy, posted to The Risks Digest: Forum on Risks to the Public in Computers and Related Systems:
Sometime around 1961, a customer of the Bell Labs computing center questioned a value returned by the sine routine. The cause was simple: a card had dropped out of the assembly language source. Bob Morris pinned down the exact date by checking the dutifully filed reversions tests for system builds. Each time test values of the sine routine (and the rest of the library) had been printed out. Essentially the acceptance criterion was that the printout was the right thickness; the important point was that the tests ran to conclusion, not that they gave right answers. The trouble persisted through several deployed generations of the system.
McIlroy, Doug (2006). Trig routine risk: An Oldie. Risks Digest 24(49), December 2006.
If, instead of a thick printout, Bell Labs had arranged for a count of successes and a list of failures, they (and their customers) would have been in much better shape.
Samuel A. Rebelsky, rebelsky@grinnell.edu
Copyright (c) 2007-2013 Janet Davis, Samuel A. Rebelsky, and Jerod Weinman. (Selected materials are copyright by John David Stone or Henry Walker and are used with permission.)

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-nc/3.0/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.