Fundamentals of Computer Science I (CS151.02 2007S)
Summary: As you develop procedures (and collections of procedures), you have a responsibility to make sure that they work correctly. One mechanism for checking your procedure is a comprehensive suite of tests. In this reading, we consider the design and use of tests.
Most computer programmers strive to write clear, efficient, and correct code. It is (usually) easy to determine whether or not code is clear. With some practice and knowledge of the correct tools, one can determine how efficient code is. However, it is often difficult to determine whether or not code is correct.
The gold standard of correctness is, of course, a formal proof that the procedure or program is correct. However, in order to prove a program or procedure correct, one must develop a rich mathematical toolkit and devote significant effort to writing the proof. Such effort is worth it for life-critical applications, but for many programs, it is often more than can be reasonably expected.
Formal proof has another disadvantage: code often changes, and the proof must therefore change, too. Why does code change? At times, the requirements of the code change (e.g., a procedure that was to do three related things is now expected to do four related things). At other times, with experience, programmers realize that they can improve the code by making a few changes. If we require that all code be proven correct, and if changing code means rewriting the proof, then we discourage programmers from changing their code.
Hence, we need other ways to gain some confidence that our code is correct. A typical mechanism is a test suite, a collection of tests that are unlikely to all succeed if the code being tested is erroneous. One nice aspect of a test suite is that when you make changes, you can simply re-run the test suite and see if all the tests succeed. Test suites also encourage programmers to experiment with improving their code, since a good suite will tell them immediately whether or not their changes are successful.
But when and how do you develop tests? These questions are the subject of this reading.
In short, you should write tests as you write code. But what is a test? Put
simply, a test is a bit of code that reveals something about the correctness
of a procedure (or a set of procedures). Most typically, we express
tests in terms of expressions and their expected values. For example,
if we've written a procedure, rev, that reverses lists,
we would expect that
(rev null) is null
(rev (list 'a)) is (list 'a)
(rev (list 'a 'b 'c)) is (list 'c 'b 'a)
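For concreteness, here is one way rev might be written. (This is only a sketch; the tests below depend on the behavior described above, not on this particular definition.)
; rev : list -> list
; Produce a list containing the elements of lst in reverse order.
(define rev
  (lambda (lst)
    (if (null? lst)
        null
        (append (rev (cdr lst)) (list (car lst))))))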
We could express those expectations in a variety of ways. The simplest strategy is to execute each expression, in turn, and see if the result is what we expected. You should have used this form of testing regularly in your coding.
> (rev null)
null
> (rev (list 'a))
(a)
> (rev (list 'a 'b 'c))
(c b a)
Of course, one disadvantage of this kind of testing is that you have to look at the results manually to make sure that they are correct. There's some evidence that you don't always catch errors when you have to do this comparison, particularly when you have a lot of tests. I know that I've certainly missed a number of errors this way. An appendix to this document presents an interesting historical anecdote about the dangers of testing in which you must read the results yourself.
Hence, it is often useful to have the computer do the comparison for you.
For example, we might write a procedure, check, that checks
to make sure that two expressions are equal.
(define check
  (lambda (exp1 exp2)
    (if (equal? exp1 exp2)
        "OK"
        "FAILED")))
We can then use this procedure for the tests above, as follows.
> (check (rev null) null)
"OK"
> (check (rev (list 'a)) (list 'a))
"OK"
> (check (rev (list 'a 'b 'c)) (list 'c 'b 'a))
"OK"
> (check (rev (list 'a 'b 'c)) (list 'c 'd 'a))
"FAILED"
(Note that in this case, the last test was incorrect.)
Confirming that our code is correct is now simply a matter of scanning
through the results and seeing if any say "FAILED".
Of course, there are still some disadvantages with this strategy. For example, if we put the tests in a file to execute one by one, it may be difficult to tell which ones failed. Also, for a large set of tests, it seems a bit excessive to print OK every time. Finally, we get neither OK nor FAILED when there's an error in the original expression.
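One way to ease the first of these problems is to give each test a name and have the checking procedure display that name when the test fails. Here is a minimal sketch of such a checker. (The check-named procedure is hypothetical, not part of the course library, and it still does nothing about errors signaled while evaluating the tested expression.)
; check-named : string value value -> string
; Like check, but also displays the name of a failing test so that
; failures are easy to locate in a long run of tests.
(define check-named
  (lambda (name actual expected)
    (if (equal? actual expected)
        "OK"
        (begin
          (display "FAILED: ")
          (display name)
          (newline)
          "FAILED"))))
For example, (check-named "reverse a three-element list" (rev (list 'a 'b 'c)) (list 'c 'b 'a)) behaves just like the corresponding call to check, except that a failure identifies itself by name.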
To handle all of these additional issues, many program development environments now include some form of testing environment, in which you specify a sequence of tests and receive a quick summary of the results. For example, a testing environment, given the four rev tests above, might report something like:
4 tests conducted.
One test failed.
No errors encountered.
One other test failed to give the expected result:
For (rev (list 'a 'b 'c)), Expected [(c d a)], got [(c b a)].
Sorry. You'll need to fix your code.
Different testing environments are designed differently. For this class,
we have developed a simple testing environment that emphasizes
testing for expected outcomes. The environment is provided in the
file
/home/rebelsky/Web/Courses/CS151/2007S/Examples/unit-test.ss
and you load it as you would any other Scheme file.
(load "/home/rebelsky/Web/Courses/CS151/2007S/Examples/unit-test.ss")
Once you've loaded the environment, you begin a series of tests with
the begin-tests!
procedure. That procedure takes no
parameters. When you are done with a sequence of tests, you invoke
the end-tests!
procedure. That procedure reports on the
results of the tests.
The tests themselves might seem a bit strange at first glance. For
each test, you call the test!
procedure with two parameters,
(1) the expression you want to evaluate and (2)
an expression that gives the value you expect. For example, we would
express the first three (correct) tests of rev
as follows.
(test! (rev null) null) (test! (rev (list 'a)) (list 'a)) (test! (rev (list 'a 'b 'c)) (list 'c 'b 'a))
Behind the scenes, test! has access to the unevaluated expression as well
as its value, which is how it can include the expression in the reports
and error messages you will see below.
We might also want to test whether an expression causes an error. For
example, we would expect (rev 'blah) to signal an error because 'blah
is a symbol and not a list. The test-error! procedure lets us check
that an expression we expect to signal an error actually does so.
(test-error! (rev 'blah))
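Putting these pieces together, a complete test file for rev might look like the following sketch. (We assume, purely for illustration, that rev is defined in a file named rev.ss; any file containing the definition would do.)
(load "/home/rebelsky/Web/Courses/CS151/2007S/Examples/unit-test.ss")
(load "rev.ss")
(begin-tests!)
; Tests with expected values
(test! (rev null) null)
(test! (rev (list 'a)) (list 'a))
(test! (rev (list 'a 'b 'c)) (list 'c 'b 'a))
; A test that should signal an error
(test-error! (rev 'blah))
(end-tests!)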
To many programmers, testing is much like documentation. That is, it's something you add after you've written the majority of the code. However, testing, like documentation, can make it much easier to write the code in the first place.
As we suggested in the reading on preconditions, postconditions, and documentation, by writing documentation first, you develop a clearer sense of what you want your procedures to accomplish. Taking the time to write documentation can also help you think through the special cases. For some programmers, writing the formal postconditions can give them an idea of how to solve the problem.
If you design your tests first, you can accomplish similar goals.
For example, if you think carefully about what tests might fail,
you make sure the special cases are handled. Also, a
good set of tests of the form "this expression should have this
value" can serve as a form of documentation for the reader,
explaining through example what the procedure is to do. There is
even a style of software engineering, called test-driven development,
in which you write tests first.
tally
Let's consider this test-first strategy as we attempt to write a
common procedure, (tally val lst), which counts the number of times
that val appears in lst. What are some good tests?
tally returns 0 for the empty list.
tally returns 1 for a singleton list in which the singleton element is val.
tally returns 0 for a singleton list in which the singleton element is not val.
tally returns 1 for length-three lists in which val is just the first, just the last, or just the middle element.
tally returns the appropriate count for lists that include multiple copies of val.
tally returns 0 for longer lists that do not include val.
So, we create two files: tally.ss, which will contain the code and
documentation for tally, and tally-test.ss, which will contain our
testing code. We create tally-test.ss first.
(load "/home/rebelsky/Web/Courses/CS151/2007S/Examples/unit-test.ss")
(load "tally.ss")
(begin-tests!)
(test! (tally 5 null) 0)
(test! (tally 5 (list 5)) 1)
(test! (tally 5 (list 17)) 0)
(test! (tally 5 (list 5 17 6)) 1)
(test! (tally 5 (list 17 5 6)) 1)
(test! (tally 5 (list 6 17 5)) 1)
(test! (tally 5 (list 5 5 5 5 5)) 5)
(test! (tally 5 (list 5 17 5 17 5)) 3)
(test! (tally 5 (list 17 5 5 17 17)) 2)
(test! (tally 5 (list 17 17 17 17 17)) 0)
(end-tests!)
Of course, we haven't defined tally
yet, so we don't expect
the tests to work, but let's see what happens.
10 tests conducted.
10 tests failed.
10 errors encountered.
The expression (tally 5 null) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 5)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 17)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 5 17 6)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 17 5 6)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 6 17 5)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 5 5 5 5 5)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 5 17 5 17 5)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 17 5 5 17 17)) failed to evaluate because [reference to undefined identifier: tally]
The expression (tally 5 (list 17 17 17 17 17)) failed to evaluate because [reference to undefined identifier: tally]
No other tests failed to give the expected result.
Sorry. You'll need to fix your code.
Yeah, we'd expect some errors. So, let's start to define tally.
Here's an incorrect definition:
(define tally
  (lambda (val lst)
    (if (eq? val (car lst))
        1
        0)))
Amazingly, this passes several of our tests. When we run the tests, here is the output.
10 tests conducted.
6 tests failed.
One error encountered.
The expression (tally 5 null) failed to evaluate because [car: expects argument of type <pair>; given ()]
5 other tests failed to give the expected result.
For (tally 5 (list 17 5 6)) expected [1] got [0]
For (tally 5 (list 6 17 5)) expected [1] got [0]
For (tally 5 (list 5 5 5 5 5)) expected [5] got [1]
For (tally 5 (list 5 17 5 17 5)) expected [3] got [1]
For (tally 5 (list 17 5 5 17 17)) expected [2] got [0]
Sorry. You'll need to fix your code.
That first error is important. It indicates that we're taking the car
of an empty list. The others are probably a case of us forgetting to recurse. Let's try again.
(define tally
  (lambda (val lst)
    (if (null? lst)
        0
        (+ (if (eq? (car lst) val) 1 0)
           (tally val (cdr lst))))))
What does our test suite say?
10 tests conducted.
No errors encountered.
No tests failed.
CONGRATULATIONS! All tests passed.
Wow. That's comforting. We seem to have written it correctly (or at least correctly enough to pass our tests). However, some people might find the formulation above confusing or inelegant, so we might try to rewrite it. Instead of putting the addition within the if, we'll have a three-way cond.
(define tally
  (lambda (val lst)
    (cond
      ((null? lst) 0)
      ((eq? val (car lst)) 1)
      (else (tally val (cdr lst))))))
What do the tests say?
10 tests conducted.
3 tests failed.
No errors encountered.
3 other tests failed to get the expected result:
For (tally 5 (list 5 5 5 5 5)) expected [5] got [1]
For (tally 5 (list 5 17 5 17 5)) expected [3] got [1]
For (tally 5 (list 17 5 5 17 17)) expected [2] got [1]
Sorry. You'll need to fix your code.
Ah! We've made a common mistake of reorganizing code: We've kept the recursive call in one place, but we need it in more than one place.
(define tally
  (lambda (val lst)
    (cond
      ((null? lst) 0)
      ((eq? val (car lst)) (+ 1 (tally val (cdr lst))))
      (else (tally val (cdr lst))))))
We cross our fingers and run the tests again.
10 tests conducted.
No tests failed.
CONGRATULATIONS! All tests passed.
Now, we've left out some tests. What if
val is a symbol, character, or string? We'll add a few more
tests to tally-test.ss
to cover these cases. We don't need as many
tests for each, since we already know that the overall structure works.
(test! (tally 'a (list 'a 'a 'a)) 3)
(test! (tally #\a (string->list "alphabet")) 2)
(test! (tally "hello" (list "hello" "goodbye" "and" "hello")) 2)
(test! (tally null (list null "null" 'null)) 1)
Let's see what happens.
14 tests conducted.
One test failed.
No errors encountered.
One other test failed to give the expected result:
For (tally "hello" (list "hello" "goodbye" "and" "hello")) expected [2] got [0]
Sorry. You'll need to fix your code.
Hmmm. There's a failure, but only for the test involving strings. Why? We'll leave that as an exercise for the reader.
Many of us are reminded of the need for unit testing by the following story by Doug McIlroy, posted to The Risks Digest: Forum on Risks to the Public in Computers and Related Systems.
Sometime around 1961, a customer of the Bell Labs computing center questioned a value returned by the sine routine. The cause was simple: a card had dropped out of the assembly language source. Bob Morris pinned down the exact date by checking the dutifully filed reversions tests for system builds. Each time test values of the sine routine (and the rest of the library) had been printed out. Essentially the acceptance criterion was that the printout was the right thickness; the important point was that the tests ran to conclusion, not that they gave right answers. The trouble persisted through several deployed generations of the system.
McIlroy, Doug (2006). Trig routine risk: An Oldie. Risks Digest 24(49), December 2006.
If, instead of a thick printout, Bell Labs had arranged for a count of successes and a list of failures, they (and their customers) would have been in much better shape.