Mini-Project 2: Extending your string skills while checking digits (or vice versa)

Summary
In this mini project, we will explore check digits, a technique for verifying that we correctly enter large strings of numbers, e.g., for accounts numbers or credit cards. We will also explore build a library of procedures to support further use of strings.
Collaboration
Each student should submit their own responses to this assignment. You may consult other students in the class as you review the course materials and work on the assignment. If you receive help from anyone, make sure to cite them in your responses. If you refer to reference pages on the course Web site or elsewhere, you should cite them.

For this project, you will submit two files: string-utils.rkt, which contains your string utilities from part one, and isbn.rkt, which contains your ISBN checker for part two and your tests from part three.

Some background

As our programs get more complicated, the structure of our code and good names and formatting are not enough to make our code readable and correct. We need to rely on extrinsic means to ensure these things. To ensure that our code is readable, we use documentation to capture aspects of our code that are not obvious upon inspection. To ensure that our code is correct, we use tests that codify the correctness our programs through concrete examples.

As you may recall, we recommend a four-step process for developing procedures: Document the procedure, make a list of sample inputs and outputs that will help you determine whether your implementation is correct, write your procedure, and extend your list of sample inputs and outputs. In the Spring 2021 section of this course, we explored this process for the valid-id? procedure.

During our week on software engineering fundamentals, we’ll discuss these concepts in more detail. For now, we’ll employ some basic documentation and testing for our program.

Documentation

For each function that you write in this mini project, include a function comment that captures the types of the function as well as describes its output in a sentence or two. For example, here is a function comment for a function that finds the minimum of three numbers:

;;; (min x y z) -> real?
;;;   x : real?
;;;   y : real?
;;;   z : real?
;;; Returns the minimum of x, y, and z
(define min-of-three
  (lambda (x y z)
    (cond 
      [(and (<= x y) (<= x z))
       x]
      [(and (<= y x) (<= y z))
       y]
      [else
       z])))

The function comment is a stylized comment that consists of the following three components:

  • (min x y z) -> real?: the signature of the function which names its arguments and describes the output type of the function. In Racket, we express the types with the predicate functions that we use in code to test whether an expression has that type. For example, this signature says that min has three arguments, x, y, and z and that it produces a real number (as tested by the real? function).
  • x : real? ...: the types of each of the parameters mentioned in the signature. Like the return type of the function, we document the types of the parameters with the predicates that we would use in code to test values of those types.
  • Returns the minimum of x, y, and z: finally, we include a brief sentence or two description of the behavior and output of the function. Here, the behavior of the function is simple, so we comparatively have little to say: the function returns the minimum of its arguments.

Tests

Up until this point, we have asked you to experiment with the functions that you write in the interactions window to check for correctness. This has the upside of being fast, but if you change your code, you need manually type in all those tests again which is tedious (which in turn makes it less likely you’ll recheck the correctness of your code). A better solution is to codify your tests in your code so that you can rerun the tests at will.

During our unit on software engineering fundamentals, we’ll introduce you to a library, rackunit, that makes test authoring and execution a breeze. For now, we’ll look at two basic RackUnit procedures

When we are developing predicates (procedures that return a Boolean) value, Racket provides two important procedures to help us make that list of inputs/outputs and automatically check it for us: (test-true DESCRIPTION EXP) and (test-false DESCRIPTION EXP), which evaluate the expression and prints out the description only when the expression does not evaluate to true (false).

For example,

;;; (valid-id? str) -> boolean?
;;;   str : string?
;;; Determines whether str is a valid id in that it is a six-character
;;; string that contains only digits.
(define valid-id?
  (lambda (str)
    (and (string? str)
         (string->number str)
         (= 6 (string-length str)))))
> (test-true "standard id" (valid-id? "123456"))
> (test-true "leading 0's" (valid-id? "000011"))
> (test-false "fewer than six characters" (valid-id? "1234"))
> (test-false "more six characters" (valid-id? "1234567"))
> (test-false "non-digit at end" (valid-id? "12345x"))
> (test-false "begins with dash" (valid-id? "-12345"))
--------------------
begins with dash
. FAILURE
name:       check-false
location:   interactions from an unsaved editor:15:2
params:     '(#t)
--------------------

As you can see, when the test succeeds, we get no output. When the test fails, we get a large failure notice. We can take advantage of this behavior by putting the tests immediately after our code in the definitions pane. We also use test-true and test-false in the auto graders. To use these procedures, one needs to require the rackunit library with (require rackunit).

Part the first: String utilities

As you have likely discovered by now, the built-in Racket procedures don’t always imediately do what we want. For example, although we can use a combination of integer? and string->number to determine if a string contains only digits, we would prefer not to write (integer? (string->number str)) again and again and again, particularly since we might later realize that that solution is not perfect.

When most programmers discover that they need to do the same thing again and again? They create a library of utility procedures that they plan to use in other procedures. Although you are just beginning your experience as a Racket programmer, you will still find it useful to create your own set of utilities.

a. Suppose we are writing a procedure, (all-digits? str), that takes a string as an integer and determines if that string contains only digits. Write six or more tests (including at least two that use test-true and at least two that use test-false) that one could use to determine whether all-digits? is behaving correctly. Make sure to consider edge cases, such as when the string starts with a plus sign or includes a period.

b. Document and implement all-digits?.

c. Document, write tests for, and implement a procedure, (digit->integer char) that takes a character as input and produces the integer that corresponds to that input. If the input is not a digit character, your procedure should return -1.

> (digit->integer #\2)
2
> (digit->integer #\a)
-1

d. Document, write tests for, and implement a procedure, (trim str), that takes a string as input, checks whether there is a space at the beginning or a space at the end, and if, so, removes those spaces.

Note: You will likely only write test-true tests for trim.

e. Document, write tests for, and implement a procedure, (remove-spaces str), that takes a string as input and removes any spaces from the string. You may find string-replace or string-split useful.

Note: You will likely only write check-true tests for remove-spaces.

f. Document, write tests for, and implement two more procedures that work with characters or strings and that you expect to be useful.

Part the second: International Standard Book Numbers (ISBNs)

Many important pieces of data are represented as large strings of digits, for example, account numbers or credit card numbers. Furthermore, these data are entered in by hand with alarming frequency. What happens if we mistype one or two characters of our credit card number into an online vendor’s website? Will we mistakenly use someone else’s account?

It turns out that these account numbers are designed so that this is not the case. Most such strings of numbers include a check digit which is an extra number calculated from the other numbers in the string. In this manner, a system can check that an account number is entered correctly because if it isn’t, then the calculated check digit will differ from the check digit present in entered account number.

To aid in our checking, we will also write a variety of utility procedures.

In this part of the project, we’ll take a look at a particular check digit scheme for International Standard Book Numbers (ISBNs). ISBNs serve as unique identifiers for books. For example, the ISBN-10 for The Book of M: A Novel is:

  • 0062669613

(Note that ISBN-10 refers to the 10-digit version of these numbers. There also exists an 13-digit version of these numbers, ISBN-13, that was created to accommodate more books. In this assignment, we’ll focus exclusively on ISBN-10s.)

The first nine numbers of the ISBN contain various information, e.g., the location in which the title was registered and the publisher identity. The final digit is the check digit. In our example above, 3 is the check digit.

Calculating check digits

The check digit is not arbitrarily assigned. It is instead calculated from the other nine numbers of the ISBN number as follows:

  1. First, we take each of the nine non-check digits of the ISBN and, in left-to-right order, multiply the first number by 10, the second number by 9, and so forth. We then sum up the results. Note that there are nine such digits, so the left-most is multiplied by 10 and the right-most number is multiplied by 2. In our example, we would compute:

    10×0 + 9×0 + 8×6 + 7×2 + 6×6 + 5×6 + 4×9 + 3×6 + 2×1 = 184
    
  2. Next, we take the sum s and compute s modulo 11. Recall that modulo computes the remainder of the division of the two numbers. This will result in a number in the range 0 through 10. In our example:

    184 mod 11 = 8
    
  3. We then subtract this result from 11 and then mod by 11 to arrive at the check digit:

    (11 - 8) mod 11 = 3
    

Note that the expected check digit based on the remaining digits of the ISBN is 3 which is the check digit of the actual ISBN. This means that the ISBN is correctly formed and likely doesn’t contain a typo!

‘X’ check digit values

Note that check digits are a value in the range of 0 to 10. However 10 is not a single digit—it is made up of 2 digits (in base 10)! In this case, rather than using 10, we use X for the check digit value which is why you sometimes see ISBN-10s that end in X!

For example, let’s consider checking the ISBN 123456789X:

  1. First we compute the multiplied sum of the first 9 digits of the ISBN as described above:

    10×1 + 9×2 + 8×3 + 7×4 + 6×5 + 5×6 + 4×7 + 3×8 + 2×9 = 210.
    
  2. We compute the sum modulo 11: 210 mod 11 = 1

  3. We subtract this quantity from 11: 11 - 1 = 10

  4. Finally, we take this result modulo 11: 10 mod 11 = 10.

So the expected check digit is 10 but really X in the ISBN-10. Note that this is precisely what the ISBN 123456789X ends in!

Zero check digit values

Remember that extra mod from above? Let’s look at an example: “6134002100”

  1. First we compute the multiplied sum of the first 9 digits of the ISBN as described above:

    10x6 + 9x1 + 8x3 + 7x4 + 6x0 + 5x0 + 4x2 + 3x1 + 2x0 = 132
    
  2. We compute the sum modulo 11: 132 mod 11 = 0.

  3. We subtract this quantity from 11: 11 - 0 = 11.

  4. Finally, we take this result modulo 11: 11 mod 11 = 0.

Our check digit is zero.

The ISBN checker program

In this project, you will create a procedure valid-isbn?:

  • As input, valid-isbn? takes in a string that contains an ISBN-10 number to be verified.
  • As output, (valid-isbn? str) produces #t if str is a valid ISBN-10 number according to its check digit and #f otherwise.

If valid-isbn? is not given a string that is not formatted correctly, then the procedure also returns #f. An input string is formatted correctly if:

  • The string is 10 characters.
  • Each character is drawn from the digits 0–9 (#\0#\9) or the letter ‘X’ (#\X).

For example:

> (valid-isbn? "0062669613")
#t
> (valid-isbn? "0062569613")
#f
> (valid-isbn? "123456789X")
#t
> (valid-isbn? "123456780X")
#f
> (valid-isbn? "123456780W")
#f
> (valid-isbn? "12345678X3")
#f
> (valid-isbn? "hello world!")
#f

Note that the input to valid-isbn? is a string! It is not a single integer—note that because one of the characters of the ISBN could be ‘X’, the input cannot be an integer.

You should use your decomposition techniques to break up this procedure into relevant sub-problems so that the program is manageable to write. In particular, you ought to write a isbn? (or, better yet, a correct-isbn-format?) procedure that checks to see if its input is a correctly format (but not necessarily valid) ISBN according to the rules above.

There are a number of procedures that we have discussed in the course so far that you may find useful for this demonstration exercise. We list some of them here for your reference:

  • (modulo x y): computes x mod y, the whole number reminder left over after dividing x and y.
  • (string->list str): returns a list that is composed of the individual characters of str.
  • (reverse lst): returns lst, but reversed.
  • (range n): returns a list of the numbers 0n-1.
  • (map f l): returns the list l but with every element of l transformed by procedure f.
  • (reduce f l): returns the result of applying binary procedure f to every pair of neighboring elements in the list.
  • (list-ref lst n): retrieves nth element of lst. Note that lists are zero-indexed, so the index of the first element is 0, and the index of the last element is (- (length lst) 1).
  • (char->integer c) converts a character c into its sequence number. (NOTE: (char->integer #\0) does not evaluate to 0!) See below for how to use this procedure to convert characters that are digits into integers you can do computation over.)

Part the third: Testing

With test-true and test-false in hand, build at least six test cases at the top level:

(test-true MSG (valid-isbn? ...))
(test-true MSG (valid-isbn? ...))
(test-true MSG (valid-isbn? ...))
(test-false MSG (valid-isbn? ...))
(test-false MSG (valid-isbn? ...))
(test-false MSG (valid-isbn? ...))

Make sure to choose a variety of ISBNs that cover all possible cases of your procedures execution. In particular, make sure to include negative tests ones where the ISBN is invalid as well as corner cases such as when the reported check digit is an X. Feel free to use your knowledge of the formula to manufacture artificial examples that exercise particular parts of your code as well as real-world examples from books that you have lying around. Be aware that some books have incorrect ISBNs, so don’t be surprised if you find one or two books that produce invalid ISBNs according to your program!

Submitting your work

Put your work from part one in a file called string-utils.rkt.

Put your work from parts two and three in a file called isbn.rkt.

Make sure to put the tests after the procedures you are testing.

Make sure to organize the file so that it’s easy for another human (e.g., your professor, mentors, or graders) to read it.

Turn in your completed files to Gradescope. Try to resolve any issues that the autograder describes.

Partial rubric

In grading these assignment, we will look for the following for each level. We may also identify other characteristics that move your work between levels.

This rubric may be updated closer to the due date.

Redo or above

Submissions that lack any of these characteristics will get an I.

[ ] Includes the two specified files.
[ ] Includes an appropriate header on each file that indicates the 
    course, author, etc.
[ ] Code runs in DrRacket.
[ ] If the program references other files, those files were submitted.
[ ] If the program references other files, it does so with the base file 
    name, rather than a complex path.
[ ] Documentation for most procedures.

Meets expectations or above

Submissions that lack any of these characteristics will get an R.

[ ] All procedures in part one pass our autograder tests.
[ ] All procedures in part one include tests.
[ ] The `valid-isbn?` procedure works correctly for most positive cases 
    (where it returns `#t`)
[ ] The `valid-isbn?` procedure works correctly for most negative cases
    (where it returns `#f`)
[ ] The `valid-isbn?` procedure works correctly for ISBNs whose check 
    digit is X.
[ ] The `valid-isbn?` procedure works correctly for ISBNs whose check 
    digit is 0.
[ ] Code is well-formatted with appropriate names and indentation.
[ ] Code for part one passes at least 90% of the tests.
[ ] Code for part two passes at least 90% of the tests.
[ ] All procedures documented, generally correctly.
[ ] Code has been reformatted with Ctrl-I before submitting.
[ ] Documentation for most procedures is correct / has the correct form.

Exemplary / Exceeds expectations

Submissions that lack any of these characteristics will get an M.

[ ] The extra procedures in part one are particularly creative or useful.
[ ] `isbn.rkt` contains no fewer than three procedures in addition to 
    `valid-isbn?` (*i.e.*, at least four procedures in total) that 
    decompose the ISBN problem in appropriate ways.
[ ] Code from part one passes 100% of the tests.
[ ] Code from part two passes 100% of the tests.
[ ] No "magic numbers", such as 48.
[ ] Avoids repeated work, such as recalculation of digits.
[ ] Tests include well-designed "edge cases". 
[ ] Documentation for all procedures is correct / has the correct form.

Bonus

[ ] Tests that identify issues that the autograder missed.

Citations

The Spring 2021 Term 1 version of this assignment (distributed by Samuel A. Rebelsky) is based on a Fall 2020 Term 2 version of the assignment. This version adds the string utilities section, adds the partial rubric, revises the testing section, and removes the luhn-16-check? problem. Peter-Michael Osera wrote the original version of the ISBN problem.