Mini-Project 2: Check digits

Assigned: Thursday, 5 November 2020
Summary: In this demonstration exercise, we will explore check digits, a technique for verifying that we correctly enter large strings of numbers, e.g., for accounts numbers or credit cards.
Collaboration: Each student should submit their own responses to this assignment. You may consult other students in the class as you review the course materials and work on the assignment. If you receive help from anyone, make sure to cite them in your responses. If you refer to reference pages on the course Web site or elsewhere, you should cite them.

Many important pieces of data are represented as large strings of digits, for example, account numbers or credit card numbers. Furthermore, these data are entered in by hand with alarming frequency. What happens if we mistype one or two characters of our credit card number into an online vendor’s website? Will we mistakenly use someone else’s account?

It turns out that these account numbers are designed so that this is not the case. Most such strings of numbers include a check digit which is an extra number calculated from the other numbers in the string. In this manner, a system can check that an account number is entered correctly because if it isn’t, then the calculated check digit will differ from the check digit present in entered account number.

International Standard Book Numbers (ISBNs)

In this project, we’ll take a look at a particular check digit scheme for International Standard Book Numbers (ISBNs). ISBNs serve as unique identifiers for books. For example, the ISBN-10 for The Book of M: A Novel is:

0062669613

(Note that ISBN-10 refers to the 10-digit version of these numbers. There also exists an 13-digit version of these numbers, ISBN-13, that was created to accommodate more books. In this assignment, we’ll focus exclusively on ISBN-10s.)

The first nine numbers of the ISBN contain various information, e.g., the location in which the title was registered and the publisher identity. The final digit is the check digit. In our example above, 3 is the check digit.

Calculating check digits

The check digit is not arbitrarily assigned. It is instead calculated from the other 9 numbers of the ISBN number as follows:

First, we take each of the 9 non-check digits of the ISBN and, in left-to-right order, multiply the first number by 10, the second number by 9, and so forth. We then sum up the results. Note that there are 9 such digits, so the left-most is multiplied by 10 and the right-most number is multiplied by 2. In our example, we would compute:
```
10×0 + 9×0 + 8×6 + 7×2 + 6×6 + 5×6 + 4×9 + 3×6 + 2×1 = 184
```
Next, we take the sum s and compute s modulo 11. Recall that modulo computes the remainder of the division of the two numbers. This will result in a number in the range 0 through 11. In our example:
```
184 mod 11 = 8
```
We then subtract this result from 11 to arrive at the check digit:
```
11 - 8 = 3
```

Note that the expected check digit based on the remaining digits of the ISBN is 3 which is the check digit of the actual ISBN. This means that the ISBN is correctly formed and likely doesn’t contain a typo!

‘X’ check digit values

Note that check digits are a value in the range of 0 to 10. However 10 is not a single digit—it is made up of 2 digits (in base 10)! In this case, rather than using 10, we use X for the check digit value which is why you sometimes see ISBN-10s that end in X!

For example, let’s consider checking the ISBN 123456789X:

First we compute the multiplied sum of the first 9 digits of the ISBN as described above:
```
10×1 + 9×2 + 8×3 + 7×4 + 6×5 + 5×6 + 4×7 + 3×8 + 2×9 = 210.
```
We compute the sum modulo 11: 210 mod 11 = 1
We finally subtract this quantity from 11: 11 - 1 = 10

So the expected check digit is 10 but really X in the ISBN-10. Note that this is precisely what the ISBN 123456789X ends in!

The ISBN checker program

In this project, you will create a procedure valid-isbn?:

As input, valid-isbn? takes in a string that contains an ISBN-10 number to be verified.
As output, (valid-isbn? str) produces #t if str is a valid ISBN-10 number according to its check digit and #f otherwise.

If valid-isbn? is not given a string that is not formatted correctly, then the procedure also returns #f. An input string is formatted correctly if:

The string is 10 characters.
Each character is drawn from the digits 0–9 (#\0 … #\9) or the letter ‘X’ (#\X).

For example:

> (valid-isbn? "0062669613")
#t
> (valid-isbn? "0062569613")
#f
> (valid-isbn? "123456789X")
#t
> (valid-isbn? "123456780X")
#f
> (valid-isbn? "hello world!")
#f

Note that the input to valid-isbn? is a string! It is not a single integer—note that because one of the characters of the ISBN could be ‘X’, the input cannot be an integer.

You should use your decomposition techniques to break up this procedure into relevant sub-problems so that the program is manageable to write. In particular, you ought to write a isbn? (or, better yet, a correct-isbn-format?) procedure that checks to see if its input is a correctly format (but not necessarily valid) ISBN according to the rules above.

There are a number of procedures that we have discussed in the course so far that you may find useful for this demonstration exercise. We list some of them here for your reference:

(modulo x y): computes x mod y, the whole number reminder left over after dividing x and y.
(string->list str): returns a list that is composed of the individual characters of str.
(reverse lst): returns lst, but reversed.
(range n): returns a list of the numbers 0 … n-1.
(map f l): returns the list l but with every element of l transformed by procedure f.
(foldl f init l): returns the result of applying binary procedure f to every element of l in left-to-right order, starting with initial value init.
(list-ref lst n): retrieves nth element of lst. Note that lists are zero-indexed, so the index of the first element is 0, and the index of the last element is (- (length lst) 1).
(char->integer c) converts a character c into its sequence number. (NOTE: (char->integer #\0) does not evaluate to 0!) See below for how to use this procedure to convert characters that are digits into integers you can do computation over.)

Converting characters

You will likely find that you need to convert an individual character that is a digit into a number that you can use in calculations. However, simply using char->integer doesn’t get you what you might expect:

> (char->integer #\0)
48

Why 48? It turns out that 48 is the sequence number of the character #\0. This is certainly not what we want!

How can we convert a digit-character into a number? An important property of these sequence numbers is that #\0 through #\9 are assigned sequential sequence numbers:

> (map char->integer (list #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))
'(48 49 50 51 52 53 54 55 56 57)

With this fact in mind, you can derive a simple computation involving the digit-character and a few calls to char->integer to convert that digit-character into a number. (Hint: you can think of the number as the difference or distance of that number from 0 with sequence numbers.)

The Luhn algorithm

As you might guess, the computation of the ISBN check digit is just one of many check-digit (or check-digits) computations used in practice. One of the most popular is the Luhn algorithm, which you can read about on Wikipedia at https://en.wikipedia.org/wiki/Luhn_algorithm (local copy). Unfortunately, the folks at Wikipedia have neglected to include a Racket version.

Write a procedure, (luhn-16-check? str) that takes a string representing a sixteen-digit credit card number and returns true (#t) if the number has a correct check digit and false (#f) if the number does not have a correct check digit.

Note that your procedure should be user friendly. If the input string contains spaces, you should elminate those spaces. That is, the result for “1234 5678 9012 3456” (how a human might write it) should be the same as the result for “1234567890123456”.

One of the difficulties of the luhn-check for students at your stage is that we need to do different things with alternating digits. You will likely find the following procedure helpful. It turns a string into a list of digit pairs. You will learn how to write procedures like this later in the term.

; (charpairs str) -> list-of string?
;   str : string?
; Turn str into a list of character pairs, with the first pair being the first two
; characters, the next pair being the next two characters, and so on and so forth.
;
; If str has an odd number of characters, the last character does not appear in the 
; list.
(define char-pairs
  (lambda (str)
    (regexp-match* #px".." str)))

> (char-pairs "1234567890123456")
'("12" "34" "56" "78" "90" "12" "34" "56")
> (char-pairs "123456789012345")
'("12" "34" "56" "78" "90" "12" "34")

You may not use techniques we have not learned in this class! In particular, you may not use recursion, loops (other than map), the evil of set!, or anything similar. Instead, you should think about how to decompose this problem into smaller problems, use char-pairs to turn the string into a list of pairs, map some procedure over that list, and work from there.

I will be happy to answer questions about the design of this algorithm in our Q&A channel. I would prefer that you think about the problem before looking at the public Q&A, but you’ll find that as people ask questions, there will be a wealth of hints avaialble.

Here’s an approximate series of steps that you may find useful:

Strip out the spaces.
Optional: Ensure that the result has sixteen characters.
Convert characters at indices 1 through 14 to two-character pairs.
For each pair (e.g., using map)
- Convert the two-character pair into two one-digit numbers
- Double the second number
- Extract the digits of the second number
- Add ‘em up, along with the digit of the first number
Sum it all (e.g., using apply)
Add the initial digit, doubled and then digits added.
Multiply by nine.
Take the remainder after dividing by ten.
Compare.

Note; Here’s an older version of those instructions, which gives a variant of the Luhn formula in which we count from the left, rather than the right. (We call this second one the “Sam version” as opposed to the “correct version” above.)

Strip out the spaces.
Optional: Ensure that the result has sixteen characters.
Convert the first fourteen characters to two-character pairs.
For each pair (e.g., using map)
- Convert the two-character pair into two one-digit numbers
- Double the second number
- Extract the digits of the second number
- Add ‘em up, along with the digit of the first number
Sum it all (e.g., using apply)
Add the fifteenth digit.
Multiply by nine.
Mod by ten.
Compare.

As the steps suggest, there are some natural ways to decompose the problem. For example, the steps done to each digit-pair might be a separate procedure.

Documentation and testing

As our programs get more complicated, the structure of our code and good names and formatting are not enough to make our code readable and correct. We need to rely on extrinsic means to ensure these things. To ensure that our code is readable, we use documentation to capture aspects of our code that are not obvious upon inspection. To ensure that our code is correct, we use tests that codify the correctness our programs through concrete examples.

During our week on software engineering fundamentals, we’ll discuss these concepts in more detail. For now, we’ll employ some basic documentation and testing for our program.

Documentation

For each procedure that you write in this exercise, include a procedure comment that captures the types of the procedure as well as describes its output in a sentence or two. For example, here is a procedure comment for a procedure that finds the minimum of three numbers:

;;; (min-of-three x y z) -> number?
;;;   x : number?
;;;   y : number?
;;;   z : number?
;;; Returns the minimum of x, y, and z
(define min-of-three
  (lambda (x y z)
    (cond 
      [(and (<= x y) (<= x z)) 
       x]
      [(and (<= y x) (<= y z)) 
       y]
      [else 
       z])))

The procedure comment is a stylized comment that consists of the following three components:

(min x y z) -> number?: the signature of the procedure which names its arguments and describes the output type of the procedure. In Racket, we express the types with the predicate procedures that we use in code to test whether an expression has that type. For example, this signature says that min has three arguments, x, y, and z and that it produces a number (as tested by the number? procedure).
x : number? ...: the types of each of the parameters mentioned in the signature. Like the return type of the procedure, we document the types of the parameters with the predicates that we would use in code to test values of those types.
Returns the minimum of x, y, and z: finally, we include a brief sentence or two description of the behavior and output of the procedure. Here, the behavior of the procedure is simple, so we comparatively have little to say: the procedure returns the minimum of its arguments.

Tests

Up until this point, we have asked you to experiment with the procedures that you write in the interactions window to check for correctness. This has the upside of being fast, but if you change your code, you need manually type in all those tests again which is tedious (which in turn makes it less likely you’ll recheck the correctness of your code). A better solution is to codify your tests in your code so that you can rerun the tests at will.

During our unit on software engineering fundamentals, we’ll introduce you to a library, rackunit, that makes test authoring and execution a breeze. For now, we’ll write our own small test harness that lets us write and execute tests in an efficient manner.

Write a procedure called (test-equal? actual expected) that checks whether actual is equal to expected (using the equal? procedure). (NOTE: test-equal? does not count towards the three required procedures in your decomposition of the problem!) In the case where the two values are equal, the procedure returns #t. If the two values are not equal, rather than returning #f (which would lead to a pretty silly procedure), we are going to raise an error instead, letting future us know that our test failed. To raise an error, use the conveniently named (error msg) procedure which stops execution of your program and writes the string msg to the interactions pane. For example:
```
> (error "Error: I raised an error!")
..  Error: I raised an error!
> (+ (* 3 5) (error "I raised an error before finishing the sum."))
.. I raised an error before finishing the sum.
```
Once the error procedure is evaluated, execution of the program halts with the error message!

With test-equal?, you will be able to create test cases, examples of how your procedures ought to be behave. For example, here are some test cases using test-equal? with the min-of-three procedure from before:
```
> (test-equal? (min-of-three 3 1 5) 1)
#t
> (test-equal? (min-of-three 1 9 -3) 1)
Error: test failed!
```
The second test case failed because (min-of-three 1 9 -3) (correctly) produced -3 when the expected value in the case was a 1. Note that the error message is not descriptive of why the test failed or where it occurred. If you want, feel free to change the error message of test-equal? to generate a more helpful message. However, note that this extra information is one of the reasons why we might prefer to use rackunit than roll our own solution!
With test-equal? in hand, define five test cases at the top level:
```
(define valid-isbn?-test-1 ...)
(define valid-isbn?-test-2 ...)
(define valid-isbn?-test-3 ...)
(define valid-isbn?-test-4 ...)
(define valid-isbn?-test-5 ...)
```
That tests valid-isbn? using test-equal?. Your chosen tests should not include the examples given in this write-up!

Make sure to choose a variety of ISBNs that cover all possible cases of your procedures execution. In particular, make sure to include negative tests ones where the ISBN is invalid as well as corner cases such as when the reported check digit is an X. Feel free to use your knowledge of the formula to manufacture artificial examples that exercise particular parts of your code as well as real-world examples from books that you have lying around. Be aware that some books have incorrect ISBNs, so don’t be surprised if you find one or two books that produce invalid ISBNs according to your program!

Submitting your work

Put all of your work (procedures, tests, notes, etc.) into a file called digit-checker.rkt. Make sure to organize the file so that it’s easy for another human (e.g., your professor, mentors, or graders) to read it. Turn in your completed digit-checker.rkt file to Gradescope.

Grading criteria

External correctness

(valid-isbn? str) correctly works for both positive cases (where it returns #t) and negative cases (where it returns #f)
(luhn-check? str) correctly works for both positive cases (where it returns #t) and negative cases (where it returns #f)

Internal correctness

checker.rkt contains no fewer than three procedures in addition to valid-isbn? (i.e., at least 4 procedures in total) that decompose the ISBN problem in appropriate ways. (NOTE: test-equal? does not count as one of these three procedures!)
checker.rkt also decomposes luhn-check? appropriately.
Each of your procedures is documented with a procedure comment as described above.
The five tests for valid-isbn? are defined as described above and cover all relevant cases of valid-isbn?’s execution.
You have created appropriate tests for the luhn-check? procedure. (You should strive for similar breadth to that you had for valid-isbn?.
Your code is well-formatted with appropriate names and indentation.

Citations

The Fall 2020 Term 2 version of this assignment (distributed by Samuel A. Rebelsky) is based closely on an earlier assignment distributed by Peter-Michael Osera. Rebelsky added the luhn-16-check? problem and rearranged or modified some other portions.

Copyright © Charlie Curtsinger, Sarah Dahlby Albright, Janet Davis, Fahmida Hamid, Titus Klinge, Peter-Michael Osera, Samuel A. Rebelsky, and Jerod Weinman. Selected materials are copyright by John David Stone or Henry Walker and are used with permission.

Unless specified otherwise elsewhere on this page, this work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

This website was built using Jekyll, Twitter Bootstrap, and the Bootswatch Cosmo Theme.