Displaying data
Summary: In this lab, you will have the opportunity to explore some of
the visualizations available through DrRacket’s plot package along
with our data sets.
Preparation
a. Do the traditional lab preparation. That is,
- Start DrRacket.
- Check for updates to the csc151 package.
- Require the csc151 package with (require csc151).
b. Also require the plot package with (require plot).
c. Load the list of cities arranged by zip codes.
(define zips (read-csv-file "/home/username/Desktop/us-zip-codes.csv"))
d. If you haven’t done so, save a copy of the Project Gutenberg version of Jane Eyre on your desktop.
e. Add the following undocumented procedures to your definitions pane.
(define zip-ends-with
  (lambda (city three-char-suffix)
    (string=? (substring (car city) 2) three-char-suffix)))
(define zip-starts-with
  (lambda (city three-char-prefix)
    (string=? (substring (car city) 0 3) three-char-prefix)))
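To see what these two procedures test, it can help to try substring on a single entry. The sketch below assumes that each entry is a list whose first element is the zip code, stored as a five-character string. The name sample-city and its field values are hypothetical, not taken from the data file.

```racket
#lang racket

;; Hypothetical entry. The layout (zip, latitude, longitude, city, state)
;; is an assumption; only the zip string in the car matters here.
(define sample-city (list "50112" 41.74 -92.72 "Grinnell" "IA"))

(define zip-ends-with
  (lambda (city three-char-suffix)
    (string=? (substring (car city) 2) three-char-suffix)))

(define zip-starts-with
  (lambda (city three-char-prefix)
    (string=? (substring (car city) 0 3) three-char-prefix)))

;; With one index, substring runs from that index to the end of the string.
(define last-three (substring (car sample-city) 2))     ; "112"

;; With two indices, substring stops just before the second index.
(define first-three (substring (car sample-city) 0 3))  ; "501"
```

Note that zip-ends-with yields a three-character suffix only when the zip string is exactly five characters long.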
f. Create four different small subsets of the zips data using filter,
zip-ends-with, and zip-starts-with.
(define zips1
  (filter (section zip-ends-with <> "021") zips))
(define zips2
  (filter (section zip-ends-with <> "606") zips))
(define zips3
  (filter (section zip-starts-with <> "021") zips))
(define zips4
  (filter (section zip-starts-with <> "606") zips))
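In these definitions, section (from the csc151 package) fills the <> slot with each element that filter supplies. The following is a minimal sketch of the same idea with an explicit lambda, using only standard Racket. The list sample-zips and its entries are made up for illustration, and the claim that the lambda matches section's behavior is an assumption about the csc151 package.

```racket
#lang racket

(define zip-ends-with
  (lambda (city three-char-suffix)
    (string=? (substring (car city) 2) three-char-suffix)))

;; Made-up entries: zip code, latitude, longitude, name.
(define sample-zips
  (list (list "02021" 42.1 -71.1 "City A")
        (list "60021" 42.2 -88.2 "City B")
        (list "02199" 42.3 -71.0 "City C")))

;; Intended to play the same role as
;; (filter (section zip-ends-with <> "021") sample-zips).
(define sample-ends-021
  (filter (lambda (city) (zip-ends-with city "021"))
          sample-zips))
```

Here sample-ends-021 keeps City A and City B, whose zip codes end in "021", and drops City C.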
g. Add the following undocumented procedure to your definitions pane.
(define useful-entry?
  (lambda (entry)
    (and (real? (cadr entry))
         (real? (caddr entry)))))
h. Explain to yourself why useful-entry? is likely to be useful.
Exercises
Exercise 1: Plotting cities
a. Using filter, write an expression that selects only the elements
of zips1 that contain a latitude and longitude.
> (define valid1 (filter ... zips1))
b. Using map1, extract only the latitude and longitude from that
list. (You may want to write a separate helper that extracts a latitude
and longitude from a single entry.)
> (define lat-long-1 (map1 ... valid1))
c. Using plot and points, display the points.
> (plot (points ...))
d. Repeat those steps with zips2.
Since latitude and longitude are angles, rather than x and y coordinates, this approach is imperfect. But it will suffice for our experiments.
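If you get stuck, here is one possible shape for steps a through c, sketched on a tiny made-up list rather than on zips1. It uses the standard map in place of map1, and the helper entry->lat-long is a hypothetical name. The assumption that entries without coordinates hold empty strings is also only a guess about the data.

```racket
#lang racket

(define useful-entry?
  (lambda (entry)
    (and (real? (cadr entry))
         (real? (caddr entry)))))

;; Made-up entries; the second one has empty strings instead of coordinates.
(define sample-zips
  (list (list "02021" 42.1 -71.1 "City A")
        (list "60021" "" "" "City B")
        (list "98021" 47.8 -122.2 "City C")))

;; Hypothetical helper: keep only the latitude and longitude of one entry.
(define entry->lat-long
  (lambda (entry)
    (list (cadr entry) (caddr entry))))

(define sample-valid (filter useful-entry? sample-zips))
(define sample-lat-long (map entry->lat-long sample-valid))

;; After (require plot), (plot (points sample-lat-long)) should display
;; these pairs, with latitude on the x axis and longitude on the y axis.
```

Because points treats the first number of each pair as x, this plot puts latitude on the horizontal axis. Swapping the two values in the helper would give a more map-like picture.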
Exercise 2: Plotting cities, revisited
a. Write an expression or series of expressions that plots the first
two sets of points, using one color for the valid entries in zips1
and another for the valid entries in zips2.
b. Do you expect to see something similar or different for the entries in
zips3 or zips4?
c. Check your answer experimentally. Then discuss with your partner any differences you see.
Exercise 3: Plotting cities, re-revisited
a. Write an expression or expressions to plot the cities in zips1 so that those north of 39.72 are one color and those south of 39.72 are another color. For example, those north of 39.72 might be blue and those south of 39.72 might be gray.
b. Write an expression or expressions to plot the cities in zips1 and
zips3 using four colors: one for zips1 north of 39.72, one for
zips1 south of 39.72, one for zips3 north of 39.72, and one for
zips3 south of 39.72.
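One way to split a list around a threshold is Racket's built-in partition, which returns two lists at once. The sketch below is only one approach; two calls to filter would work as well. The names sample-points and north-of-boundary? are hypothetical, and the coordinates are made up.

```racket
#lang racket

;; Made-up (latitude longitude) pairs.
(define sample-points
  (list (list 42.1 -71.1)
        (list 35.0 -90.0)
        (list 47.8 -122.2)
        (list 30.3 -97.7)))

(define north-of-boundary?
  (lambda (point)
    (> (car point) 39.72)))

;; partition returns two values: the elements that satisfy the predicate
;; and the elements that do not.
(define-values (north south)
  (partition north-of-boundary? sample-points))
```

Each resulting list can then be given to its own points renderer with a different color. With this predicate, a point at exactly 39.72 lands in the south list.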
Exercise 4: Detour: Exploring colors
Here’s a simple expression to plot some points.
> (plot (list (points (list (list 0 0) (list 10 10) (list 3 5) (list 1 4))
                      #:fill-color "red"
                      #:sym 'fullcircle6)
              (points (list (list 5 10) (list 6 9) (list 8 7))
                      #:fill-color "black"
                      #:sym 'fullcircle6)
              (points (list (list 1 1) (list 2 3) (list 3 5))
                      #:fill-color "blue"
                      #:sym 'fullcircle6)))
In addition to color names, the plot package lets you use RGB triplets: lists
of three integers between 0 and 255, as in #:fill-color (list 200 10 180).
Experiment with a few triplets to find five or so colors you find useful as a set.
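To compare several triplets side by side, you might draw one large point per color. The following is a rough sketch of that idea. The three triplets are arbitrary, and the names sample-colors, color-swatches, and show-swatches are hypothetical.

```racket
#lang racket
(require plot)

;; Three arbitrary RGB triplets, chosen only for illustration.
(define sample-colors
  (list (list 200 10 180)
        (list 20 120 200)
        (list 240 160 0)))

;; Build one single-point renderer per color, spaced along the x axis.
(define color-swatches
  (map (lambda (rgb x)
         (points (list (list x 1))
                 #:fill-color rgb
                 #:sym 'fullcircle6
                 #:size 20))
       sample-colors
       (list 1 2 3)))

;; Wrapping the call in a procedure delays plotting until you ask for it.
;; Explicit bounds are given because single points do not determine them.
(define show-swatches
  (lambda ()
    (plot color-swatches #:x-min 0 #:x-max 4 #:y-min 0 #:y-max 2)))
```

Evaluating (show-swatches) in the interactions pane should display the three colors in a row.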
Exercise 5: Categorical data
In a recent lab, you wrote a procedure something like the following.
(define categorize
  (lambda (city)
    (cond
      [(not (useful-entry? city))
       "Unknown"]
      [(> (cadr city) 39.72)
       "North"]
      [(< (cadr city) 39.72)
       "South"]
      [else
       "Other"])))
a. Using tally-all, map1, and categorize, create summary
information for zips1. Here’s one possible output.
> (.... zips1)
'(("North" 27) ("Unknown" 1) ("South" 31))
b. Using plot and discrete-histogram, make a histogram of these
values.
c. Repeat your work for zips3.
d. Repeat your work for zips.
e. Given those results, how representative do you feel your sample data are?
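The tallies in part a already have the shape that discrete-histogram expects: a list of two-element lists, each pairing a category with a count. The sketch below illustrates that shape with a hypothetical stand-in, tally-values, written in standard Racket. It is not the csc151 tally-all, which may order its results differently.

```racket
#lang racket
(require plot)

;; Hypothetical stand-in for a tally: pair each distinct value with its
;; count, in order of first appearance.
(define tally-values
  (lambda (vals)
    (map (lambda (val)
           (list val (count (lambda (other) (equal? other val)) vals)))
         (remove-duplicates vals))))

;; Made-up category labels, like those categorize produces.
(define sample-categories
  (list "North" "South" "North" "Unknown" "South" "North"))

;; sample-tallies is '(("North" 3) ("South" 2) ("Unknown" 1))
(define sample-tallies (tally-values sample-categories))

;; discrete-histogram accepts a list of (category count) lists like this one.
(define show-tallies
  (lambda ()
    (plot (discrete-histogram sample-tallies))))
```

Evaluating (show-tallies) should draw one bar per category.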
Exercise 6: Tallying different types
a. Write a procedure, tally-alphabetic, that, given a list of characters,
determines how many are alphabetic.
> (tally-alphabetic (list #\a #\b #\3 #\d))
3
> (tally-alphabetic (string->list "a and b3 & q4"))
6
Hint: One approach is to filter the alphabetic characters and then find out how long the list is.
Hint: char-alphabetic? is a built-in Scheme procedure.
b. Write a procedure, tally-digits, that, given a list of characters,
determines how many are digits.
> (tally-digits (list #\a #\b #\3 #\d))
1
> (tally-digits (string->list "a and b3 & q4"))
2
Hint: char-numeric? is a built-in Scheme procedure.
c. Write a procedure, tally-whitespace, that, given a list of characters,
determines how many are whitespace.
> (tally-whitespace (string->list "a and b3 & q4"))
4
Hint: char-whitespace? is a built-in Scheme procedure.
d. Write a procedure, tally-other, that, given a list of characters,
determines how many are neither alphabetic, nor digits, nor whitespace.
> (tally-other (string->list "a and b3 & q4"))
1
e. Write a procedure, char-tallies, that, given a string, produces
a list of four numbers corresponding to the four numbers above.
> (char-tallies "a and b3 & q4")
'(6 2 4 1)
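The filter-then-length approach from the first hint can be sketched with a predicate that is not one of the four above. The procedure tally-uppercase below is a hypothetical example, not part of the exercise.

```racket
#lang racket

;; Count the characters in a list that satisfy a predicate:
;; filter keeps the matching characters, and length counts them.
(define tally-uppercase
  (lambda (chars)
    (length (filter char-upper-case? chars))))

;; The uppercase characters here are J, E, C, and B, so the count is 4.
(define sample-count
  (tally-uppercase (string->list "Jane Eyre, by Charlotte Bronte")))
```

The same pattern should carry over to the other tallies by swapping in a different predicate.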
Exercise 7: Visualizing tallies
a. Write a procedure, explore-strings, that takes a list of strings as
input and produces a stacked histogram of the distribution of characters
in the strings using char-tallies.
(define explore-strings
  (lambda (strings)
    (plot (stacked-histogram (map1 (lambda (str) (cons "" ...))
                                   strings)))))
b. Run explore-strings on a few sample inputs.
> (explore-strings
   (list
    "Now is the time for all good men to come to the aid of their country."
    "A 1 and a 2 and a 3 and ...."
    "'Twas brillig and the slithy toves; did gyre and gimble in the wabe."))
c. Run explore-strings on lines 100-110 of Jane Eyre.
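If the plot fails with a contract error, it may help to check the shape of the data you pass to stacked-histogram. Each entry should be a two-element list: a category label followed by a list of numbers, one per layer of the stack. The sketch below uses made-up numbers, and sample-stacks, one-entry, and show-stacks are hypothetical names.

```racket
#lang racket
(require plot)

;; Made-up tallies: each entry pairs a category label with a list of
;; numbers, one number per layer of the stack.
(define sample-stacks
  (list (list "first" (list 6 2 4 1))
        (list "second" (list 10 0 3 2))))

;; (cons label (list tallies)) builds the same two-element list as
;; (list label tallies).
(define one-entry (cons "first" (list (list 6 2 4 1))))

(define show-stacks
  (lambda ()
    (plot (stacked-histogram sample-stacks))))
```

Evaluating (show-stacks) should draw two bars, each split into four layers.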
Exercise 8: Exploring strings, revisited
Arrange for the histogram you created in the previous exercise to have an appropriate legend, title, and other labels.
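As a starting point, the sketch below shows keyword arguments that the plot package provides for this purpose: #:labels on stacked-histogram for the legend, and #:title, #:x-label, and #:y-label on plot. The data and the label text are made up, and all of the defined names are hypothetical.

```racket
#lang racket
(require plot)

;; Made-up data in the shape stacked-histogram expects.
(define sample-stacks
  (list (list "A" (list 6 2 4 1))
        (list "B" (list 10 0 3 2))))

;; One legend label per layer of the stack.
(define layer-labels
  (list "alphabetic" "digit" "whitespace" "other"))

(define show-labeled-stacks
  (lambda ()
    (plot (stacked-histogram sample-stacks #:labels layer-labels)
          #:title "Character types per string"
          #:x-label "String"
          #:y-label "Number of characters")))
```

The category labels along the x axis come from the first element of each entry, so more descriptive strings there may also improve the plot.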
For those with extra time
Extra 1: Other groupings
We’ve created zips1 and zips2 by selecting the entries whose last
three digits of zip code match.
Create two other lists, zips3 and zips4, in which you select the
entries whose first three digits match. Use “021” and “606” as the
leading digits.
a. What do you expect to happen when we plot the four sets of data?
b. Check your answer experimentally.
Extra 2: Side-by-side histograms
Skim the DrRacket documentation on histograms.
Using the ideas contained therein, show the north-south histograms
of zips1, zips2, zips3, and zips4 in one diagram that makes
it easier for the reader to understand how they relate.
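The plot documentation places histograms side by side by combining #:skip with a different #:x-min for each one. The following sketch adapts that pattern to two made-up tallies. The counts and the names tallies-a, tallies-b, and show-side-by-side are hypothetical, and the skip value of 2.5 is one choice among many.

```racket
#lang racket
(require plot)

;; Made-up tallies for two data sets, with the same categories in the
;; same order so that matching bars line up.
(define tallies-a (list (list "North" 27) (list "South" 31)))
(define tallies-b (list (list "North" 12) (list "South" 40)))

;; #:skip leaves room between categories; #:x-min shifts the second set
;; of bars so that they sit beside the first set rather than on top of it.
(define show-side-by-side
  (lambda ()
    (plot (list (discrete-histogram tallies-a
                                    #:skip 2.5 #:x-min 0
                                    #:label "set A")
                (discrete-histogram tallies-b
                                    #:skip 2.5 #:x-min 1
                                    #:label "set B"
                                    #:color 2 #:line-color 2))
          #:x-label "Region"
          #:y-label "Number of entries")))
```

With four data sets, the skip would likely need to grow so that each group of four bars has room.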
Extra 3: Side-by-side histograms, revisited
a. What do you expect to happen if we add zips to the solution above?
b. Check your answer experimentally.
c. You should observe that the large list of zips so dominates that the others become almost invisible. How might you solve this problem?
d. Discuss your answer with a teacher or mentor.
e. Implement your solution (or the one your teacher or mentor suggests).