#lang racket

(provide (all-defined-out))
(require csc151)
(require csc151/rex)

;; CSC-151-01 (Fall 2021)
;; Lab: NAME
;; Authors: YOUR NAMES HERE
;; Date: THE DATE HERE
;; Acknowledgements:
;;   ACKNOWLEDGEMENTS HERE

#|
In this lab, you and your partner will practice manipulating files
and strings using file and regular-expression (rex) procedures.

The person with the problem description should drive and their
partner should navigate.  Again, make sure to be good partners:
both of you should focus completely on solving the current problem on
the driver's screen rather than working ahead or working alone.

Also: Don't forget our new "start of session".  Chat with your partner
about working habits and strengths.  Maybe share something interesting
about yourself.
|#

; +---------------+--------------------------------------------------
; | Provided code |
; +---------------+

(define phishy
  "fishy: one cat, one hat, two things, \none fish, two fish, red fish, blue fish, green and yellow fish \nred books \n\n\none and two\tor\tthree and four\nthat is flat\n")

#| AB |#

; +--------------------------------+---------------------------------
; | Exercise 1: Working with texts |
; +--------------------------------+

#|
Project Gutenberg <https://www.gutenberg.org/> provides an extensive
collection of public domain books in a variety of forms, including
"plain text".

a. Navigate to the Project Gutenberg Web site and download one or two books 
in plain text format. Strive for short- to medium-length books.
_Jane Eyre_ <https://www.gutenberg.org/ebooks/1260> is okay. _The Complete 
Works of William Shakespeare_ <https://www.gutenberg.org/ebooks/100> is not.
Note: You can right-click on the link to the text file and select
"Save Link As" to save the file.

<TODO: Enter the title or titles of the books and the corresponding file names.>

b. Pick one of the books you've downloaded and open it in a text editor
to view the content.  If you are working in MathLAN, gedit is the preferred
text editor, but you can also open it in DrRacket.  (If you do the latter,
don't try to run it.)  Take notes on anything you observe as you glance
through the file.  Spend no more than three minutes looking at the file.

<TODO: Enter notes of anything you observe.>
|#

#| A |#

; +-------------------------------------------+----------------------
; | Exercise 2: Working with texts, revisited |
; +-------------------------------------------+

#|
a. Using a file you downloaded in the prior exercise, write
instructions in the definitions pane to read the characters, words,
lines, and complete contents from the book. (The contents should
be a single string.)  Call the results book-characters, book-words,
book-lines, and book-contents. For example,

(define book-characters (file->chars "pg1260.txt"))
|#

(define book-characters "<FILL IN>")

(define book-words "<FILL IN>")

(define book-lines "<FILL IN>")

(define book-contents "<FILL IN>")

#|
b. Write instructions to extract the first 20 characters, 10 words,
and 5 lines from the book.  Hint: Use list operations you know, such
as `take` or `drop`.
|#
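
#|
A minimal sketch of the hint above, using a small made-up list rather
than the book. It assumes only the standard `take` and `drop` from
`racket/list`; the names `sample-words`, `first-three`, `after-three`,
and `middle` are hypothetical.

```racket
#lang racket

; A small stand-in list (not taken from any book).
(define sample-words
  (list "it" "was" "a" "dark" "and" "stormy" "night"))

; (take lst n) keeps the first n elements.
(define first-three (take sample-words 3))

; (drop lst n) discards the first n elements.
(define after-three (drop sample-words 3))

; Combining them selects a middle slice: positions 2 (inclusive)
; through 5 (exclusive), counting from 0.
(define middle (take (drop sample-words 2) 3))
```

The take-after-drop pattern in `middle` is one way to select a range
from the middle of a list; other approaches may work equally well.
|#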

(define first-20-chars "<FILL IN>")

(define first-ten-words "<FILL IN>")

(define first-five-lines "<FILL IN>")

#|
c. Determine how many characters (in the Scheme sense, not in the
"Alice", "Dr. Strangelove", or "Cowardly Lion" sense) appear in the book.
|#

(define total-characters "<FILL IN>")

#|
d. Determine how many letters (letters, not characters) appear 
in the book.
|#
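
#|
To illustrate the difference between characters and letters, here is
a rough sketch on a tiny made-up string. It assumes the standard
`count` from `racket/list` and the built-in `char-alphabetic?`; the
names `sample`, `char-total`, `letter-total`, and `b-total` are
hypothetical.

```racket
#lang racket

; A tiny stand-in string (not from the book).
(define sample "Hi, Bob! 2 cats.")

; Every character counts, including spaces, digits, and punctuation.
(define sample-chars (string->list sample))
(define char-total (length sample-chars))

; `char-alphabetic?` holds only for letters.
(define letter-total (count char-alphabetic? sample-chars))

; Counting one specific character uses a custom predicate.
(define b-total
  (count (lambda (ch) (char=? ch #\b)) sample-chars))
```

Here `char-total` is 16, `letter-total` is 9, and `b-total` is 1
(the uppercase "B" does not match the lowercase #\b).
|#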

(define total-letters "<FILL IN>")

#|
e. Write instructions to extract lines 100 (inclusive) through 120
(exclusive) from the book.
|#

(define lines-100-to-120 "<FILL IN>")

#|
f. Write instructions to determine how many times the letter "a" appears 
in the book. (You need to deal only with lowercase "a".)
|#

(define count-of-as "<FILL IN>")

#| B |#

; +----------------------------+-------------------------------------
; | Exercise 3: Creating files |
; +----------------------------+

#|
As you may recall, the procedure (string->file str fname) saves a
string to the named file. There's also (lines->file lines fname),
which saves a list of strings to the named file, one string per line.
|#
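
#|
For comparison, here is a sketch of the same two ideas using only
standard Racket procedures. `display-to-file` and
`display-lines-to-file` are used here as stand-ins for `string->file`
and `lines->file`; whether the csc151 versions behave identically
(for example, when the file already exists) is an assumption. The
name `tmp` is hypothetical.

```racket
#lang racket

; A scratch file, so that nothing in the lab folder is overwritten.
(define tmp (make-temporary-file))

; Stand-in for (string->file str fname): write one string.
(display-to-file "just one line" tmp #:exists 'truncate)
(define after-string (file->string tmp))

; Stand-in for (lines->file lines fname): one string per line.
(display-lines-to-file (list "alpha" "beta") tmp #:exists 'truncate)
(define after-lines (file->lines tmp))

; Clean up the scratch file.
(delete-file tmp)
```

Reading the file back right after writing it is a quick way to check
that the write did what you expected.
|#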

#|
a. Write a procedure to save line 100 of your book to the file line100.txt. 
|#

;;; (save-line-100 infile) -> void?
;;;   infile : string?
;;; Save line 100 of the given file to line100.txt
(define save-line-100
  (lambda (infile)
    {??}))

#|
b. Verify that you were successful by using file->string with that
same file name.
|#

(define line100 "<FILL IN>")

#|
c. Save lines 100 (inclusive) through 120 (exclusive) of your book to 
the file excerpt.txt.
|#

;;; (save-lines-100-to-120 infile) -> void?
;;;   infile : string?
;;; Saves lines 100 (inclusive) to 120 (exclusive) of
;;; the given file to the file excerpt.txt
(define save-lines-100-to-120
  (lambda (infile)
    {??}))

#|
d. Verify that you were successful by using file->string with that same 
file name.
|#

(define excerpt "<FILL IN>")

#| A |#

; +---------------------------+--------------------------------------
; | Exercise 4: Miscellaneous |
; +---------------------------+

#|
If you return to the top of this file, you will see that we defined
a variable named `phishy`.
|#

#|
a. Suppose we create a file with 

    (string->file phishy "phishy.txt") 

What do you expect the contents of that file to look like?

<TODO: Enter an answer here.>
|#

#|
b. Check your answer experimentally.

<TODO: Cut and paste the results of your experiment here.>
|#

#|
c. One way to break up that string is at each space. Write an expression 
to do so. (You should not need regular expressions, at least not yet;
`string-split` should suffice.)
|#
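
#|
A small sketch of how `string-split` treats its separator argument,
on a made-up string rather than `phishy`. It assumes the standard
`string-split` from `racket/string`; the names `sample`, `by-space`,
`by-newline`, and `by-and` are hypothetical.

```racket
#lang racket

; A made-up string (not the lab's phishy).
(define sample "salt and pepper\nred and green")

; Split at each single space.
(define by-space (string-split sample " "))

; Split at each newline.
(define by-newline (string-split sample "\n"))

; The separator may be longer than one character.
(define by-and (string-split sample " and "))
```

Note that splitting at a space leaves the newline inside one of the
pieces ("pepper\nred"), since only the given separator is removed.
|#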

(define split-at-space "<TODO>")

#|
d. Another way to break up that string is at each newline character.
Write an expression to do so. (You still should not need regular
expressions, at least not yet.)
|#

(define split-at-newline "<TODO>")

#|
e. The word "and" appears a few times in that string. Split it at that word.
|#

(define split-at-and "<TODO>")

#| B |#

; +--------------------------------------------------------+---------
; | Exercise 5: Splitting strings with regular expressions |
; +--------------------------------------------------------+

#|
As you may have noted in the previous exercise, it seems insufficient
to split at a space, or a newline, or even a tab (which we didn't
try yet).
|#

#|
a. Write an expression that uses regular expressions to split phishy
at any whitespace character (space, tab, or newline).  You should
use `rex-split-string` and a rex of your choice.
|#

(define split-at-whitespace "<TODO>")

#|
b. As you may have noted (or perhaps should have noted), the list created
by `split-at-whitespace` contains many empty strings.  That's because
we're splitting at a single whitespace character but the string
contains sequences of whitespace characters, such as a space and a
newline, or multiple newlines in a row. Write an expression that
splits phishy at any nonempty sequence of whitespace characters.
|#
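
#|
The reasoning above can be seen on a tiny made-up string. This sketch
uses Racket's built-in `regexp-split` as a stand-in for
`rex-split-string`, so the pattern syntax differs from what the
exercise asks for; the names `sample`, `singles`, and `runs` are
hypothetical.

```racket
#lang racket

; A made-up string with a space followed by two newlines.
(define sample "one \n\ntwo\tthree")

; Splitting at each single whitespace character leaves an empty
; string between every pair of adjacent separators.
(define singles (regexp-split #rx"[ \t\n]" sample))

; Splitting at one or more whitespace characters does not.
(define runs (regexp-split #rx"[ \t\n]+" sample))
```

Here `singles` contains two empty strings between "one" and "two",
while `runs` contains only the three words.
|#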

(define split-at-whitespace-sequences "<TODO>")

#|
c. As you may have noted, the previous example includes characters
in "words" that are not alphabetical, such as the colon in "fishy:"
and the comma in "hat,". Write an expression that splits phishy at
any nonempty sequence of non-alphabetical characters.
|#

(define split-at-non-letters "<TODO>")
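
#|
As a rough illustration of splitting at non-letters, here is a sketch
with Racket's built-in `regexp-split`, used as a stand-in for
`rex-split-string`. It treats "alphabetical" as the ASCII letters
a-z and A-Z, which is an assumption; the names `sample` and `pieces`
are hypothetical.

```racket
#lang racket

; A made-up string with punctuation between the words.
(define sample "wait: stop, look twice")

; [^a-zA-Z]+ matches one or more characters that are not ASCII
; letters, so each run of punctuation and spaces is one separator.
(define pieces (regexp-split #rx"[^a-zA-Z]+" sample))
```

With `regexp-split`, a separator at the very start or end of the
input produces an empty string at that end of the result, so a string
that ends in a newline may still yield one trailing empty string.
|#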

; +------------------------------+-----------------------------------
; | Exercise 6: Extracting words |
; +------------------------------+

#|
Write a procedure, (string->words str), that takes a string as
input and splits it into the "words" (sequences of alphabetical
characters).  You should use `rex-find-matches` and an appropriate
rex pattern.

> (string->words phishy)
'("fishy" "one" "cat" "one" "hat" "two" "things" "one" "fish" "two" "fish" "red" "fish" "blue" "fish" "green" "and" "yellow" "fish" "red" "books" "one" "and" "two" "or" "three" "and" "four" "that" "is" "flat")
> (string->words "hello+goodbye, ph33r")
'("hello" "goodbye" "ph" "r")
|#
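
#|
For comparison, finding matches (rather than splitting) can be
sketched with Racket's built-in `regexp-match*`, used here as a
stand-in for `rex-find-matches`. The helper name `letter-runs` is
hypothetical, and "letters" is assumed to mean the ASCII letters.

```racket
#lang racket

; (letter-runs str) -> (listof string?)
;   str : string?
; Hypothetical helper: collect each maximal run of ASCII letters.
(define letter-runs
  (lambda (str)
    (regexp-match* #rx"[a-zA-Z]+" str)))
```

For example, (letter-runs "it's 4pm, ok?") gives '("it" "s" "pm" "ok").
Unlike splitting, matching yields no empty strings when the input
starts or ends with a non-letter.
|#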

;;; (string->words str) -> (listof string?)
;;;   str : string?
;;; Make a list of all the words (sequences of letters)
;;; that appear in str.
(define string->words
  (lambda (str)
    {??}))