Overview
I would certainly appreciate suggestions of other extra credit activities (preferably via email).
Can we go overboard on the date/time work, say by paying attention to the Julian/Gregorian switch?
You need not pay attention to the switch. But if it floats your boat, as they say, it’s fine.
Can we talk about 1a?
The goal on 1a is to build a hash table that tallies letters and then to extract information from it in a systematic way.
Here’s an incomplete solution.
(define tally-letters
  (lambda (str)
    (let ([uh (make-hash)])
      ; Initialize: every lowercase letter starts with a count of 0
      ; (section comes from the course's csc151 library)
      (for-each (section hash-set! uh <> 0)
                (string-split "abcdefghijklmnopqrstuvwxyz" ""))
      ; Fill in values: add 1 for each lowercase letter that appears in str
      (for-each
       (lambda (thing)
         (when (regexp-match? #px"[a-z]" thing)
           (hash-set! uh thing (+ 1 (hash-ref uh thing)))))
       (string-split str ""))
      uh)))
See exercise 7 on the hash tables lab for more ideas.
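One systematic way to extract information from a finished tally is to turn the table into a list of (letter . count) pairs and sort it. The sketch below assumes the table maps one-letter strings to counts, as tally-letters does; the helper name sorted-tallies and the hand-built sample table are hypothetical, not part of the assignment.

```racket
#lang racket
;; Hypothetical helper: list the entries of a letter-tally hash table as
;; (letter . count) pairs, most frequent first, ties broken alphabetically.
(define (sorted-tallies tallies)
  (sort (hash->list tallies)
        (lambda (a b)
          (or (> (cdr a) (cdr b))
              (and (= (cdr a) (cdr b))
                   (string<? (car a) (car b)))))))

;; A small tally built by hand so that this block runs on its own.
(define sample (make-hash '(("a" . 2) ("b" . 1) ("c" . 2) ("d" . 0))))

(sorted-tallies sample)
; → '(("a" . 2) ("c" . 2) ("b" . 1) ("d" . 0))
```

Because the result is an ordinary list, follow-up questions such as "which letter is most common?" become list operations (for example, taking the car of the sorted list).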
<p> … </p> marks a paragraph: <p> opens it and </p> closes it.
<uh> marks universal headings?
<em> … </em> marks emphasized text.
<q> … </q> marks a quotation.
<strong> … </strong> marks strongly emphasized text.
<ul> … </ul> marks an unordered (unnumbered) list.
<ol> … </ol> marks an ordered (numbered) list.
<li> … </li> marks a list item.
<p class="article"> shows an attribute (here, class), which gives additional information about an element of the document.
We can write programs that transform and analyze text. (Not a surprise at this point.)
We should also be able to write programs that transform and analyze Web pages.
Write a regular expression that matches emphasized pieces of text so that we can count the number of emphasized pieces.
> (regexp-match* #px"<em>" example)
'("<em>" "<em>" "<em>" "<em>" "<em>")
> (length (regexp-match* #px"<em>" example))
5
> (regexp-match* #px"<em" "<em class='booktitle'>Alice in CSC151land</em>")
'("<em")
> (regexp-match* #px"<em[ >]" "<em class='booktitle'>Alice in CSC151land</em>")
'("<em ")
> (regexp-match* #px"</em>" "<em class='booktitle'>Alice in CSC151land</em>")
'("</em>")
> (length (regexp-match* #px"</em>" example))
5
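The transcript above can be packaged as a small counting procedure. This is a sketch: the name count-emphasized and the sample string are hypothetical. It uses the "<em[ >]" pattern so that opening tags with attributes still count, while tags such as <embed> that merely start with "em" do not.

```racket
#lang racket
;; Count emphasized pieces by counting opening em tags.
;; "<em[ >]" matches both <em> and <em class='...'>, but not <embed ...>.
(define (count-emphasized html)
  (length (regexp-match* #px"<em[ >]" html)))

(define sample
  "<p>An <em>odd</em> tale, <em class='booktitle'>Alice in CSC151land</em>, and an <embed src='x.svg'>.</p>")

(count-emphasized sample)                 ; → 2
(length (regexp-match* #px"<em" sample))  ; → 3, because "<em" also matches "<embed"
```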
_Write a regular expression that matches nested emphasized pieces
of text so that we can, say, replace the inner one with a
tag. <em>Sam thinks this is <em>very</em> important!</em>_
Because XML/HTML are hierarchical and regular expressions are linear, regular expressions do poorly on this kind of problem: a pattern cannot reliably pair each <em> with its own </em> once tags nest.
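To see the problem concretely, here is what two natural patterns do with the nested sentence from the exercise. Neither one isolates the inner element: the lazy pattern pairs the outer <em> with the inner </em>, and the greedy pattern swallows the inner element whole.

```racket
#lang racket
;; The sentence from the exercise, with one em element nested inside another.
(define nested "<em>Sam thinks this is <em>very</em> important!</em>")

;; Lazy: stops at the first </em>, which closes the INNER element,
;; so the outer <em> is paired with the wrong closing tag.
(regexp-match #px"<em>(.*?)</em>" nested)
; → '("<em>Sam thinks this is <em>very</em>" "Sam thinks this is <em>very")

;; Greedy: runs to the last </em>, so the inner element ends up
;; buried inside the captured text instead of being found on its own.
(regexp-match #px"<em>(.*)</em>" nested)
; → '("<em>Sam thinks this is <em>very</em> important!</em>"
;     "Sam thinks this is <em>very</em> important!")
```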
There is no common pattern language for hierarchical structures.
However, there is a standard for XML documents, called XPath. We will consider a subset of XPath.
XML is hierarchical; strings are not. How might we represent a hierarchical document in Racket?
Options: Lists, Strings, Hash tables
We can think about each HTML element as a list whose first element is the tag. For example,
<p>Hello world</p> might be represented as '(p "Hello world").
<p>This is <em>more</em> complicated</p> might be represented as
'(p "This is " (em "more") " complicated").
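One way to convince ourselves that the nested list keeps all of the structure is to turn it back into HTML. The helper element->string below is hypothetical (it is not the course's html->string). It assumes every element has the simple shape (tag child ...), where each child is a string or another element, and it ignores attributes and special-character escaping.

```racket
#lang racket
;; Hypothetical helper: rebuild an HTML string from the nested-list form.
;; Strings are copied as-is; an element (tag child ...) becomes
;; <tag>children</tag>, with the children converted recursively.
(define (element->string element)
  (if (string? element)
      element
      (let ([tag (symbol->string (car element))])
        (string-append "<" tag ">"
                       (apply string-append (map element->string (cdr element)))
                       "</" tag ">"))))

(element->string '(p "This is " (em "more") " complicated"))
; → "<p>This is <em>more</em> complicated</p>"
```

The recursion follows the shape of the data: every nested list is one element, so the closing tag is always written for the right opening tag, which is exactly what the regular expressions could not guarantee.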
file->html and string->html convert to this format
html->file and html->string convert from this format
"//em" - Search for this tag anywhere in the document (every em element, at any depth).
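As a rough analogue of "//em" on the list representation, the sketch below collects every element with a given tag, at any depth, in document order. The name find-all is hypothetical; this is not the course library's XPath implementation, and it assumes the same (tag child ...) shape as the examples above.

```racket
#lang racket
;; Hypothetical helper, approximating the XPath query "//tag":
;; return every element whose tag matches, at any depth, in document order.
(define (find-all tag element)
  (cond
    [(string? element) '()]
    [else
     (append (if (eq? (car element) tag) (list element) '())
             (append-map (lambda (child) (find-all tag child))
                         (cdr element)))]))

(find-all 'em '(p (em "Sam thinks this is " (em "very") " important!")))
; → '((em "Sam thinks this is " (em "very") " important!") (em "very"))
```

Unlike the regular-expression attempts, the outer and inner em elements come back as separate, correctly delimited results, so replacing the inner one becomes a matter of rebuilding the list.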