'(tag (@ (name1 val1) (name2 val2) ...) element1 element2 ...)
- A list-based representation of an XML/HTML element. The attribute section, (@ ...), is optional. Each of element1, element2, ... is either a string or itself an XML/HTML element in this same form.
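The following Racket snippet shows a small made-up element in this representation. It also defines a few hypothetical helpers for taking an element apart: element-tag, element-attributes, element-contents, and has-attribute-section?. These helpers are illustrations only and are not part of loudhum or sxml.

```racket
#lang racket/base
;; A made-up paragraph element in the list-based representation:
;; tag p, an attribute section with a class, then a string and a
;; nested q element (which has no attribute section).
(define sample
  '(p (@ (class "dialogue"))
      "Alice said, "
      (q "I can't believe that!")))

;; Hypothetical helpers for illustration; not part of loudhum or sxml.
(define (element-tag elt) (car elt))

;; True when the second item of elt is an (@ ...) attribute section.
(define (has-attribute-section? elt)
  (and (pair? (cdr elt))
       (pair? (cadr elt))
       (eq? (car (cadr elt)) '@)))

;; The (name val) pairs, or '() when there is no attribute section.
(define (element-attributes elt)
  (if (has-attribute-section? elt)
      (cdr (cadr elt))
      '()))

;; The strings and subelements, skipping any attribute section.
(define (element-contents elt)
  (if (has-attribute-section? elt)
      (cddr elt)
      (cdr elt)))
```

For example, (element-contents sample) gives the list containing the string "Alice said, " and the q element.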
"//tag"
- an XPath pattern to search for elements with the given tag.
"//tag0/tag1"
- an XPath pattern to search for elements with tag tag1 that appear directly under elements with tag tag0.
"//tag0//tag1"
- an XPath pattern to search for elements with tag tag1 that appear anywhere under elements with tag tag0.
"//tag[1]"
- the first instance of the tag within an enclosing element. (Similarly, "//tag[2]" selects the second instance, and so on.)
"//tag[@class='name']"
- all elements with the given tag and class.
"//text()"
- all of the text in a document.
"//tag[contains(text(),'string')]"
- all instances of the tag that
contain that string in their text.
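To make a few of these patterns more concrete, the following sketch walks the list-based representation by hand. It approximates what "//tag", "//tag[@class='name']", and "//text()" select. The procedure names find-all-tags, find-tags-with-class, and find-all-text are hypothetical. This is not how sxpath evaluates patterns, and the sketch does not handle positional forms such as "//tag[1]".

```racket
#lang racket/base
;; Hand-rolled approximations of a few XPath patterns. This is NOT how
;; sxpath works; it only walks the tree to make the patterns concrete.

;; Contents of an element, skipping the optional (@ ...) section.
(define (contents elt)
  (let ([rest (cdr elt)])
    (if (and (pair? rest) (pair? (car rest)) (eq? (caar rest) '@))
        (cdr rest)
        rest)))

;; Roughly "//tag": every element with the given tag, at any depth,
;; in document order (this sketch also counts node itself).
(define (find-all-tags tag node)
  (if (string? node)
      '()
      (let ([below (apply append
                          (map (lambda (child) (find-all-tags tag child))
                               (contents node)))])
        (if (eq? (car node) tag)
            (cons node below)
            below))))

;; Roughly "//tag[@class='name']": keep only the matches whose
;; attribute section contains exactly (class "name").
(define (find-tags-with-class tag class-name node)
  (filter (lambda (elt)
            (let ([rest (cdr elt)])
              (and (pair? rest) (pair? (car rest)) (eq? (caar rest) '@)
                   (member (list 'class class-name) (cdar rest))
                   #t)))
          (find-all-tags tag node)))

;; Roughly "//text()": every string in the contents, in order.
;; Attribute values are not included.
(define (find-all-text node)
  (if (string? node)
      (list node)
      (apply append (map find-all-text (contents node)))))

(define doc
  '(div (p "One " (q "two") " three")
        (p (@ (class "aside")) (q "four"))))
```

Here (find-all-tags 'q doc) collects both q elements, and (find-all-text doc) collects the four content strings but not the attribute value "aside".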
(file->xml fname)
- Read an HTML document and convert it to the list-based representation.
(xml->file html fname)
- Save the list-based representation of an HTML
document in a file.
(string->xml str)
- Convert a string to the list-based representation.
(xml->string xml)
- Convert the list-based representation to a string.
(sxpath-match pattern xml)
- Search the HTML document for elements that match the pattern.
(sxpath-replace pattern xml proc)
- Update any element matching the pattern by applying proc to it.
(sxpath-delete pattern xml)
- Delete any element matching the pattern.
(sxpath-remove pattern xml)
- Remove the tag in any element matching
the pattern, moving any contents of the element up to the enclosing
element.
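The difference between sxpath-replace, sxpath-delete, and sxpath-remove can be illustrated with a hand-rolled sketch. Instead of an XPath pattern, the sketch matches a bare tag symbol. The helpers replace-tag, delete-tag, and remove-tag are hypothetical and are not loudhum's implementation. They only model the behavior described above.

```racket
#lang racket/base
;; Hypothetical models of replace, delete, and remove. They match a bare
;; tag symbol, not an XPath pattern, and are not loudhum's code.

;; Split an element into its head (tag plus optional (@ ...)) and contents.
(define (split-element elt)
  (let ([rest (cdr elt)])
    (if (and (pair? rest) (pair? (car rest)) (eq? (caar rest) '@))
        (values (list (car elt) (car rest)) (cdr rest))
        (values (list (car elt)) rest))))

;; Rebuild node bottom-up. Each child is first processed recursively,
;; then step turns it into a list of zero or more nodes. The top-level
;; element itself is never passed to step; only its descendants are.
(define (walk node step)
  (if (string? node)
      node
      (let-values ([(head kids) (split-element node)])
        (append head
                (apply append
                       (map (lambda (kid) (step (walk kid step))) kids))))))

(define (matches? tag node)
  (and (pair? node) (eq? (car node) tag)))

;; Like sxpath-replace: swap each matching element for (proc element).
(define (replace-tag tag doc proc)
  (walk doc (lambda (n) (if (matches? tag n) (list (proc n)) (list n)))))

;; Like sxpath-delete: drop matching elements and everything inside them.
(define (delete-tag tag doc)
  (walk doc (lambda (n) (if (matches? tag n) '() (list n)))))

;; Like sxpath-remove: drop only the tag (and its attributes),
;; splicing the element's contents into the enclosing element.
(define (remove-tag tag doc)
  (walk doc (lambda (n)
              (if (matches? tag n)
                  (let-values ([(head kids) (split-element n)]) kids)
                  (list n)))))

(define doc
  '(p "Alice said, " (q (@ (who "alice")) "Curiouser!") " Then."))
```

In this sketch, (delete-tag 'q doc) drops the q element along with its text. By contrast, (remove-tag 'q doc) keeps "Curiouser!" and discards only the q tag and its attributes.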
a. Start DrRacket.
b. Make sure that you have the latest version of the loudhum package by opening a terminal window and typing /home/rebelsky/bin/csc151/update. (Alternatively, select File > Install Package…, enter “https://github.com/grinnell-cs/loudhum.git”, and follow the instructions.)
c. Install the sxml package as follows: Select File > Install Package…. Enter “sxml”. Click Install. When a Close button appears, click it.
d. Add (require loudhum) and (require sxml) to the definitions pane.
e. If you did not set up a Web site in MathLAN at the start of the semester, set one up now by opening a terminal window and typing /home/rebelsky/bin/csc151/setup-web.
f. Verify that you can load one of the sample pages by directing your browser to https://www.cs.grinnell.edu/~username/excerpt.html, substituting your own user name.
As you may recall, excerpt.html contains a short excerpt from Through the Looking Glass.
a. Write an expression that identifies all of the quotations in that document.
b. Write an expression that identifies all of the quotations by the White Queen.
c. Write an expression that identifies all of the spoken quotations.
a. Write an expression that replaces every one of the White Queen’s quotations with the text “Off with their heads!”.
b. Write an expression that removes every one of the White Queen’s quotations.
a. Write an expression that strongly emphasizes every spoken quotation. That is, put a strong tag around the quotation.
b. Write an expression that turns every spoken quotation into all caps. That is, identify the text within the quotation and call string-upcase on that text.
Write an expression that removes the q
tag from any of Alice’s quotations.
Write an expression that inserts the text "PAY ATTENTION:"
at the
start of every quotation.
Reminder: You can use append
to join lists. In this case, you’ll
want to join a list of the tag (and, possibly, the attributes), a
list of the string, and the rest of the contents of the element.
Note that this process is complicated by the possible inclusion of attributes in the quotation. Fortunately, there’s a (has-attributes? element) procedure that checks whether an element has a set of attributes.
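Here is a minimal sketch of the append-based approach, applied to an arbitrary element. The procedure my-has-attributes? is a hypothetical stand-in written for this example. It assumes that the attribute section, when present, is the second item of the element. The library's has-attributes? may behave differently. The procedure prepend-string is also hypothetical.

```racket
#lang racket/base
;; Hypothetical stand-in for has-attributes?: true when the second item
;; of elt is an (@ ...) attribute section. The library's version may differ.
(define (my-has-attributes? elt)
  (and (pair? (cdr elt))
       (pair? (cadr elt))
       (eq? (car (cadr elt)) '@)))

;; Hypothetical helper: insert str before the existing contents of elt,
;; keeping the tag (and any attribute section) at the front.
(define (prepend-string str elt)
  (if (my-has-attributes? elt)
      ;; tag and attributes, then the new string, then the old contents
      (append (list (car elt) (cadr elt))
              (list str)
              (cddr elt))
      ;; tag, then the new string, then the old contents
      (append (list (car elt))
              (list str)
              (cdr elt))))
```

For example, (prepend-string "Note: " '(p (@ (class "x")) "Hello")) keeps the attribute section right after the tag and places "Note: " before "Hello".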
We do not anticipate that anyone will have extra time.
This lab was newly written in spring 2019.
The loudhum libraries that support these exercises build on the Racket SXML libraries and on Neil Van Dyke’s html-parsing and html-writing libraries.