CSC 151 2019S, Class 17: Processing XML

Overview

  • Preliminaries
    • Notes and news
    • Upcoming work
    • Extra credit
    • Questions
  • XML, revisited
  • Representing XML in Racket
  • Expressing patterns in XML
  • Constructing new documents from old
  • Lab

Preliminaries

News / Etc.

  • Mentor sessions Wednesday 8-9 p.m., Thursday 8-9 p.m., Sunday 5-6 p.m.
  • Welcome to any prospective students we have. Thank you for bringing warmer weather with you.
  • I’m back! I hope that you had a good time without me. I apologize for the inconsistency in communication.
  • I brought you conference swag. (One of each item per person.)

Upcoming work

  • Reading for Wednesday
    • [Forthcoming]
  • Assignment 6 due Tuesday.
  • Flash cards due Wednesday at 8:00 p.m.
    • Covers Wednesday/Friday/Monday classes
  • Lab writeup due before class Wednesday
    • Exercises: TBD
    • Subject: CSC 151.01 Writeup for Class 17 (YOUR NAMES)
    • To: csc151-01-grader@grinnell.edu
  • Quiz Friday: Hash tables, structs, and searching XML

Extra Credit

I would certainly appreciate suggestions of other extra credit activities (preferably via email).

Extra credit (Academic/Artistic)

Extra credit (Peer)

  • Grinnell Singers, Sunday at 2 p.m., with the Lyra Baroque Orchestra (professional musicians, period instruments). Really difficult pieces by Handel and others.
  • Twelfth Night this weekend

Extra credit (Wellness)

Extra credit (Wellness, Regular)

  • 30 Minutes of Mindfulness at SHACS every Monday 4:15-4:45
  • Any organized exercise. (See previous eboards for a list.)
  • 60 minutes of some solitary self-care activities that are unrelated to academics or work. Your email reflection must explain how the activity contributed to your wellness.
  • 60 minutes of some shared self-care activity with friends. Your email reflection must explain how the activity contributed to your wellness.

Extra credit (Misc)

Other good things

Questions

How should we get a hash table into a list form?

> hash
'#hash(("Xinya" . 10) ("Sam" . 4) ("Prospie" . 2141) ("Sarina" . 3))
> (define make-entry
    (lambda (key)
      (list key (hash-ref hash key))))
> (make-entry "Xinya")
'("Xinya" 10)
> (map make-entry (hash-keys hash))
'(("Sarina" 3) ("Prospie" 2141) ("Sam" 4) ("Xinya" 10))

How do I get that in the right order?

> (sort (hash-keys hash) string<=?)
'("Prospie" "Sam" "Sarina" "Xinya")
> (map make-entry (sort (hash-keys hash) string<=?))
'(("Prospie" 2141) ("Sam" 4) ("Sarina" 3) ("Xinya" 10))
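Here is another way to get the same result, as a sketch in plain Racket. It relies on hash->list, which returns the key/value pairs directly, and on the #:key option of sort. The names scores and pair->entry are made up for this example.

```racket
#lang racket
;; Hypothetical table, mirroring the one in the transcript above.
(define scores
  (hash "Xinya" 10 "Sam" 4 "Prospie" 2141 "Sarina" 3))

;; Turn one (key . value) pair into a two-element list.
(define pair->entry
  (lambda (pair)
    (list (car pair) (cdr pair))))

;; Sort the pairs by their keys, then convert each pair to a list.
(define sorted-entries
  (map pair->entry
       (sort (hash->list scores) string<? #:key car)))

sorted-entries
; '(("Prospie" 2141) ("Sam" 4) ("Sarina" 3) ("Xinya" 10))
```

Sorting the pairs first means we never have to look each key up again with hash-ref.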

What is string->lines?

I have no idea what you are talking about. Gremlins replaced file->lines with that name. And I fixed it. Remember the aphorism, “Computers are sentient and malicious.” “And so are CS faculty.”

Is there a nice way to convert the structs to something more readable?

Maybe

(struct person (lname fname) #:transparent)

(define person->string
  (lambda (p)
    (string-append (person-fname p) " " (person-lname p))))

Something something something proc string. Did that make sense?

Um.

Could you tell me more about for-each?

(map proc lst) applies proc to each element of lst, returning a new list.

Sometimes, we have procedures whose only purpose is to have a side-effect, rather than to return something.

If you use those with map, you get a list of voids.

for-each gives us the side effect without building a useless list.

Demo

> (for-each (lambda (thing) (hash-set! hash (list-ref thing 0) (list-ref thing 1)))
       answerers)
> hash
'#hash(("Mira" . 12) ("Nick" . 1112123) ("Everett" . 55) ("Christa" . 2))
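To see the difference between map and for-each side by side, here is a small self-contained sketch. The list answerers and the table tally are stand-ins for whatever was defined in class.

```racket
#lang racket
;; Hypothetical data: each entry is a (name count) list.
(define answerers '(("Mira" 12) ("Everett" 55) ("Christa" 2)))

;; A mutable table to fill in.
(define tally (make-hash))

;; A procedure called only for its side effect: store one entry.
(define record!
  (lambda (entry)
    (hash-set! tally (car entry) (cadr entry))))

;; map does the work, but also builds a list of void values.
(define leftovers (map record! answerers))

;; for-each does the same work and builds no list.
(for-each record! answerers)

(length leftovers)        ; 3
(hash-ref tally "Everett") ; 55
```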

Wow, you didn’t understand me at all. Is there a way to turn the name of a procedure, as a string, into the procedure itself?

> (foo "square")
#<procedure:square>
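The notes do not say what foo is. One possible approach, sketched here under that caveat, is to keep a table from names to procedures and look the name up. The names procedures and name->procedure are invented for this example, and only procedures listed in the table can be found.

```racket
#lang racket
(define square
  (lambda (x)
    (* x x)))

;; Hypothetical lookup table from names to procedures.
(define procedures
  (hash "square" square
        "add1" add1
        "sqrt" sqrt))

;; Find the procedure with the given name, or #f if we don't know it.
(define name->procedure
  (lambda (name)
    (hash-ref procedures name #f)))

((name->procedure "square") 5) ; 25
(name->procedure "cube")       ; #f
```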

Can I automatically make procedures, just like struct does?

Not with what you’ve learned.

Yes, if you’re willing to go beyond the scope of the course.

Is there a way to turn a list of lists into a single list?

We know that we can turn two lists into one list, with append.

> (append (range 1 5) (range 2 6))
'(1 2 3 4 2 3 4 5)

If we have a procedure that combines two elements into a single element, we can use reduce to apply it across a whole list.

> (define lol '((a) (b c) (d e f) (a)))
> (reduce append lol)
'(a b c d e f a)
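If reduce is not available (it may come from the course library rather than plain Racket), two standard alternatives give the same answer. This is a sketch, not the approach used in class.

```racket
#lang racket
(define lol '((a) (b c) (d e f) (a)))

;; append accepts any number of lists, so we can apply it to the list of lists.
(define flat-with-apply (apply append lol))

;; foldr combines the lists from right to left, starting from the empty list.
(define flat-with-foldr (foldr append '() lol))

flat-with-apply ; '(a b c d e f a)
flat-with-foldr ; '(a b c d e f a)
```

The foldr version also works on an empty list of lists, because it has an explicit starting value.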

Can we use car and cdr and caddar?

Yes.

Can we use set!?

No!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

XML, revisited

XML is a notation for marking up text: tags label each section of the text with its type or role, such as a heading, a paragraph, or emphasized text. It is closely related to HTML.

<html>
<head>
</head>
<body>
<h1>Meanings of Um</h1>
<p>Um has many meanings, which is why students use it so much.</p>
<p>Here are some of those meanings.</p>
<ul>
<li>Um can mean <q>use <em>memory</em></q>.</li>
<li>Um can mean <q><em>You</em> may be confused.</q></li>
<li>Um can mean <q><em>You</em> missed the closing tag.</q></li>
</ul>
</body>
</html>

We may want to count elements in our HTML/XML documents, or to manipulate those documents.

  • How many paragraphs are there?
  • How many times does emphasized text appear?
  • Replace all instances of “Sam” with “$^%%^(&(&()”

Some of these can be done with regular expressions. Others are hard to do with regular expressions, and may even be impossible.

  • Suppose we want to count the number of times emphasized text appears (nested or not)?
  • Suppose we want to count the number of times emphasized text appears within emphasized text. <em>Computers are <em>really</em> annoying.</em>

How would we do each?

If we did not care about nesting, we’d write

> (length (regexp-match* "<em[ >]" str))

For example,

> (regexp-match* #px"<em>" text)
'("<em>" "<em>" "<em>" "<em>" "<em>")
> (define str "<em class='loud'>Hello</em> there <em>someone</em>")
> (regexp-match* #px"<em>" str)
'("<em>")
> (regexp-match* #px"<em[ >]" str)
'("<em " "<em>")
> (regexp-match* #px"<em[ >]" "<emphatic>")
'()
> (regexp-match* #px"<em." "<emphatic>")
'("<emp")
; DL says something clever, Sam can't copy and paste.
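The experiments above can be packaged as a procedure that works for any tag name. This is a sketch: count-tag is a made-up name, and like the pattern above it only recognizes a tag name followed by a space or by >.

```racket
#lang racket
;; Count the opening tags with the given name, with or without attributes.
(define count-tag
  (lambda (tag str)
    (length
     (regexp-match*
      ;; regexp-quote protects any special characters in the tag name.
      (pregexp (string-append "<" (regexp-quote tag) "[ >]"))
      str))))

(count-tag "em" "<em class='loud'>Hello</em> there <em>someone</em>") ; 2
(count-tag "em" "<emphatic>")                                         ; 0
(count-tag "p" "<p>a</p><pre>b</pre>")                                ; 1
```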

Counting nested emphasis is hard because regular expressions give us no way to express “not this longer string”.
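One workaround that stays with strings is to use a regular expression only to find the tags, and to keep track of the nesting depth in ordinary code. The sketch below assumes the tags are well formed and ignores complications such as comments or self-closing tags. The name count-nested-em is invented for this example.

```racket
#lang racket
;; Count the <em> elements that open while another <em> is still open.
(define count-nested-em
  (lambda (str)
    ;; tags holds every opening and closing em tag, in document order.
    (let loop ([tags (regexp-match* #px"</?em[ >]" str)]
               [depth 0]
               [nested 0])
      (cond
        [(null? tags)
         nested]
        ;; Closing tag: we leave one level of emphasis.
        [(string-prefix? (car tags) "</")
         (loop (cdr tags) (max 0 (- depth 1)) nested)]
        ;; Opening tag: it is nested if we are already inside an em.
        [else
         (loop (cdr tags)
               (+ depth 1)
               (if (> depth 0) (+ nested 1) nested))]))))

(count-nested-em "<em>Computers are <em>really</em> annoying.</em>") ; 1
(count-nested-em "<em>a</em> and <em>b</em>")                        ; 0
```

The depth counter is exactly the piece of information that a single regular expression cannot carry along.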

We need an extension to regular expressions that works better for hierarchical documents.

The W3C defines a standard, XPath, for “regular expressions over XML documents”.
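To get a feel for why a hierarchical view helps, here is a sketch that treats a document as a nested list and asks the “emphasis inside emphasis” question by walking the tree. This is not XPath, and the list representation (a tag symbol followed by children, with strings for text) is an assumption made for this example, not necessarily the one we will use in class. In XPath itself, the corresponding pattern is roughly //em//em.

```racket
#lang racket
;; Hypothetical representation: (tag child ...), where each child is
;; either a string of text or another (tag child ...) list.
(define doc
  '(p "Computers are "
      (em "really " (em "very"))
      " annoying, "
      (em "sometimes")
      "."))

;; Count the em elements that appear somewhere inside another em.
;; inside-em? records whether some ancestor of node is an em.
(define count-nested-em
  (lambda (node inside-em?)
    (cond
      [(string? node)
       0]
      [else
       (let* ([is-em? (eq? (car node) 'em)]
              [here (if (and is-em? inside-em?) 1 0)])
         (+ here
            (apply +
                   (map (lambda (child)
                          (count-nested-em child (or inside-em? is-em?)))
                        (cdr node)))))])))

(count-nested-em doc #f) ; 1
```

Because the structure is already a tree, “inside” is simply a matter of which list contains which.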

Representing XML in Racket

Expressing patterns in XML

Constructing new documents from old

Lab