---
title: Eboard 17  Processing XML
number: 17
section: eboards
held: 2019-03-04
link: true
---
CSC 151 2019S, Class 17:  Processing XML
========================================

_Overview_

* Preliminaries
    * Notes and news
    * Upcoming work
    * Extra credit
    * Questions
* XML, revisited
* Representing XML in Racket
* Expressing patterns in XML
* Constructing new documents from old
* Lab

Preliminaries
-------------

### News / Etc.

* Mentor sessions Wednesday 8-9 p.m., Thursday 8-9 p.m., Sunday 5-6 p.m.
* Welcome to any prospective students we have.  Thank you for bringing
  warmer weather with you.
* I'm back!  I hope that you had a good time without me.  I apologize
  for the inconsistency in communication.
* I brought you conference swag.  (One of each item per person.)

### Upcoming work

* Reading for Wednesday
    * [Forthcoming]
* [Assignment 6](../assignments/assignment06) due Tuesday.
* [Flash cards](../flashcards/flashcards07) due Wednesday at 8:00 p.m.
    * Covers Wednesday/Friday/Monday classes
* [Lab writeup](../labs/writeup15) due before class Wednesday
    * Exercises: TBD
    * Subject: CSC 151.01 Writeup for Class 17 (YOUR NAMES) 
    * To: csc151-01-grader@grinnell.edu
* Quiz Friday: Hash tables, structs, and searching XML

### Extra Credit

_I would certainly appreciate suggestions of other extra credit activities
(preferably via email)._

#### Extra credit (Academic/Artistic)

#### Extra credit (Peer)

* Grinnell Singers, Sunday at 2 p.m., with the Lyra Baroque Orchestra
  (professional musicians, period instruments).  Really difficult pieces
  by Handel and others.
* Twelfth Night this weekend

#### Extra credit (Wellness)

#### Extra credit (Wellness, Regular)

* 30 Minutes of Mindfulness at SHACS every Monday 4:15-4:45
* Any organized exercise.  (See previous eboards for a list.)
* 60 minutes of some solitary self-care activities that are unrelated to
  academics or work.  Your email reflection must explain how the activity
  contributed to your wellness.
* 60 minutes of some shared self-care activity with friends. Your email
  reflection must explain how the activity contributed to your wellness.

#### Extra credit (Misc)

### Other good things 

### Questions

_How should we get a hash table into a list form?_

```drracket
> hash
'#hash(("Xinya" . 10) ("Sam" . 4) ("Prospie" . 2141) ("Sarina" . 3))
> (define make-entry
    (lambda (key)
      (list key (hash-ref hash key))))
> (make-entry "Xinya")
'("Xinya" 10)
> (map make-entry (hash-keys hash))
'(("Sarina" 3) ("Prospie" 2141) ("Sam" 4) ("Xinya" 10))
```

_How do I get that in the right order?_

```drracket
> (sort (hash-keys hash) string<=?)
'("Prospie" "Sam" "Sarina" "Xinya")
> (map make-entry (sort (hash-keys hash) string<=?))
'(("Prospie" 2141) ("Sam" 4) ("Sarina" 3) ("Xinya" 10))
```
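
> Another way to get there, as a rough sketch: standard Racket's
  `hash->list` gives key/value pairs, which we can sort by key and
  then turn into two-element lists.  The names `scores`,
  `sorted-entries`, and `sorted-lists` are made up for this example.

```drracket
#lang racket

;; A small immutable table with the same entries as the one above.
(define scores (hash "Xinya" 10 "Sam" 4 "Prospie" 2141 "Sarina" 3))

;; hash->list yields pairs such as ("Sam" . 4); sort them by their key.
(define sorted-entries
  (sort (hash->list scores) string<? #:key car))

;; Convert each pair into a two-element list, like make-entry produces.
(define sorted-lists
  (map (lambda (entry) (list (car entry) (cdr entry)))
       sorted-entries))

sorted-lists
```

> Whether you sort the keys first or sort the pairs afterward is mostly
  a matter of taste; both give the entries in alphabetical order by key.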

_What is string->lines?_

> I have no idea what you are talking about.  Gremlins replaced
  `file->lines` with that name.  And I fixed it.  Remember the
  aphorism, "Computers are sentient and malicious."  "And so are
  CS faculty."

_Is there a nice way to convert the structs to something more readable?_

> Maybe

```drracket
(struct person (lname fname) #:transparent)

(define person->string
  (lambda (p)
    (string-append (person-fname p) " " (person-lname p))))
```

_Something something something proc string.  Did that make sense?_

> Um.

_Could you tell me more about `for-each`?_

> `(map proc lst)` applies `proc` to each element of `lst`, returning
  a new list.

> Sometimes, we have procedures whose only purpose is to have a
  side-effect, rather than to return something.

> If you use those with `map`, you get a list of voids.

> `for-each` gives us the side effect without building a useless list.

> Demo

```drracket
> (for-each (lambda (thing)
              (hash-set! hash (list-ref thing 0) (list-ref thing 1)))
            answerers)
> hash
'#hash(("Mira" . 12) ("Nick" . 1112123) ("Everett" . 55) ("Christa" . 2))
```
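
> To see the difference side by side, here is a small self-contained
  sketch.  The data in `answerers` and the names `scores`, `record!`,
  and `map-result` are invented for illustration.

```drracket
#lang racket

;; Hypothetical data; the names and numbers are only for illustration.
(define answerers '(("Mira" 12) ("Everett" 55)))
(define scores (make-hash))

;; record!: called only for its side effect on scores.
(define record!
  (lambda (entry)
    (hash-set! scores (list-ref entry 0) (list-ref entry 1))))

;; map still collects the results: one void per element.
(define map-result (map record! answerers))

;; for-each performs the same updates and builds no list.
(for-each record! answerers)

;; Both calls updated the table; only map left a list of voids behind.
(hash-ref scores "Mira")
(length map-result)
```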

_Wow, you didn't understand me at all.  Is there a way to turn the name
of a procedure, as a string, into the procedure itself?_

```
> (foo "square")
#<procedure:square>
```
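
> One way to get this effect with what we already know, as a sketch:
  keep a hash table that maps names to procedures and look the name up.
  This is not what `foo` above does (that was a placeholder); the names
  `named-procs` and `name->proc` are hypothetical.

```drracket
#lang racket

(define square
  (lambda (x)
    (* x x)))

;; A table from procedure names (strings) to the procedures themselves.
(define named-procs
  (hash "square" square
        "sqrt" sqrt
        "add1" add1))

;; name->proc: look up a procedure by name; gives #f for unknown names.
(define name->proc
  (lambda (name)
    (hash-ref named-procs name #f)))

;; Look up the procedure, then apply it.
((name->proc "square") 5)
```

> The table only knows about the procedures we put in it, which is
  usually what we want when the names come from a user or a file.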

_Can I automatically make procedures, just like `struct` does?_

> Not with what you've learned.

> Yes, if you're willing to go beyond the scope of the course.

_Is there a way to turn a list of lists into a single list?_

> We know that we can turn two lists into one list, with append.

```drracket
> (append (range 1 5) (range 2 6))
'(1 2 3 4 2 3 4 5)
```

> If we have a procedure that combines pairs of elements into a single
  element, we can use `reduce` with a list.

```drracket
> (define lol '((a) (b c) (d e f) (a)))
> (reduce append lol)
'(a b c d e f a)
```
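
> As far as I can tell, `reduce` as used above comes from the course
  library rather than from plain `#lang racket`.  If it is not
  available, here is a sketch of two standard alternatives.  The names
  `via-apply` and `via-foldr` are made up for this example.

```drracket
#lang racket

(define lol '((a) (b c) (d e f) (a)))

;; append accepts any number of lists, so apply can hand it all of them.
(define via-apply (apply append lol))

;; foldr appends each sublist onto the result for the rest of the list.
(define via-foldr (foldr append '() lol))

via-apply
via-foldr
```

> Both remove only one level of nesting, and both give the empty list
  when given an empty list of lists.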

_Can we use `car` and `cdr` and `caddar`?_

> Yes.

_Can we use `set!`?_

> No!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

XML, revisited
--------------

XML is a notation for marking up text: tags identify what kind of thing
each section of the text is (a heading, a paragraph, emphasized text, and
so on).  It is closely related to HTML.

```xml
<html>
<head>
</head>
<body>
<h1>Meanings of Um</h1>
<p>Um has many meanings, which is why students use it so much.</p>
<p>Here are some of those meanings.</p>
<ul>
<li>Um can mean <q>use <em>memory</em></q>.</li>
<li>Um can mean <q><em>You</em> may be confused.</q></li>
<li>Um can mean <q><em>You</em> missed the closing tag.</q></li>
</ul>
</body>
</html>
```

We may want to count or manipulate our HTML/XML documents.

* How many paragraphs are there?
* How many times does emphasized text appear?
* Replace all instances of "Sam" with "$^%%^(&(&*()*"
* ...

Some of these can be done with regular expressions.  Others are hard to do
with regular expressions (and may even be impossible).

* Suppose we want to count the number of times emphasized text appears
  (nested or not)?
* Suppose we want to count the number of times emphasized text appears
  within emphasized text.  `<em>Computers are <em>really</em> annoying.</em>`

How would we do each?

If we did not care about nesting, we'd write

```drracket
> (length (regexp-match* "<em[ >]" str))
```

For example,

```drracket
> (regexp-match* #px"<em>" text)
'("<em>" "<em>" "<em>" "<em>" "<em>")
> (define str "<em class='loud'>Hello</em> there <em>someone</em>")
> (regexp-match* #px"<em>" str)
'("<em>")
> (regexp-match* #px"<em[ >]" str)
'("<em " "<em>")
> (regexp-match* #px"<em[ >]" "<emphatic>")
'()
> (regexp-match* #px"<em." "<emphatic>")
'("<emp")
; DL says something clever, Sam can't copy and paste.
```

The nested case is hard because regular expressions give us no way to
express "not this longer string."

We need an extension to regular expressions that works better for hierarchical
documents.

The W3C defines a standard, XPath, for "regular expressions over XML documents."
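
To see why a hierarchical view makes the nested question easier, here is
a rough sketch.  It is not XPath, just a hand-written recursive walk.  It
assumes the document has been written by hand as a nested Racket list of
the form `(tag child ...)`, which may not match the representation we
use in class.  The names `doc` and `count-nested-em` are hypothetical.

```drracket
#lang racket

;; A tiny document as a nested list: (tag child ...), with strings for text.
(define doc
  '(p (em "Computers are " (em "really") " annoying.")
      " "
      (em "Um.")))

;; count-nested-em: count the em elements that appear inside another em.
;; inside-em? records whether some enclosing element is an em.
(define count-nested-em
  (lambda (node inside-em?)
    (cond
      ;; Text (or anything that is not an element) contributes nothing.
      [(not (pair? node)) 0]
      [else
       (let ([is-em? (eq? (car node) 'em)])
         (+ (if (and is-em? inside-em?) 1 0)
            (apply +
                   (map (lambda (child)
                          (count-nested-em child (or is-em? inside-em?)))
                        (cdr node)))))])))

(count-nested-em doc #f)
```

The walk carries along the one fact a regular expression cannot track,
namely whether we are currently inside an `em` element.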

Representing XML in Racket
--------------------------

Expressing patterns in XML
--------------------------

Constructing new documents from old
-----------------------------------

Lab
---
