---
title: Eboard 06  Pattern matching with regular expressions
number: 6
section: eboards
held: 2019-02-06
link: true
---
CSC 151 2019S, Class 06:  Pattern matching with regular expressions
===================================================================

_Overview_

* Preliminaries
    * Notes and news
    * Upcoming work
    * Extra credit
    * Questions
* Lab!
* Time to meet with HW partners.
* Debrief (if time)

Preliminaries
-------------

### News / Etc.

* Don't forget that we have mentor sessions tonight 8-9pm and Thursday
  8-9 in the CS commons.
* Mentors should not be asked for help outside of class time and
  mentor sessions.  Feel free to use evening tutors and to send me email.
* As we'll discuss on Friday, we'd really like you to work
  together on problems.  (If you'd like to think about problems in 
  advance, that's okay, but there's good benefit to working together.)
* Even after I asked you not to pull the keyboard away from your
  partner, some of you did it anyway.
* The [current flash cards page](../flashcards/flashcards03) now contains
  one set of questions I've received.  (If all goes well, it will soon
  contain all of them.)  (At some point, I'll create a better UI.)
* I'll try to remember to set the "switch person at keyboard" alarm.
* Do any statisticians want to compute the odds that people will get
  the same partner a second time?  (Assume we have 34 students and
  this is the fifth time we've had random partners.)  EC for first two
  or three answers.

### Upcoming work

* Readings due before class Friday
    * Pair programming (ready)
    * How Scheme works (not ready)
* [Assignment 4](../assignments/assignment04) due Tuesday night.
    * Partners to be assigned via email.
* [Flash cards](../flashcards/flashcards03) due TONIGHT at 8:00 p.m.
* No lab writeups.
* Quiz Friday
    * Basic types
    * Writing procedures

### Extra Credit

#### Extra credit (Academic/Artistic)

* John Garrison reads from _Shakespeare and the Afterlife_, Thursday,
  7 February 2019 in the Faulconer Gallery.
* _Once Upon a Time Wolf_ (tickets required), Bucksbaum.
  Friday, 8 February, 7pm.
* _Once Upon a Time Wolf_ (tickets required), Bucksbaum.
  Saturday, 9 February, 7pm
* Any Data Week activity next week.
* HackGC weekend of 15-17 February 2019.  (I'm still looking for links.)
* Friday night Gardner concert, 8:30 p.m.

#### Extra credit (Peer)

* Home track meet, Saturday, 9 Feb 2019, all-day and beyond.  (30 min suffices)
* Conference Swim and Dive meet, 15-17 February 2019.  Dive times to
  be announced later.

#### Extra credit (Wellness)

* HIIT training, 4:30 pm, Tuesday, Dance Studio, Bear.  (Cap of two EC units.)
* Hatha Yoga, 7:00 pm, Tuesday, Dance Studio, Bear.  (Cap of two EC units.)
* Brazilian Jiu-Jitsu, Wednesday and Friday, 6:30, Dance Studio.  (Cap of two
  EC units.)
* Any Sex Week activity next week.

#### Extra credit (Misc)

### Other good things

### Questions

How do we get the loudhum libraries on our personal computers?

> Start DrRacket

> **File** > **Install Package** 

> Enter "https://github.com/grinnell-cs/loudhum.git" (without the quotation
  marks).

> Click **Install** or **Update** (whichever appears).

> When the **Close** button becomes available, click it.

Could you explain `#px"a[a-z]*a"` and how it differs from `#px"a[a-z]+a"`?

> Sure.  (or at least I can try)

> Three parts: `a`, `[a-z]*`, `a`

> The `a`'s match the letter a (and nothing else).  So we are looking
  for strings that start with a and end with a.

> We'll pull apart the `[a-z]*`.  `[a-z]` is shorthand for a or b or c
  or d or e or ... or z.  "Lowercase letters".

> An expression followed by a star means "zero or more copies", so
  `[a-z]*` means "0 or more lowercase letters".

> `#px"a[a-z]*a"` means "sequences of characters that start with a,
  end with a, and have only lowercase letters in between."

        > (regexp-match* #px"a[a-z]*a" "alphabet aardvark aardwolf samr")
        '("alpha" "aardva" "aa")

> An expression followed by a plus means "one or more copies", so
  `[a-z]+` means "1 or more lowercase letters".

> `#px"a[a-z]+a"` means "sequences of characters that start with a,
  end with a, and have at least one lowercase letter in between."
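
> To see the difference side by side, here is a small sketch that runs
  both patterns on the same string as above.  The names `sample`,
  `star-matches`, and `plus-matches` are made up for this example.

```racket
#lang racket

;; The same test string used in the example above.
(define sample "alphabet aardvark aardwolf samr")

;; * allows zero letters between the two a's, so "aa" counts as a match.
(define star-matches (regexp-match* #px"a[a-z]*a" sample))

;; + requires at least one letter between the two a's, so "aa" does not.
(define plus-matches (regexp-match* #px"a[a-z]+a" sample))

star-matches ; '("alpha" "aardva" "aa")
plus-matches ; '("alpha" "aardva")
```

> The only change is that "aardwolf" no longer contributes "aa", because
  nothing sits between its two a's.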

Why did we get "aardva" from "aardvark", rather than "aa"?

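> In general, Racket starts at the leftmost place a match can begin and
  lets each repeated piece grab as much as it can.  Here `[a-z]*` takes
  as many letters as possible while still leaving a final a to match.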
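
> If you ever want the shorter match, one option (not something we have
  covered in class, so treat this as a sketch) is to put a `?` after the
  star, which asks for as few copies as possible.  The names `greedy`
  and `lazy` are just labels for this example.

```racket
#lang racket

;; Greedy: [a-z]* takes as many letters as it can before the final a.
(define greedy (regexp-match* #px"a[a-z]*a" "aardvark"))

;; Non-greedy: [a-z]*? takes as few letters as it can before the final a.
(define lazy (regexp-match* #px"a[a-z]*?a" "aardvark"))

greedy ; '("aardva")
lazy   ; '("aa")
```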

Do we need the `#px`?

> Often, but not always.  It's safer to include it.
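
> Here is one case where it matters, as far as I understand the two
  syntaxes: shorthands such as `\\d` (digit) only have their special
  meaning in `#px` patterns.  In a `#rx` pattern, `\\d` is just the
  letter d.  The string `s` is made up for this example.

```racket
#lang racket

;; A made-up string with some digits and some d's.
(define s "a1 b22 add")

;; With #px, \\d means "any digit".
(define px-digits (regexp-match* #px"\\d+" s))

;; With #rx, \\d is read as a literal d.
(define rx-digits (regexp-match* #rx"\\d+" s))

px-digits ; '("1" "22")
rx-digits ; '("dd")
```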

Could you explain the `"\\1\\2"` replacement?

> Once again, I can try.  Apologies for limited creativity.

> We can parenthesize parts of an expression, sometimes for clarity and
  sometimes to deal with precedence issues.  `#px"ab+"` means "a
  followed by at least one b."  `#px"(ab)+"` means "one or more
  copies of ab in a row, such as ababab."

        > (regexp-match* #px"ab+" "abbbba ababab")
        '("abbbb" "ab" "ab" "ab")
        > (regexp-match* #px"(ab)+" "abbbba ababab")
        '("ab" "ababab")

> We may want to refer to things from the pattern when we do a replacement.
  For example, I may want to replace "X and Y" with "Y and X".

> The pattern is `#px"(\\S+) and (\\S+)"`

        > (regexp-match* #px"(\\S+) and (\\S+)" "pb and j, rock and roll, foo and bar")
        '("pb and j," "rock and roll," "foo and bar")

> The replacement is `"\\2 and \\1"`

        > (regexp-replace* #px"(\\S+) and (\\S+)" 
                           "pb and j, rock and roll, foo and bar" 
                           "\\2 and \\1")
        "j, and pb roll, and rock bar and foo"

Did you really want the comma?

> No.  Sometimes I write bad regular expressions.
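
> One possible fix, assuming each item is made of letters, digits, or
  underscores: use `\\w` ("word character") in place of `\\S`
  ("anything but whitespace"), so the commas stay out of the groups.
  The name `swapped` is made up for this example.

```racket
#lang racket

;; \\w+ stops at the comma, so the comma is left where it was.
(define swapped
  (regexp-replace* #px"(\\w+) and (\\w+)"
                   "pb and j, rock and roll, foo and bar"
                   "\\2 and \\1"))

swapped ; "j and pb, roll and rock, bar and foo"
```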

What if I want aa or ee or ii or oo or uu?

> `#px"(aa|ee|ii|oo|uu)"`
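
> A quick check of that pattern on a made-up string.  The second
  pattern is an alternative sketch that uses a group and `\\1` in the
  pattern itself to mean "the same vowel again"; the names `words`,
  `by-alternation`, and `by-backreference` are invented for this example.

```racket
#lang racket

;; A made-up string with one word for each doubled vowel.
(define words "aardvark book feed skiing vacuum")

;; Spell out each doubled vowel with | ("or").
(define by-alternation
  (regexp-match* #px"(aa|ee|ii|oo|uu)" words))

;; Match any one vowel, then require the same vowel again with \\1.
(define by-backreference
  (regexp-match* #px"([aeiou])\\1" words))

by-alternation   ; '("aa" "oo" "ee" "ii" "uu")
by-backreference ; '("aa" "oo" "ee" "ii" "uu")
```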

Lab
---

How do I get the first twenty characters?

> `(take book-letters 20)`

How do I figure out how many letters?

> `(length book-letters)` or `(string-length book-contents)`

How do I get rid of letters or words or lines from a list?

> `(drop lst num)` gives you the list without its first `num` elements.
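
> Here is how the three answers above fit together.  The definitions of
  `book-contents` and `book-letters` are hypothetical stand-ins (a short
  string and its list of characters), not the values from the lab.

```racket
#lang racket

;; Hypothetical stand-ins for the lab's values.
(define book-contents "Call me Ishmael.")
(define book-letters (string->list book-contents))

;; The first four characters, turned back into a string for readability.
(define first-four (list->string (take book-letters 4)))

;; Two ways to count: elements in the list, or characters in the string.
(define letter-count (length book-letters))
(define char-count (string-length book-contents))

;; Everything except the first eight characters.
(define rest-of-book (list->string (drop book-letters 8)))

first-four   ; "Call"
letter-count ; 16
char-count   ; 16
rest-of-book ; "Ishmael."
```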

Meet with HW partners
---------------------
