CSC 151 2019S, Class 06: Pattern matching with regular expressions

Overview

  • Preliminaries
    • Notes and news
    • Upcoming work
    • Extra credit
    • Questions
  • Lab!
  • Time to meet with HW partners.
  • Debrief (if time)

Preliminaries

News / Etc.

  • Don’t forget that we have mentor sessions tonight and Thursday, 8-9 p.m., in the CS commons.
  • Mentors should not be asked for help outside of class time and mentor sessions. Feel free to use evening tutors and to send me email.
  • As we’ll discuss on Friday, we’d really like you to work together on problems. (If you’d like to think about problems in advance, that’s okay, but there’s good benefit to working together.)
  • Even after I asked you not to pull the keyboard away from your partner, some of you did.
  • The current flash cards page now contains one set of questions I’ve received. (If all goes well, it will soon contain all of them.) (At some point, I’ll create a better UI.)
  • I’ll try to remember to set the “switch person at keyboard” alarm.
  • Do any statisticians want to compute the odds that people will get the same partner a second time? (Assume we have 34 students and this is the fifth time we’ve had random partners.) EC for first two or three answers.

Upcoming work

  • Readings due before class Friday
    • Pair programming (ready)
    • How Scheme works (not ready)
  • Assignment 4 due Tuesday night.
    • Partners to be assigned via email.
  • Flash cards due TONIGHT at 8:00 p.m.
  • No lab writeups.
  • Quiz Friday
    • Basic types
    • Writing procedures

Extra Credit

Extra credit (Academic/Artistic)

  • John Garrison reads from Shakespeare and the Afterlife, Thursday, 7 February 2019, in the Faulconer Gallery.
  • Once Upon a Time Wolf (tickets required), Bucksbaum. Friday, 8 February, 7pm.
  • Once Upon a Time Wolf (tickets required), Bucksbaum. Saturday, 9 February, 7pm
  • Any Data Week activity next week.
  • HackGC weekend of 15-17 February 2019. (I’m still looking for links.)
  • Friday night Gardner concert, 8:30 p.m.

Extra credit (Peer)

  • Home track meet, Saturday, 9 Feb 2019, all-day and beyond. (30 min suffices)
  • Conference Swim and Dive meet, 15-17 February 2019. Dive times to be announced later.

Extra credit (Wellness)

  • HIIT training, 4:30 pm, Tuesday, Dance Studio, Bear. (Cap of two EC units.)
  • Hatha Yoga, 7:00 pm, Tuesday, Dance Studio, Bear. (Cap of two EC units.)
  • Brazilian Jiu-Jitsu, Wednesday and Friday, 6:30, Dance Studio. (Cap of two EC units.)
  • Any Sex Week activity next week.

Extra credit (Misc)

Other good things

Questions

How do we get the loudhum libraries on our personal computers?

Start DrRacket

File > Install Package

Enter “https://github.com/grinnell-cs/loudhum.git” (without the quotation marks).

Click Install or Update (whichever appears).

When the Close button becomes available, click it.

Could you explain #px"a[a-z]*a" and how it differs from #px"a[a-z]+a"?

Sure. (Or at least I can try.)

Three parts: a, [a-z]*, a

The a’s match the letter a (and nothing else). So we are looking for strings that start with a and end with a.

We’ll pull apart the [a-z]*. [a-z] is shorthand for a or b or c or d or e or … or z. “Lowercase letters”.

An expression followed by a star means “zero or more copies.” So [a-z]* means “0 or more lowercase letters.”

#px"a[a-z]*a" means “sequences of characters that start with a, end with a, and have only lowercase letters in between.”

    > (regexp-match* #px"a[a-z]*a" "alphabet aardvark aardwolf samr")
    '("alpha" "aardva" "aa")

An expression followed by a plus means “one or more copies.” So [a-z]+ means “1 or more lowercase letters.”

#px"a[a-z]+a" means “sequences of characters that start with a, end with a, and have at least one lowercase letter in between.”
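To see the difference concretely, here is a small sketch that runs both patterns on the sample text from the star example. The names sample, star-matches, and plus-matches are made up for illustration.

```racket
#lang racket
;; Sample text from the * example; the variable names are illustrative only.
(define sample "alphabet aardvark aardwolf samr")

;; * allows zero letters between the two a's, so "aa" counts as a match.
(define star-matches (regexp-match* #px"a[a-z]*a" sample))

;; + requires at least one letter between the a's, so the "aa" in
;; "aardwolf" no longer matches.
(define plus-matches (regexp-match* #px"a[a-z]+a" sample))

star-matches ; '("alpha" "aardva" "aa")
plus-matches ; '("alpha" "aardva")
```

The only change in the results is that “aa” drops out, because nothing sits between its two a’s.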

Why did we get “aardva” from “aardvark”, rather than “aa”?

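By default, * and + are “greedy”: starting from the leftmost place the pattern can match, Racket takes as many characters as it can while still letting the rest of the pattern match. In “aardvark”, that means stretching to the last available a.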
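As an aside that goes a bit beyond the question: Racket’s regular expressions also accept *? (and +?), which match as few characters as possible. A minimal sketch, with made-up names, that contrasts the two on “aardvark”:

```racket
#lang racket
;; Default behavior: [a-z]* grabs as many letters as it can before the final a.
(define longest (regexp-match* #px"a[a-z]*a" "aardvark"))

;; With *?, [a-z]*? grabs as few letters as it can.
(define shortest (regexp-match* #px"a[a-z]*?a" "aardvark"))

longest  ; '("aardva")
shortest ; '("aa")
```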

Do we need the #px?

Often, but not always. It’s safer to include it.
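One way to see when the #px matters (a sketch, not the full story): a plain string works as a pattern when it uses only basic syntax such as [a-z] and *, but Racket reads a plain string as an #rx-style pattern, where classes such as \\S are not available. The names below are illustrative.

```racket
#lang racket
;; Basic syntax: a plain string pattern behaves the same as the #px version.
(define plain (regexp-match* "a[a-z]*a" "alphabet"))
(define with-px (regexp-match* #px"a[a-z]*a" "alphabet"))

;; \\S ("not whitespace") is #px syntax.
(define words (regexp-match* #px"\\S+" "pb and j"))

;; Without #px, \\S is not read as "not whitespace", so the result differs.
(define no-px (regexp-match* "\\S+" "pb and j"))

plain   ; '("alpha")
with-px ; '("alpha")
words   ; '("pb" "and" "j")
```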

Could you explain the "\\1\\2" replacement?

Once again, I can try. Apologies for limited creativity.

We can parenthesize parts of an expression. Sometimes for clarity. Sometimes to deal with precedence issues. #px"ab+" means “a followed by at least one b.” #px"(ab)+" means “one or more copies of ab in a row,” such as abababab.

    > (regexp-match* #px"ab+" "abbbba ababab")
    '("abbbb" "ab" "ab" "ab")
    > (regexp-match* #px"(ab)+" "abbbba ababab")
    '("ab" "ababab")

We may want to refer to things from the pattern when we do a replacement. For example, I may want to replace “X and Y” with “Y and X”.

The pattern is #px"(\\S+) and (\\S+)".

    > (regexp-match* #px"(\\S+) and (\\S+)" "pb and j, rock and roll, foo and bar")
    '("pb and j," "rock and roll," "foo and bar")

The replacement is "\\2 and \\1". Here \\1 stands for whatever the first parenthesized part matched and \\2 for the second.

    > (regexp-replace* #px"(\\S+) and (\\S+)" 
                       "pb and j, rock and roll, foo and bar" 
                       "\\2 and \\1")
    "j, and pb roll, and rock bar and foo"

Did you really want the comma?

No. Sometimes I write bad regular expressions.
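One possible repair, offered as a sketch rather than the only fix: \\S+ matches any run of non-whitespace characters, commas included. Using \\w+ (letters, digits, and underscores) keeps the punctuation out of the groups. The name swapped is just for illustration.

```racket
#lang racket
;; \\w+ stops at the comma, so the comma stays where it was.
(define swapped
  (regexp-replace* #px"(\\w+) and (\\w+)"
                   "pb and j, rock and roll, foo and bar"
                   "\\2 and \\1"))

swapped ; "j and pb, roll and rock, bar and foo"
```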

What if I want aa or ee or ii or oo or uu?

#px"(aa|ee|ii|oo|uu)"
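A short sketch of that pattern in use; the sample text and names are invented. The second pattern is an alternative (assuming we want any doubled vowel): it captures one vowel and uses \\1 inside the pattern to require the same vowel again.

```racket
#lang racket
;; Invented sample text with one doubled vowel per word.
(define sample "aardvark book vacuum skiing")

;; Alternation: list each doubled vowel explicitly.
(define by-alternation (regexp-match* #px"(aa|ee|ii|oo|uu)" sample))

;; Backreference: \\1 must match whatever the first group matched.
(define by-backreference (regexp-match* #px"([aeiou])\\1" sample))

by-alternation   ; '("aa" "oo" "uu" "ii")
by-backreference ; '("aa" "oo" "uu" "ii")
```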

Lab

How do I get the first twenty characters?

(take book-letters 20)

How do I figure out how many letters?

(length book-letters) or (string-length book-contents)

How do I get rid of letters or words or lines from the front of a list?

(drop lst num) gives back lst without its first num elements.
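These list procedures can be tried on a small list first. In the sketch below, letters is a made-up stand-in for book-letters (assumed to be a list of characters), and word is a made-up stand-in for book-contents.

```racket
#lang racket
;; Tiny stand-ins for the lab's values; both names are hypothetical.
(define letters (string->list "regexp"))
(define word "regexp")

(take letters 2)      ; the first two elements: '(#\r #\e)
(drop letters 2)      ; everything after the first two: '(#\g #\e #\x #\p)
(length letters)      ; number of elements in the list: 6
(string-length word)  ; number of characters in the string: 6
```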

Meet with HW partners