Compilers (CS362 2004S)

Regular Expressions

Summary: In today's lab, you will explore regular expressions in the Unix environment.

Collaboration: Feel free to work on this lab in pairs or trios.

Turning It In: Email me your answers.

Grading: I expect that you will gain more from doing this lab than from me grading this lab. When I have a chance, I will simply scan through your answers to see if you had any particularly valuable insights.

Background: In this lab, you'll use regular expressions to look for English words that match particular patterns. You can find a list of English words in /usr/share/dict/words.

The standard command for searching using regular expressions is grep (and no, it is not "GNU rep"). The name comes from the ed editor command g/re/p, which globally searches for a regular expression and prints the matching lines. You traditionally use grep in a form like the following:

% grep 'regexp' file

It is important to put the regular expression in single quotes so that the shell does not interpret special characters, like braces, parens, and stars.
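To see why the quotes matter, here is a minimal sketch that counts how many arguments the shell produces for a* with and without single quotes. The scratch directory and the file names ab and ac are made up for the illustration, and mktemp -d is assumed to be available (it is on most Linux and macOS systems).

```shell
# Work in a scratch directory so the glob has something to match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch ab ac

# Unquoted: the shell expands a* to the matching file names (ab and ac)
# before any command sees it.
set -- a*
unquoted_count=$#

# Single-quoted: the pattern is passed along as the literal string a*.
set -- 'a*'
quoted_count=$#
quoted_value=$1

echo "unquoted: $unquoted_count argument(s); quoted: $quoted_count argument(s) ($quoted_value)"
```

Without the quotes, grep would receive ab as its regular expression and ac as a file name, which is almost never what you meant.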

Here are the basic mechanisms for building a grep-style regular expression. You should be able to find more in the man page.
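As a rough sketch of the most common constructs (anchors, the dot, bracket expressions, and the star), the following runs grep against a small made-up word list rather than the real dictionary. The word list and the variable names are assumptions for illustration only; the man page remains the authority on what your grep supports.

```shell
# Hypothetical sample word list, one word per line.
words=$(mktemp)
printf '%s\n' cat cot coat ct dog Cat cart > "$words"

starts_with_c=$(grep '^c' "$words")        # ^ anchors the match at the start of the line
ends_with_t=$(grep 't$' "$words")          # $ anchors the match at the end of the line
any_middle=$(grep 'c.t' "$words")          # . matches any single character
a_or_o=$(grep '^c[ao]t$' "$words")         # [ao] matches one character from the set
not_c=$(grep '^[^c]' "$words")             # [^c] matches one character not in the set
zero_or_more_o=$(grep '^co*t$' "$words")   # * repeats the preceding item zero or more times

echo "$zero_or_more_o"
rm -f "$words"
```

Note that the matching is case sensitive: Cat does not match '^c', and it is one of the two words that match '^[^c]'.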

What can you do with the output from grep? You can put the results in a new file.

% grep 'regexp' file > results-file

You can look at the results by piping them through less or more.

% grep 'regexp' file | less

You can simply count the results by piping them through wc, the word-count program.

% grep 'regexp' file | wc -l
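As a small sketch of counting in practice, the following compares the wc -l pipeline with grep's own -c option, which prints the number of matching lines directly. The five-word sample file is made up so that the result is easy to check by hand.

```shell
# Hypothetical sample word list, one word per line.
words=$(mktemp)
printf '%s\n' apple banana avocado cherry apricot > "$words"

# Count by piping the matching lines through wc -l ...
piped_count=$(grep '^a' "$words" | wc -l)

# ... or ask grep to count the matching lines itself with -c.
direct_count=$(grep -c '^a' "$words")

echo "wc -l: $piped_count, grep -c: $direct_count"
rm -f "$words"
```

Both approaches report three matches here (apple, avocado, apricot). Some versions of wc pad the number with leading spaces, so grep -c can be slightly more convenient in scripts.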

You can even send the results through another invocation of grep, which is a nice way to get the values accepted by both regular expressions (that is, the intersection of two languages).

% grep 'regexp1' file | grep 'regexp2' | ...
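Here is a minimal sketch of that intersection on a made-up word list: one grep keeps the words that begin with s, and the second keeps those that also end with t. For comparison, it also runs a single regular expression that happens to describe the same language; the sample words and variable names are assumptions for illustration.

```shell
# Hypothetical sample word list, one word per line.
words=$(mktemp)
printf '%s\n' stars start trust sat toast state > "$words"

# Language 1: words that begin with s. Language 2: words that end with t.
# Piping one grep into the other keeps only the words in both languages.
both=$(grep '^s' "$words" | grep 't$')

# In this case the intersection can also be written as one expression.
single=$(grep '^s.*t$' "$words")

echo "$both"
rm -f "$words"
```

The pipeline is often the easier choice when the two conditions are awkward to combine into one expression.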

Questions: Write a regular expression for each of the following and determine how many words in the Unix dictionary match the regular expression.

  1. Words starting with a
  2. Words starting with A
  3. Words that start with a or A
  4. Words with exactly four letters
  5. Words with exactly four letters that begin with a
  6. Words that contain a capital letter
  7. Words that start with a non-capital letter and include a capital letter
  8. Words with more than one capital
  9. Words that neither begin nor end with a
  10. Words that begin and end with a vowel
  11. Words that neither begin nor end with a vowel
  12. Words that match your phone number, using the normal number-letter conversion: 2 is abc, 3 is def, 4 is ghi, and so on (if your phone number includes a 0 or a 1, try the digits on either side of the 0 or 1)
  13. Words that contain the vowels in order (and, perhaps, intervening letters)
  14. Words that contain your initials in order (and, perhaps, intervening letters)
  15. Words that begin with sc or xy
  16. Words with four or more letters
  17. Words that contain only the letters of your first name
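If you find yourself typing the same counting pipeline for every question, a small shell function can help. The sketch below defines count_matches, a hypothetical helper (it is not part of grep) that counts the lines matching a regular expression and defaults to the dictionary named in the Background section. It is demonstrated on a made-up sample list with a pattern that is not one of the questions above.

```shell
# count_matches is a hypothetical helper, not part of grep.
# Usage: count_matches 'regexp' [file]
# The file defaults to the dictionary named in the Background section.
count_matches() {
    grep -c -e "$1" "${2:-/usr/share/dict/words}"
}

# Small sample list so the example does not depend on the local dictionary.
sample=$(mktemp)
printf '%s\n' ring sing singer string bring brings > "$sample"

# Words ending in ing: ring, sing, string, bring.
ing_count=$(count_matches 'ing$' "$sample")
echo "words ending in ing: $ing_count"
rm -f "$sample"
```

Calling count_matches with only a regular expression runs it against /usr/share/dict/words, assuming that file exists on your system.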

History

Thursday, 1 February 2001 [Samuel A. Rebelsky]

Monday, 9 September 2002 [Samuel A. Rebelsky]

Tuesday, 10 September 2002 [Samuel A. Rebelsky]

Wednesday, 4 February 2004 [Samuel A. Rebelsky]

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed May 5 11:46:48 2004.
The source to the document was last modified on Wed Feb 4 20:14:44 2004.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2004S/Labs/lab.03.html.

Samuel A. Rebelsky, rebelsky@grinnell.edu