CS362 2011F Compilers


Summary: In today's lab, you will explore the standard Unix lexical analyzer generator, lex, and its GNU implementation, flex.

Collaboration: Feel free to work on this lab in pairs or trios.

Turning It In: Don't bother turning this lab in. Do, however, ask me questions when you find that you are confused.

Background: lex (or flex, in its GNU implementation) is one of the standard lexical analyzer generators. You can use lex to build lexers which interact with parsers. You can also use lex to build programs that react in other ways to regular expressions they encounter. We will focus on the second use today, and write lex programs that print output that depends on the input they receive.

You can get more information on flex from the man page of the info page. Please don't print these out; you should get over sixty pages of output.

Structure of a Lex File:

pattern1	c-code-to-execute-upon-matching-pattern
pattern2	c-code-to-execute-upon-matching-pattern
patternn	c-code-to-execute-upon-matching-pattern

Sample Lex File:

" "		{ }
\n              { printf ("Howdy!\n"); }
begin           { printf ("Keyword-Begin\n"); }
[a-z][a-z0-9_]* { printf ("Look, an identifier: '%s'\n", yytext); }
int yywrap () { return 1; }

The yywrap function gets called by the lexer whenever it hits the end of file. If yywrap returns a nonzero value, the lexer stops.

Using Lex Files:

Suppose we saved this as example.l (yes, .l is a normal suffix for lex files; .yy is another one). To build an executable from it, you would do the following:

$ flex example.l
	# Generates file lex.yy.c
$ gcc -lfl lex.yy.c
	# Generates file a.out
$ a.out < file

Alternately, you could use make to convert example.l into example.c

$ make example.c
flex  -t example.l > example.c


1. Figure out what the example file does with things not described in the regular expressions. (You should be able to identify some natural things not described in the regular expressions.)

2.a. Determine whether flex follows the longest token rule (e.g., whether, in the example above, given input of begine it matches the keyword begin or the identifier begine).

2.b. If flex does follow the longest token rule, determine what flex does if it has to backtrack. For example, given keywords of begin and beginning (and no identifiers), what does it do if it sees beginn. (You'll need to write your own flex proram to test this issue.)

3. Determine if f?lex generates case-sensitive or case-insensitive lexers. Then, find out how to make it generate the other kind of lexer. (I.e., if it generates case-sensitive lexers, find out how to make it generate case-insensitive lexers.)

4. Write flex programs to separate input in each of the following ways. (I'd prefer that you print out things with their type.)

The pattern/action rules in your program will generally look like

pattern	{ printf("DESCRIPTIVE-WORD[%s]\n", yytext); }

For example, the output of the first program might be something like the following:


5. Look at one of the copies of lex.yy.c and see if you can figure out where and how the finite automaton is coded. You might want to try variants of the lex program to see what difference it makes.

6. While f?lex can be used for a variety of purposes, in this course, we will use it to build our tokenizer. To tokenize, we'll want the C code for each pattern to return a value that corresponds to the token.

Sketch how you might rewrite your keywords + identifiers + other tokenizer so that it returns the next token each time it is called.

You may find the following main routine helpful.

main (int argc, char *argv[])
  int type;
  // yylex return 0 upon encountering EOF.
  while (type = yylex ())
    print_token (type);
  return 0;
} // main



Monday, 16 September 2002 [Samuel A. Rebelsky]

  • Created, based on Andrew Kensler's notes from the last time I gave this lab. (That version of the lab was not written up.)

Tuesday, 17 September 2002 [Samuel A. Rebelsky]

  • Added a few more notes.

Wednesday, 11 February 2004 [Samuel A. Rebelsky]

  • Added problem 2 (maximal matching).
  • Updated instructions for problem 3. Added the subproblem with five parts.
  • Updated the background slightly.

Thursday, 12 February 2004 [Samuel A. Rebelsky]

Monday, 19 September 2011 [Samuel A. Rebelsky]


Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Sat Nov 12 22:53:44 2011.
The source to the document was last modified on Mon Sep 19 09:31:55 2011.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2011F/Labs/lex.html.

Samuel A. Rebelsky, rebelsky@grinnell.edu