Compilers (CS362 2004S)

Using Lexical-Analyzer Generators

Summary: In today's lab, you will explore the standard Unix lexical analyzer generator, lex.

Collaboration: Feel free to work on this lab in pairs or trios.

Turning It In: Don't bother turning this lab in. Do, however, ask me questions when you find that you are confused.

Background: lex (or flex, in its GNU implementation) is one of the standard lexical analyzer generators. You can use lex to build lexers which interact with parsers. You can also use lex to build programs that react in other ways to regular expressions they encounter. We will focus on the second use today, and write lex programs that print output that depends on the input they receive.

You can get more information on flex from the man page. Please don't print the man page; it's over sixty pages long.

Structure of a Lex File:

c-code-prefixed-by-tabs-and-named-regular-expressions
%%
pattern1	c-code-to-execute-upon-matching-pattern
pattern2	c-code-to-execute-upon-matching-pattern
...
patternn	c-code-to-execute-upon-matching-pattern
%%
more-c-code

Sample Lex File:

%%
" "		{ }
\n              { printf("Howdy!\n"); }
begin           { printf("Keyword-Begin\n"); }
[a-z][a-z0-9_]* { printf("Look, an identifier: '%s'\n",yytext); }
%%
int yywrap() { return 1; }

The yywrap function gets called by the lexer whenever it hits the end of file. If yywrap returns a nonzero value, the lexer stops.

Using Lex Files:

Suppose we saved this as example.yy (yes, .yy is the normal suffix for lex files). To build an executable from it, you would do the following:

% flex example.yy
	# Generates file lex.yy.c
% gcc -lfl lex.yy.c
	# Generates file a.out
% a.out < file

Problems:

1. Figure out what the example file does with things not described in the regular expressions.

2.a. Determine whether flex follows the longest token rule (e.g., whether, in the example above, given input of begine it matches the keyword begin or the identifier begine).

2.b. If flex does follow the longest token rule, determine what flex does if it has to backtrack. For example, given keywords of begin and beginning (and no identifiers), what does it do if it sees beginn. (You'll need to write your own flex proram to test this issue.)

3. Write separate flex programs to separate input in each of the following ways. (I'd prefer that you print out things with their type.) The pattern/action rules in your program will generally look like

pattern	{ printf("DESCRIPTIVE-WORD[%s]\n", yytext); }

For example, the output of the first program might be something like the following:

WHITESPACE[     ]
NORMAL[11February2004]
WHITESPACE[
]
NORMAL[Dear]
WHITESPACE[ ]
NORMAL[Fred,]

4. Look at one of the copies of lex.yy.c and see if you can figure out where and how the finite automaton is coded. You might want to try variants of the lex program to see what difference it makes.

 

History

Monday, 16 September 2002 [Samuel A. Rebelsky]

Tuesday, 17 September 2002 [Samuel A. Rebelsky]

Wednesday, 11 February 2004 [Samuel A. Rebelsky]

Thursday, 12 February 2004 [Samuel A. Rebelsky]

 

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed May 5 11:46:48 2004.
The source to the document was last modified on Thu Feb 12 09:44:10 2004.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2004S/Labs/lab.04.html.

You may wish to validate this document's HTML ; Valid CSS! ; Check with Bobby

Samuel A. Rebelsky, rebelsky@grinnell.edu