[Skip to Body]
Primary:
[Front Door]
[Schedule]
[Piazza]
-
[Academic Honesty]
[Instructions]
Current:
[Current Outline]
[Current EBoard]
-
[Current Assignment]
[Current Lab]
Groupings:
[Assignments]
[Documents]
[EBoards]
[Examples]
[Exams]
[Handouts]
[Labs]
[Outlines]
[Readings]
Related Courses:
[CSC362 2004S (Rebelsky)]
[CSC362 2010S (Stone)]
Misc:
[SamR]
[GNU Coding Standards]
[Dragon Book]
[Pascal Standards]
Summary:
In today's lab, you will explore the standard Unix lexical analyzer generator,
lex
, and its GNU implementation, flex
.
Collaboration: Feel free to work on this lab in pairs or trios.
Turning It In: Don't bother turning this lab in. Do, however, ask me questions when you find that you are confused.
Background:
lex
(or flex
, in its GNU implementation) is one
of the standard lexical analyzer generators. You can use lex
to build lexers which interact with parsers. You can also use
lex
to build programs that react in other ways to regular
expressions they encounter. We will focus on the second use today,
and write lex programs that print output that depends on the input
they receive.
You can get more information on flex
from the man page of
the info page. Please don't print these out; you should get over sixty
pages of output.
Structure of a Lex File:
c-code-prefixed-by-tabs-and-named-regular-expressions %% pattern1 c-code-to-execute-upon-matching-pattern pattern2 c-code-to-execute-upon-matching-pattern ... patternn c-code-to-execute-upon-matching-pattern %% more-c-code
Sample Lex File:
%% " " { } \n { printf ("Howdy!\n"); } begin { printf ("Keyword-Begin\n"); } [a-z][a-z0-9_]* { printf ("Look, an identifier: '%s'\n", yytext); } %% int yywrap () { return 1; }
The yywrap
function gets called by the lexer whenever it
hits the end of file. If yywrap
returns a nonzero value,
the lexer stops.
Using Lex Files:
Suppose we saved this as example.l
(yes, .l
is a normal suffix for lex
files; .yy
is another
one). To build an
executable from it, you would do the following:
$ flex example.l # Generates file lex.yy.c $ gcc -lfl lex.yy.c # Generates file a.out $ a.out < file
Alternately, you could use make to convert example.l
into
example.c
$ make example.c flex -t example.l > example.c
Exercises:
1. Figure out what the example file does with things not described in the regular expressions. (You should be able to identify some natural things not described in the regular expressions.)
2.a. Determine whether flex follows the longest token rule (e.g., whether,
in the example above, given input of begine
it matches the keyword
begin
or the identifier begine
).
2.b. If flex does follow the longest token rule, determine what flex does
if it has to backtrack. For example, given keywords of begin
and beginning (and no identifiers), what does it do if it sees
beginn
. (You'll need to write your own flex proram to test
this issue.)
3. Determine if f?lex
generates case-sensitive or
case-insensitive lexers. Then, find out how to make it generate
the other kind of lexer. (I.e., if it generates case-sensitive
lexers, find out how to make it generate case-insensitive lexers.)
4. Write flex programs to separate input in each of the following ways. (I'd prefer that you print out things with their type.)
The pattern/action rules in your program will generally look like
pattern { printf("DESCRIPTIVE-WORD[%s]\n", yytext); }
For example, the output of the first program might be something like the following:
WHITESPACE[ ] NORMAL[11February2004] WHITESPACE[ ] NORMAL[Dear] WHITESPACE[ ] NORMAL[Fred,]
5. Look at one of the copies of lex.yy.c
and see if you can
figure out where and how the finite automaton is coded. You might want
to try variants of the lex
program to see what difference
it makes.
6. While f?lex
can be used for a variety of purposes, in
this course, we will use it to build our tokenizer. To tokenize, we'll
want the C code for each pattern to return a value that corresponds to
the token.
Sketch how you might rewrite your keywords + identifiers + other tokenizer so that it returns the next token each time it is called.
You may find the following main
routine helpful.
int main (int argc, char *argv[]) { int type; // yylex return 0 upon encountering EOF. while (type = yylex ()) print_token (type); return 0; } // main
Monday, 16 September 2002 [Samuel A. Rebelsky]
Tuesday, 17 September 2002 [Samuel A. Rebelsky]
Wednesday, 11 February 2004 [Samuel A. Rebelsky]
Thursday, 12 February 2004 [Samuel A. Rebelsky]
http://www.cs.grinnell.edu/~rebelsky/Courses/CSC362/2004S/Labs/lab.04.html
.
Monday, 19 September 2011 [Samuel A. Rebelsky]
.l
instead of .yy
.
http://www.cs.grinnell.edu/~rebelsky/Courses/CSC362/2011F/Labs/lex.html
.
[Skip to Body]
Primary:
[Front Door]
[Schedule]
[Piazza]
-
[Academic Honesty]
[Instructions]
Current:
[Current Outline]
[Current EBoard]
-
[Current Assignment]
[Current Lab]
Groupings:
[Assignments]
[Documents]
[EBoards]
[Examples]
[Exams]
[Handouts]
[Labs]
[Outlines]
[Readings]
Related Courses:
[CSC362 2004S (Rebelsky)]
[CSC362 2010S (Stone)]
Misc:
[SamR]
[GNU Coding Standards]
[Dragon Book]
[Pascal Standards]
Disclaimer:
I usually create these pages on the fly
, which means that I rarely
proofread them and they may contain bad grammar and incorrect details.
It also means that I tend to update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.
This document was generated by
Siteweaver on Sat Nov 12 22:53:44 2011.
The source to the document was last modified on Mon Sep 19 09:31:55 2011.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2011F/Labs/lex.html
.