Assigned: Wednesday, 7 February 2001
Due: Friday, 16 February 2001
Summary: In this stage of the project, you will design and build the lexical analyzer for your Pascal compilers.
Warning: You may be required to use each other's lexers at the next stage of the project.
Group Work: You should work in groups of size 3. I am likely to reassign groups for the next stage of the project.
1. Begin by deciding on the natural tokens for Pascal. You should do this in the next day or so.
2. Design Java classes for tokens. You should have a general Token interface or abstract class and then make classes for the individual tokens that subclass or implement it (or that subclass a subclass or implement a subinterface of it).
3. Write regular expressions for the tokens.
4. Design a generic Lexer class that provides appropriate methods, such as
5. Implement the lexical analyzer (yeah, you knew there had to be a hard part, didn't you?).
6. Write a test program that reads in files and prints out their tokens (and, possibly, reports on errors as it encounters them).
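To make steps 2, 4, and 6 more concrete, here is one possible shape for the design. It is only a sketch: every name other than Token and Lexer (KeywordToken, IdentifierToken, EndOfFileToken, nextToken, CannedLexer, TokenDump) is made up for illustration, and Lexer is shown as an interface although a class would serve equally well. Your group may reasonably choose something quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; the real design is up to each group.

/** Common supertype for every token the lexer can return. */
abstract class Token {
    private final String text; // the characters matched in the source
    private final int line;    // 1-based line number where the token starts

    protected Token(String text, int line) {
        this.text = text;
        this.line = line;
    }

    String getText() { return text; }
    int getLine() { return line; }

    @Override
    public String toString() {
        return getClass().getSimpleName() + "[" + text + "] line " + line;
    }
}

/** An intermediate class: specific keywords could subclass this in turn. */
class KeywordToken extends Token {
    KeywordToken(String text, int line) { super(text, line); }
}

class IdentifierToken extends Token {
    IdentifierToken(String text, int line) { super(text, line); }
}

class EndOfFileToken extends Token {
    EndOfFileToken(int line) { super("", line); }
}

/** What a generic lexer might promise its clients (an assumed interface). */
interface Lexer {
    /** Returns the next token, or an EndOfFileToken once input is exhausted. */
    Token nextToken();
}

/** Stand-in lexer that replays a fixed list, so the driver can be tried out. */
class CannedLexer implements Lexer {
    private final List<Token> tokens;
    private int next = 0;

    CannedLexer(List<Token> tokens) { this.tokens = tokens; }

    @Override
    public Token nextToken() {
        if (next < tokens.size()) return tokens.get(next++);
        return new EndOfFileToken(0);
    }
}

/** Driver in the spirit of step 6: collect one printable line per token. */
class TokenDump {
    static List<String> dump(Lexer lexer) {
        List<String> out = new ArrayList<>();
        Token t = lexer.nextToken();
        while (!(t instanceof EndOfFileToken)) {
            out.add(t.toString());
            t = lexer.nextToken();
        }
        return out;
    }
}
```

Because the driver depends only on the Lexer interface, it should work unchanged with another group's lexer, which matters given the warning above about swapping lexers at the next stage.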
You are free to implement the lexical analyzer in any manner you see fit. There are a number of options.
You can hand code the analyzer. This solution potentially gives you the most freedom (and perhaps even efficiency). For example, you can probably deal with some non-regular issues with this solution.
You can use an existing Java lexical analyzer generator. Many commercial compilers take this approach. However, you will probably need to put a wrapper class around the generated lexer to give it a more generic interface. You will also need to learn the specifics of one of these systems.
You might consider using
There are also probably others out there.
You can rely on built-in Java classes, particularly
You can write your own lexical analyzer generator. You'll certainly learn a lot about lexical analysis, regular expressions, and automata if you choose this solution. However, it also requires much more effort than the other options.
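As a rough illustration of the hand-coding option described above, the following sketch scans a tiny, assumed subset of Pascal: identifiers, a handful of keywords, unsigned integers, the := operator, a few one-character symbols, and { } comments. The class name TinyPascalScanner, the method scanAll, and the string form of the results are all invented for this example. A real solution would return proper token objects and cover the whole language.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hand-coded scanner for a tiny, assumed subset of Pascal (not the full language). */
class TinyPascalScanner {
    private final String src;
    private int pos = 0; // index of the next unread character

    TinyPascalScanner(String src) { this.src = src; }

    /** Scans the whole input and returns one descriptive string per token. */
    List<String> scanAll() {
        List<String> out = new ArrayList<>();
        while (true) {
            skipWhitespaceAndComments();
            if (pos >= src.length()) return out;
            char c = src.charAt(pos);
            if (Character.isLetter(c)) {
                // Identifier or keyword: a letter followed by letters or digits.
                int start = pos;
                while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
                String word = src.substring(start, pos);
                out.add((isKeyword(word) ? "KEYWORD(" : "ID(") + word + ")");
            } else if (Character.isDigit(c)) {
                // Unsigned integer: one or more digits.
                int start = pos;
                while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
                out.add("INT(" + src.substring(start, pos) + ")");
            } else if (c == ':' && pos + 1 < src.length() && src.charAt(pos + 1) == '=') {
                // Two-character assignment operator, checked before the lone ':'.
                pos += 2;
                out.add("SYM(:=)");
            } else if (";:.+-*=()".indexOf(c) >= 0) {
                pos++;
                out.add("SYM(" + c + ")");
            } else {
                // Report the unexpected character and keep going.
                pos++;
                out.add("ERROR(" + c + ")");
            }
        }
    }

    private void skipWhitespaceAndComments() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '{') {
                // Skip to the closing brace; an unterminated comment runs to end of input.
                while (pos < src.length() && src.charAt(pos) != '}') pos++;
                if (pos < src.length()) pos++; // consume '}'
            } else {
                return;
            }
        }
    }

    private static boolean isKeyword(String word) {
        // Pascal keywords are case-insensitive; only a few are listed here.
        switch (word.toLowerCase(Locale.ROOT)) {
            case "begin": case "end": case "program": case "var": return true;
            default: return false;
        }
    }
}
```

For example, calling new TinyPascalScanner("begin x := 42; { note } end.").scanAll() yields seven entries, starting with KEYWORD(begin) and ID(x) and ending with SYM(.); the comment produces no token.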
Wednesday, 7 February 2001
Disclaimer: I usually create these pages on the fly. This means that they are rarely proofread and may contain bad grammar and incorrect details. It also means that I may update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.
This page was generated by Siteweaver on Wed Mar 14 10:08:29 2001.
The source was last modified Fri Feb 9 10:26:04 2001.