Compilers (CS362 2004S)

Project, Phase 1: Lexical Analysis

Assigned: Friday, 30 January 2004
Due: Monday, 9 February 2004

Summary: In this stage of the project, you will design and build the lexical analyzer for your Pascal compilers.

Warning: You may be required to use each other's lexers at the next stage of the project.

Warning: You will be working on this stage of the project at the same time you are learning the theory of lexical analysis. This simultaneous work is an experiment suggested by the previous session of the class.

Note: In past sessions of CSC362, I've given students a lot of freedom on the various stages of the project. This year, I'm giving students much less freedom in the hopes that greater guidance will lead to greater success.

Group Work: You should work in groups of size 3. For this stage, you can choose your own groups. I am likely to reassign groups for the next stage of the project.

Building a Lexical Analyzer

1. Begin by identifying the natural tokens for Pascal. You should do this by Monday. The Pascal User Manual and Report is a good place to start.

2. Examine the classes in rebelsky.compiler.lexer. Identify the roles (or potential roles) each class plays. You may find it useful to look at StupidTokenizer.java and StupidTokens.java.

3. Implement the lexical analyzer (yeah, you knew there had to be a hard part, didn't you?). You should call your lexical analyzer PascalTokenizer. It should implement the rebelsky.compiler.lexer.TokenStream interface. The constructor for your class should take a rebelsky.compiler.misc.CharStream as input.

4. Write a test program that reads in files and prints out their tokens (and, possibly, reports on errors as it encounters them). You might want to use TestST.java as a starting point for your explorations.
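
To make steps 1, 3, and 4 a bit more concrete, here is a minimal sketch of a hand-written scanner and a driver that prints tokens. It is only an illustration under loud assumptions: it does not use the rebelsky.compiler.lexer or rebelsky.compiler.misc classes (their method signatures are not given on this page), so java.io.Reader stands in for CharStream, and the names PascalLexSketch, Kind, Token, and tokenize are all made up. The token set is deliberately tiny. Your PascalTokenizer will need the full set of Pascal tokens and must implement TokenStream.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a Pascal tokenizer; not the course's API.
class PascalLexSketch {
    // A small, incomplete set of token categories (step 1 asks for the full list).
    enum Kind { KEYWORD, IDENTIFIER, INTEGER, ASSIGN, COLON, SEMICOLON, PERIOD, ERROR, EOF }

    // A token pairs a category with the text that was matched.
    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Only a handful of Pascal's reserved words, for illustration.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
        "program", "var", "begin", "end", "if", "then", "else", "while", "do"));

    private final PushbackReader in;  // one character of lookahead

    PascalLexSketch(Reader source) { this.in = new PushbackReader(source, 1); }

    // Returns the next token, or an EOF token when the input is exhausted.
    Token next() throws IOException {
        int c = in.read();
        while (c != -1 && Character.isWhitespace(c)) c = in.read();
        if (c == -1) return new Token(Kind.EOF, "");
        if (Character.isLetter(c)) {
            String text = readWhile(c, true);
            // Pascal reserved words are case-insensitive.
            boolean keyword = KEYWORDS.contains(text.toLowerCase(Locale.ROOT));
            return new Token(keyword ? Kind.KEYWORD : Kind.IDENTIFIER, text);
        }
        if (Character.isDigit(c)) return new Token(Kind.INTEGER, readWhile(c, false));
        switch (c) {
            case ':': {
                int d = in.read();            // ":=" needs one character of lookahead
                if (d == '=') return new Token(Kind.ASSIGN, ":=");
                if (d != -1) in.unread(d);
                return new Token(Kind.COLON, ":");
            }
            case ';': return new Token(Kind.SEMICOLON, ";");
            case '.': return new Token(Kind.PERIOD, ".");
            default:  return new Token(Kind.ERROR, String.valueOf((char) c));
        }
    }

    // Collects digits (and letters, if allowed) starting with 'first'; pushes back the lookahead.
    private String readWhile(int first, boolean lettersAllowed) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c = first;
        while (c != -1 && (Character.isDigit(c) || (lettersAllowed && Character.isLetter(c)))) {
            sb.append((char) c);
            c = in.read();
        }
        if (c != -1) in.unread(c);
        return sb.toString();
    }

    // Reads every token up to (but not including) EOF.
    static List<Token> tokenize(Reader source) {
        try {
            PascalLexSketch lexer = new PascalLexSketch(source);
            List<Token> tokens = new ArrayList<>();
            for (Token t = lexer.next(); t.kind != Kind.EOF; t = lexer.next()) tokens.add(t);
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test driver: prints the tokens of each file named on the command line.
    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println(tokenize(new StringReader("x := 42;")));
        }
        for (String name : args) {
            try (Reader r = Files.newBufferedReader(Paths.get(name))) {
                for (Token t : tokenize(r)) System.out.println(t);
            }
        }
    }
}
```

Run with no arguments, the driver tokenizes the built-in sample "x := 42;". Note that unrecognized characters come back as ERROR tokens rather than stopping the scan, which is one way (not the only way) to report errors as you encounter them.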

Implementation Options

You are free to implement the lexical analyzer in one of two ways.

You can modify and extend StupidTokens.java and StupidTokenizer.java. This option is probably the most straightforward and easiest. However, you will have to deal with some subtleties of my code and consider the differences between STUPID and Pascal.

You can throw away my code and hand code the analyzer. This solution potentially gives you the most freedom (and perhaps even efficiency). For example, you can probably deal with some non-regular issues with this solution.
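
As one illustration of a non-regular issue, consider nested comments. Matching arbitrarily deep nesting requires counting, which no regular expression can do, but a hand-coded scanner handles it with a depth counter. The sketch below assumes a dialect in which `(* ... *)` comments may nest (standard Pascal does not require this), and the class and method names are hypothetical.

```java
// Hypothetical helper; assumes a dialect that allows nested (* ... *) comments.
class NestedCommentSketch {
    // Returns the index just past the comment that opens at 'start',
    // or -1 if the comment is never closed.
    static int skipComment(String src, int start) {
        if (!src.startsWith("(*", start)) {
            throw new IllegalArgumentException("no comment at index " + start);
        }
        int depth = 0;
        int i = start;
        while (i < src.length()) {
            if (src.startsWith("(*", i)) {          // another level opens
                depth++;
                i += 2;
            } else if (src.startsWith("*)", i)) {   // one level closes
                depth--;
                i += 2;
                if (depth == 0) return i;
            } else {
                i++;
            }
        }
        return -1;  // ran out of input while still inside a comment
    }

    public static void main(String[] args) {
        String src = "(* outer (* inner *) still outer *) x";
        int end = skipComment(src, 0);
        System.out.println("rest of input:" + src.substring(end));
    }
}
```

The counter is what takes this beyond a finite automaton: each additional level of nesting would need additional states, and the depth is unbounded.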

In addition, if you have the inclination and lots and lots of spare time, you can write your own lexical analyzer generator. You'll certainly learn a lot about lexical analysis, regular expressions, and automata if you choose this solution. If you choose this option, you should do it in addition to one of the other options, not in place of it.
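
If you are curious what a generator produces, the heart of the output is usually a transition table plus a small driver loop. Here is a rough sketch with a hand-built table for just two token classes, identifiers and unsigned integers. A real generator would compute the table from regular expressions. All of the names here are hypothetical.

```java
// Hypothetical table-driven recognizer; the table is written by hand here,
// whereas a lexer generator would build it from regular expressions.
class TableDrivenDfaSketch {
    // Character classes (the columns of the table).
    static final int LETTER = 0, DIGIT = 1, OTHER = 2;

    // States (the rows): 0 = start, 1 = in identifier, 2 = in integer.
    // An entry of -1 means "no transition".
    static final int[][] DELTA = {
        {  1,  2, -1 },   // start
        {  1,  1, -1 },   // identifier: letters and digits continue it
        { -1,  2, -1 },   // integer: only digits continue it
    };

    // Token name for each accepting state; null marks a non-accepting state.
    static final String[] ACCEPT = { null, "IDENTIFIER", "INTEGER" };

    static int classify(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    // Returns the length of the longest token starting at 'start', or 0 if none matches.
    static int longestMatch(String input, int start) {
        int state = 0;
        int lastAccept = 0;
        for (int i = start; i < input.length(); i++) {
            state = DELTA[state][classify(input.charAt(i))];
            if (state < 0) break;                                // dead end: stop scanning
            if (ACCEPT[state] != null) lastAccept = i - start + 1;
        }
        return lastAccept;
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("count1 := 0", 0));  // prints 6
        System.out.println(longestMatch("42abc", 0));        // prints 2
    }
}
```

The driver remembers the most recent accepting position, so it returns the longest match even if the automaton later runs into a dead end.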

Project Groups

 

History

Wednesday, 28 January 2004 [Samuel A. Rebelsky]

 

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Mon Apr 26 13:19:32 2004.
The source to the document was last modified on Tue Feb 3 16:23:09 2004.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2004S/Project/lexer.html.


Samuel A. Rebelsky, rebelsky@grinnell.edu