Compilers (CS362 2002F)

Tokenizing with


Summary: In today's lab, you will explore Sun's tokenizer class. In particular, you will investigate its default behavior, update it to tokenize Scheme, and consider its fitness for our lexical analyzer.

Collaboration: Feel free to work on this lab in pairs or trios.

Turning It In: Save your answers in a plain text file and submit it using the ECA. You should cut and paste any Java code you write into that file.

Grading: I expect that you will gain more from doing this lab than from my grading it. I may simply scan through your answers to see if you had any particularly valuable insights.

Supporting Files:



As you may have noticed in your work with Java, the designers of the standard Java API have included a number of special-purpose classes. A few of those classes relate to lexical analysis, particularly and java.util.StringTokenizer.

Both of these classes provide configurable lexical analyzers.
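As a rough illustration of what "configurable" means here, the sketch below runs the same text through java.util.StringTokenizer and, on the assumption that the other class under study is java.io.StreamTokenizer, through that class in its default configuration. The class name TokenizerDemo and its method names are made up for this example. The code also assumes a newer Java than the j2sdk1.4.0 used in this lab (generics and UncheckedIOException need Java 8 or later).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the lab's supporting files.
class TokenizerDemo {
    // StringTokenizer: the main thing you configure is the set of
    // delimiter characters. Every token comes back as a plain String.
    static List<String> splitWithStringTokenizer(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // StreamTokenizer (assumed to be the class this lab studies): each
    // token is classified through the ttype field, with the text in sval
    // or the numeric value in nval.
    static List<String> scanWithStreamTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StreamTokenizer in = new StreamTokenizer(new StringReader(text));
        try {
            while (in.nextToken() != StreamTokenizer.TT_EOF) {
                if (in.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add("WORD " + in.sval);
                } else if (in.ttype == StreamTokenizer.TT_NUMBER) {
                    tokens.add("NUMBER " + in.nval);
                } else {
                    // Anything else is reported by its character code.
                    tokens.add("CHAR " + (char) in.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "count = 42";
        System.out.println(splitWithStringTokenizer(text, " "));
        System.out.println(scanWithStreamTokenizer(text));
    }
}
```

Comparing the two printed lists for the same input is a quick way to see how much more classification the second class does for you.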

I've written a short program that permits you to explore the behavior of the tokenizer. You should use that program as the starting point for your explorations.


Exercise 0: Preparation

a. Make a copy of

b. Make sure that your PATH environment variable includes /net/j2sdk1.4.0/bin (or that you have an alias to the Java compiler and interpreter in that directory).

c. Compile

d. Ask PrintTokens to tokenize

e. You may also find it useful to scan the documentation for

Exercise 1: Exploring the tokenizer's default behavior

By creating your own input file, figure out answers to the following questions.

a. What are the default word (identifier) characters?

b. Does the tokenizer parse numbers by default?

c. Is there a single-line comment character? If so, what is it?

d. Does the tokenizer ignore C-style comments?

e. What form do strings take?

f. What does the tokenizer do with unmatched strings?

Exercise 2: Tokenizing Scheme

Extend PrintTokens to tokenize Scheme programs. Test it on a few simple Scheme programs, such as ones you've saved from CSC151. You may find it helpful to refer to the department's local copy of the Revised(5) Report on the Algorithmic Language Scheme.

You should consider all of the following issues:

a. What do comments look like in Scheme?

b. What characters can appear in Scheme words?

c. What do strings look like in Scheme?

d. What other kinds of tokens does Scheme use? (You may have to wrap some of these into the word token class.)
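One possible starting point for these issues is sketched below. It assumes the tokenizer is java.io.StreamTokenizer and does not use the PrintTokens program itself; the class SchemeTokens, its methods, and the exact set of extra word characters are choices made for this example rather than a complete answer. In particular, it treats numbers and the # forms as words, and it relies on the tokenizer's C-style escape handling inside strings, which only approximates Scheme's rules.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Scheme-flavored tokenizer configuration.
class SchemeTokens {
    // Assumption: besides letters and digits, these characters may appear
    // in words. Numbers such as 3.14 and forms such as #t end up as words.
    static final String EXTRA_WORD_CHARS = "!$%&*/:<=>?^_~+-.#";

    static StreamTokenizer newSchemeTokenizer(Reader source) {
        StreamTokenizer in = new StreamTokenizer(source);
        in.resetSyntax();               // start with every character "ordinary"
        in.whitespaceChars(0, ' ');     // control characters and space separate tokens
        in.wordChars('a', 'z');
        in.wordChars('A', 'Z');
        in.wordChars('0', '9');
        for (char c : EXTRA_WORD_CHARS.toCharArray()) {
            in.wordChars(c, c);
        }
        in.commentChar(';');            // comments run from ; to the end of the line
        in.quoteChar('"');              // strings are delimited by double quotes
        return in;                      // parentheses and ' remain ordinary characters
    }

    static List<String> tokenize(String program) {
        List<String> tokens = new ArrayList<>();
        StreamTokenizer in = newSchemeTokenizer(new StringReader(program));
        try {
            while (in.nextToken() != StreamTokenizer.TT_EOF) {
                if (in.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add("WORD " + in.sval);
                } else if (in.ttype == '"') {
                    tokens.add("STRING " + in.sval);   // sval holds the string body
                } else {
                    tokens.add("CHAR " + (char) in.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("(define (sq x) ; square\n  (* x x))")) {
            System.out.println(token);
        }
    }
}
```

Calling resetSyntax() first is a deliberate choice here: it removes the default number parsing and the default comment and quote characters, so every token class is one you configured explicitly.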

Exercise 3: Counting Words

Using the general structure of your modified Scheme token printer and java.util.Hashtable, build a program that counts how many times each word appears in a Scheme program.
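The counting step might look roughly like the following sketch, which again assumes java.io.StreamTokenizer and repeats a simplified Scheme configuration so that it stands alone. WordCounter and countWords are hypothetical names, and the word-character set is an assumption you would replace with whatever you settled on in Exercise 2.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Hashtable;

// Hypothetical sketch: count occurrences of each word token.
class WordCounter {
    static Hashtable<String, Integer> countWords(Reader source) {
        StreamTokenizer in = new StreamTokenizer(source);
        // Simplified Scheme-like setup (an assumption, not a full answer).
        in.resetSyntax();
        in.whitespaceChars(0, ' ');
        in.wordChars('a', 'z');
        in.wordChars('A', 'Z');
        in.wordChars('0', '9');
        for (char c : "!$%&*/:<=>?^_~+-.#".toCharArray()) {
            in.wordChars(c, c);
        }
        in.commentChar(';');
        in.quoteChar('"');

        Hashtable<String, Integer> counts = new Hashtable<>();
        try {
            while (in.nextToken() != StreamTokenizer.TT_EOF) {
                // Only word tokens are counted; strings, parentheses,
                // and comments are skipped.
                if (in.ttype == StreamTokenizer.TT_WORD) {
                    Integer old = counts.get(in.sval);
                    counts.put(in.sval, old == null ? 1 : old + 1);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new StringReader("(* x x) ; x x")));
    }
}
```

Hashtable does not keep its keys in any particular order, so if you want a sorted report you would need to sort the keys yourself before printing.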

Exercise 4: Tokenizing Pascal

Figure out how suitable the tokenizer is for tokenizing Pascal by considering the following questions:

a. Can you configure a tokenizer object to correctly handle Pascal identifiers? If so, how?

b. Can you configure a tokenizer object to correctly handle Pascal numbers? If so, how?

c. Can you configure a tokenizer object to correctly handle Pascal-style brace comments?

d. Can you configure a tokenizer object to correctly handle Pascal-style paren-star comments?

e. Are there other kinds of tokens you need to consider? If so, is it possible to make the tokenizer handle those tokens?
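As a small nudge toward question (a) only, the sketch below shows two configuration calls that seem relevant to identifiers, assuming the tokenizer is java.io.StreamTokenizer. Pascal identifiers are not case sensitive, so folding words to lower case is one way to compare them. Whether underscores belong in identifiers depends on the Pascal dialect, so that line is an assumption. PascalWords is a made-up name, and the other questions are deliberately left open.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect identifier-like words from Pascal text.
class PascalWords {
    static List<String> words(String text) {
        StreamTokenizer in = new StreamTokenizer(new StringReader(text));
        in.lowerCaseMode(true);   // word tokens are lowercased in sval
        in.wordChars('_', '_');   // assumption: the dialect permits underscores
        List<String> result = new ArrayList<>();
        try {
            while (in.nextToken() != StreamTokenizer.TT_EOF) {
                if (in.ttype == StreamTokenizer.TT_WORD) {
                    result.add(in.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Begin Total_1 END"));
    }
}
```

This keeps the default syntax table otherwise, so it is worth checking how the default treatment of characters such as - and . interacts with identifiers before relying on it.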



Monday, 23 September 2002 [Samuel A. Rebelsky]

  • Designed code and general structure of lab.

Tuesday, 24 September 2002 [Samuel A. Rebelsky]

  • Wrote the lab.


Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Tue Dec 10 08:53:27 2002.
The source to the document was last modified on Tue Sep 24 08:43:41 2002.