Compilers (CS362 2004S)

Tokenizing with java.io.StreamTokenizer

Summary: In today's lab, you will explore Sun's java.io.StreamTokenizer class. In particular, you will investigate its default behavior, update it to tokenize Scheme, and consider its fitness for a Pascal tokenizer.

Preliminaries

Collaboration: Feel free to work on this lab in pairs or trios.

Turning It In: Email me your answers to Exercise 5.

Background

As you may have noticed in your work with Java, the designers of the standard Java API have included a number of special-purpose classes. A few of those classes relate to lexical analysis, particularly java.io.StreamTokenizer and java.util.StringTokenizer.

Both of these classes provide configurable lexical analyzers.

I've written a short program that permits you to explore the behavior of java.io.StreamTokenizer. You should use that program as the starting point for your explorations.
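The program distributed with the lab is not reproduced here. As a rough idea of the shape such an explorer can take, the following is a minimal sketch that reads tokens from a default-configured java.io.StreamTokenizer and labels each one by kind. The class name TokenDump, the method describe, and the output labels are hypothetical and chosen only for illustration; they are not taken from PrintTokens.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the PrintTokens distributed with the lab.
class TokenDump {
    // Read every token from the input and return one description per token.
    static List<String> describe(Reader in) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in); // default configuration
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add("WORD(" + st.sval + ")");
                    break;
                case StreamTokenizer.TT_NUMBER:
                    out.add("NUMBER(" + st.nval + ")");
                    break;
                default:
                    if (st.sval != null) {
                        // Quoted string: ttype holds the quote character, sval the body.
                        out.add("STRING(" + st.sval + ")");
                    } else {
                        // Any other single character is returned as its own token.
                        out.add("CHAR(" + (char) st.ttype + ")");
                    }
            }
        }
        return out;
    }

    // Usage: java TokenDump SomeFile.txt
    public static void main(String[] args) throws IOException {
        try (Reader in = new FileReader(args[0])) {
            for (String token : describe(in)) {
                System.out.println(token);
            }
        }
    }
}
```

The fields ttype, sval, and nval are how StreamTokenizer reports each token: ttype gives the kind, sval holds the text of a word or string, and nval holds the value of a number.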

Exercises

Exercise 0: Preparation

a. Make a copy of the files for today's lab with

cvs -d /home/rebelsky/Web/Courses/CS362/2004S/CVS checkout Lab05

b. Compile PrintTokens.java.

c. Ask PrintTokens to tokenize PrintTokens.java.

d. You may also find it useful to scan the documentation for java.io.StreamTokenizer.

Exercise 1: Exploring java.io.StreamTokenizer's default behavior

By creating your own input file, figure out answers to the following questions.

a. What are the default word (identifier) characters?

b. Does the tokenizer parse numbers by default?

c. Is there a single-line comment character? If so, what is it?

d. Does the tokenizer ignore C-style comments?

e. What form do strings take?

f. What does the tokenizer do with unmatched strings?

Exercise 2: Tokenizing Scheme

Extend PrintTokens to tokenize Scheme programs. Test it on a few simple Scheme programs, such as ones you've saved from CSC151. You may find it helpful to refer to the department's local copy of The Revised(5) Report on the Algorithmic Language Scheme.

You should consider all of the following issues:

a. What do comments look like in Scheme?

b. What characters can appear in Scheme words?

c. What do strings look like in Scheme?

d. What other kinds of tokens does Scheme use? (You may have to wrap some of these into the word token class.)
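One possible starting point for these questions is sketched below. It is only a sketch under stated assumptions, not a complete Scheme lexer: the class name SchemeTokens and its methods are hypothetical, the set of extra identifier characters is my reading of the R5RS identifier rules (with # added so that #t and #f come out as words), and numbers are deliberately folded into the word class. String escapes follow StreamTokenizer's own conventions, and character literals and the ' abbreviation are not handled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; one possible configuration, not a complete Scheme lexer.
class SchemeTokens {
    // Build a tokenizer whose character classes roughly match Scheme.
    static StreamTokenizer configure(Reader in) {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();            // start over: every character is ordinary
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');      // numbers are treated as words in this sketch
        // Assumed set of extra identifier characters, plus '#' for #t and #f.
        for (char c : "!$%&*/:<=>?^_~+-.@#".toCharArray()) {
            st.wordChars(c, c);
        }
        st.commentChar(';');         // ';' starts a comment that runs to end of line
        st.quoteChar('"');           // strings are delimited by double quotes
        return st;                   // parentheses and ' remain ordinary characters
    }

    // Return the text of each token, in order.
    static List<String> tokens(Reader in) throws IOException {
        StreamTokenizer st = configure(in);
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            } else if (st.ttype == '"') {
                out.add("\"" + st.sval + "\"");
            } else {
                out.add(String.valueOf((char) st.ttype));
            }
        }
        return out;
    }
}
```

Calling resetSyntax first means nothing from the default configuration carries over, so every character class used for Scheme has to be stated explicitly. Whether that trade-off is worthwhile is part of what the exercise asks you to decide.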

Exercise 3: Counting Words

Using the general structure of your modified Scheme token printer and java.util.Hashtable, build a program that counts how many times each word appears in a Scheme program.
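The counting step on its own might look something like the following sketch. The class name WordCount and the method count are hypothetical. The method accepts any already-configured StreamTokenizer, so the default-configured tokenizer in main is only a stand-in for your Scheme configuration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Hashtable;

// Hypothetical sketch of the counting loop; the tokenizer configuration is left to the caller.
class WordCount {
    // Count how many times each word token appears; other token kinds are skipped.
    static Hashtable<String, Integer> count(StreamTokenizer st) throws IOException {
        Hashtable<String, Integer> counts = new Hashtable<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                Integer old = counts.get(st.sval);  // null if the word is new
                counts.put(st.sval, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input and default configuration, for demonstration only.
        StreamTokenizer st = new StreamTokenizer(new StringReader("a b a c a b"));
        Hashtable<String, Integer> counts = count(st);
        for (String word : counts.keySet()) {
            System.out.println(word + ": " + counts.get(word));
        }
    }
}
```

Hashtable does not promise any particular iteration order, so sort the keys first if you want the report in alphabetical order.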

Exercise 4: Tokenizing Pascal

Figure out how suitable java.io.StreamTokenizer is for tokenizing Pascal by considering the following questions:

a. Can you configure an object of class java.io.StreamTokenizer to correctly handle Pascal identifiers? If so, how?

b. Can you configure an object of class java.io.StreamTokenizer to correctly handle Pascal numbers? If so, how?

c. Can you configure an object of class java.io.StreamTokenizer to correctly handle Pascal-style brace comments?

d. Can you configure an object of class java.io.StreamTokenizer to correctly handle Pascal-style paren-star comments, that is, comments of the form (* ... *)?

e. Are there other kinds of tokens you need to consider? If so, is it possible to make java.io.StreamTokenizer handle those tokens?

Exercise 5: Reflection

a. What aspects of java.io.StreamTokenizer make it a preferable mechanism for tokenizing for the compiler (as compared to the hand-coded tokenizers I asked you to write)?

b. What aspects of java.io.StreamTokenizer make it a less suitable mechanism for tokenizing for the compiler (as compared to the hand-coded tokenizers I asked you to write)?

 

History

Monday, 23 September 2002 [Samuel A. Rebelsky]

Tuesday, 24 September 2002 [Samuel A. Rebelsky]

Wednesday, 18 February 2004 [Samuel A. Rebelsky]

 

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed May 5 11:46:49 2004.
The source to the document was last modified on Wed Feb 18 20:59:21 2004.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2004S/Labs/lab.05.html.

Samuel A. Rebelsky, rebelsky@grinnell.edu