**Primary:**
[Skip To Body]
[Front Door]
[Current]
[Glance]
[Honesty]
[Instructions]
[Links]
[Search]
[Syllabus]

**Sets:**
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Project]
[Readings]
[Reference]

**ECA:**
[About]
[Grades]
[Quizzes]
[Submit Work]
[Change Password]
[Reset Password]

**Miscellaneous:**
[2001S]
[98F]
[SamR]
[Glimmer Labs]

**Held** Friday, September 13, 2002

**Summary**

Today we continue our consideration of how to move from the readable but declarative regular exprssion notation to the executable but obtuse finite automaton notation.

**Notes**

- If you have not done so already, please start reading Chapter 4 of the Red Dragon.
- I plan to email you about the project later today.

**Overview**

- From NFA to DFA
- From DFA to optimal DFA
- Reminder: Lexical analysis using automata

- For today, we'll try a slightly weird, but also interesting example for which I'll draw a DFA
- Strings in the language
- Start with a
- Are followed by an odd # of b's and an a
- Can repeat the (odd # of b's and a) with an intervening "aa"
- Can repeat the (odd # of b's and a) with an intervening "ca"
- Must have a c after all of the repeated (odd # of b's and a)
- Have an optional (odd # of b's) at the end

- As you may be able to tell, the DFAs constructed by the NFA to DFA construction are is fairly large.
- Can we build an equivalent DFA that's smaller? Yes!
- Strategy: Group states into sets of equivalent states.
- Surprisingly, it's not enough to do this bottom-up (that is,
if two states have the same edges, join them together).
- Sometimes you need to join states in pairs (or more)

- Even more surprisingly, it's better to do it top-down (assume that all states can be joined and refine that decision)
- Here's some pseudocode
Assume all non-final states can be treated as the same state Assume all final states can be treated as the same state For each group of states treated as equivalent as the same state For each symbol, s If there are two "equivalent" states q1,q2 such that edge(q1,s) and edge(q2,s) lead to non-equivalent states, split q1 and q2 into different equivalencies figure out where the other states in the equivalency go End For // each symbol End for // each pair of states

- How do we convert a series of token definitions (using regular expressions) to a tokenizing DFA?
- We build an NFA for each of the token definitions.
- We augment the final states with token type and priority.
- We put 'em together in one big automaton by adding a new start state with epsilon transitions to the start states of all the other automata.
- We convert the result to a DFA.
- When we make the determination of a final state, we use the associated token type of the highest priority final state in the NFA.

- We've seen how to match with finite automata, but how do we tokenize?
- We only tokenize with deterministic finite automata.
- We begin by attaching a token (or, sometimes, a set of instructions)
to each final state.
- When we reach an appropriate final state (possibly not the first final state), we stop and return the corresponding token.

- How do we decide whether a final state is appropriate? We always
keep track of the last final state we encountered, but peek ahead
until we find another final state or hit a dead end.
/** * Find the first token in the candidate string, starting * at a particular position. */ public token findToken(String candidate, starting_pos) begin State current_state = q0; for i = starting_pos to the length of candidate current_state = edge(current_state,candidate.symbolAt(i)) if current_state is a final state final_found = current_state final_pos = i end if if current_state is undefined then exit the for loop end if end for if (final_found is defined) then return the token given by final_found at position final_pos else no token can be found end if end

**Primary:**
[Skip To Body]
[Front Door]
[Current]
[Glance]
[Honesty]
[Instructions]
[Links]
[Search]
[Syllabus]

**Sets:**
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Project]
[Readings]
[Reference]

**ECA:**
[About]
[Grades]
[Quizzes]
[Submit Work]
[Change Password]
[Reset Password]

**Miscellaneous:**
[2001S]
[98F]
[SamR]
[Glimmer Labs]

**Disclaimer**:
I usually create these pages on the fly

, which means that I rarely
proofread them and they may contain bad grammar and incorrect details.
It also means that I tend to update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.

This document was generated by
Siteweaver on Fri Dec 6 10:37:58 2002.

The source to the document was last modified on Wed Sep 4 10:08:34 2002.

This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2002F/Outlines/outline.07.html`

.

You may wish to validate this document's HTML ; ; Check with Bobby

Glimmer Labs: The Grinnell Laboratory for Interactive Multimedia Experimentation & Researchglimmer@grinnell.edu