Held: Monday, 9 February 2004
Today we start to consider how to move from the readable but declarative
regular expression notation to the executable but obtuse finite automaton
- Today at 2:15 in 2417, there is a talk entitled Beyond Pizza and Code: The ACM Code of Ethics. Extra credit for attending.
- Today at 4:30, there is a talk entitled GUI: More than just a pretty interface. Extra credit for attending.
- From regular expressions to NFAs
- An example
- Regular expressions and DFAs both can represent some simple but useful languages.
- They can also be used to tokenize.
- Regular expressions are declarative: They say what, rather than how.
- DFAs are imperative: They also say how.
- Some languages are easier to represent with regular expressions, some with DFAs, some with NFAs.
- Note that regular expressions are also non
- Many useful languages cannot be represented by regular expressions.
- For example,
strings of a's and b's with equal numbers of a's and b's
- Here's an even simpler one: akbk
- How would you prove that you can't represent that last one? (No, I won't put it here, since many of you read along.)
- Hint: Think about the F in DFA.
- We've now seen two mechanisms for describing tokens (and, in effect, for tokenizing): regular expressions and finite automata.
- How do we go from one to another?
- We convert regular expressions
to nondeterministic finite automata using a surprisingly naive
- We convert NFAs to DFAs.
- We then
optimize the DFAs.
- Our regular expression to NFA converter builds a finite automaton with
one final state (and one start state). It uses the recursive definitions
of regular expressions to build a finite automaton for a compound
expression from the finite automata for the base expressions.
- Each created NFA has a unique start state and a unique final state.
- The NFA for the empty string has two states with an
epsilon transition between the states. The first state is the
start state. The second state is the final state.
- The NFA for a single symbol is similar, except that the
transition is labeled with the symbol.
- The NFA for alternation has a new start state with
epsilon edges to the start states of the NFA for the
subexpressions and a new
final state with epsilon edges from the final states of the
NFA for the subexpressions.
- The finite automaton for concatenation uses the start state of the
NFA for the first expression and the final state of the
NFA for the second expression. There is an epislon transition
from the final state of the first NFA to the start state of
the second NFA.
- Alternatively, we can merge the final state of the first NFA
with the start state of the second NFA.
- The NFA for repetition has a new start state and a
new final state. There are also epsilon transitions
- from the start state to the final state,
- from the start state to the start state of the sub-NFA,
- from the final state of the sub-NFA to the final state, and
- from the final state of the sub-NFA to the start state of
- If it seems to you that some of these extra states and epsilon transitions
are unnecessary, you're probably right. However, you can't get rid of all
- For example, consider what happens if you don't create any extra states
for concatenation or repetition.
- We'll develop some examples
on the fly.