Reminder: We are converting programmer/designer-friendly
regular expressions to computer-friendly DFAs
Problem:
The DFA you get when you convert an NFA to a DFA can be
big (in the worst case, exponentially bigger than the NFA).
Solution:
Look at ways to make DFAs smaller
Example on board
Observation: If qi and qj have the same edges
with the same labels and the same finality,
you can join them.
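A minimal sketch of that check, assuming the DFA is stored as a dictionary from (state, symbol) pairs to states. The names `can_join`, `delta`, and `finals` and the four-state DFA are hypothetical, chosen only for illustration.

```python
def can_join(delta, finals, qi, qj, alphabet):
    """True if qi and qj have the same finality and identical outgoing edges."""
    if (qi in finals) != (qj in finals):
        return False
    # Same label must lead to the same destination (or be missing for both).
    return all(delta.get((qi, s)) == delta.get((qj, s)) for s in alphabet)

# Hypothetical DFA: states 1 and 2 both go to 3 on 'a' and to 0 on 'b'.
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 3, (1, "b"): 0,
    (2, "a"): 3, (2, "b"): 0,
    (3, "a"): 3, (3, "b"): 3,
}
finals = {3}

print(can_join(delta, finals, 1, 2, "ab"))  # True
print(can_join(delta, finals, 0, 1, "ab"))  # False
```

This test is only sufficient, not necessary: two states can be equivalent even when their edges lead to different (but themselves equivalent) states, which is why the revised solution works with sets of states.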
Revised Solution:
Go overboard on joining:
All non-final states are the same
All final states are the same
Given that that's wrong, refine
Whenever we notice a mistake (e.g., an edge that
should go to two different sets), split a set
// Initialization
S0 = { }
S1 = { }
for each q in Q
  if q.isFinal, add q to S0
  else add q to S1
// Refinement
repeat almost forever
  for each set, Si
    for all qj in Si
      for all qk in Si
        for all s in Sigma
          if delta(qj,s) is in a different S than delta(qk,s)
            split Si so that qj and qk end up in different sets
  if you didn't split, exit repeat
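The refinement loop can be sketched in Python as follows. This is one possible rendering, not the only one: instead of comparing every pair qj, qk, it groups the states of each set by which set every symbol leads to, which splits a set in one pass. It assumes `delta` is total (an entry for every state and symbol); the function name and the example DFA are hypothetical.

```python
def minimize_partition(states, alphabet, delta, finals):
    """Return a list of frozensets; each one is a state of the smaller DFA.

    Assumes delta[(q, s)] exists for every state q and symbol s.
    """
    finals = set(finals)
    nonfinals = set(states) - finals
    # Go overboard: all final states together, all non-final states together.
    partition = [frozenset(b) for b in (finals, nonfinals) if b]

    while True:
        # Which set currently holds each state?
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        new_partition = []
        for block in partition:
            # States stay together only if every symbol sends them
            # into the same set.
            groups = {}
            for q in block:
                signature = tuple(block_of[delta[(q, s)]] for s in alphabet)
                groups.setdefault(signature, set()).add(q)
            new_partition.extend(frozenset(g) for g in groups.values())
        if len(new_partition) == len(partition):
            return new_partition  # nothing was split, so we are done
        partition = new_partition

# Hypothetical DFA in which states 1 and 2 behave identically.
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 3, (1, "b"): 0,
    (2, "a"): 3, (2, "b"): 0,
    (3, "a"): 3, (3, "b"): 3,
}
blocks = minimize_partition([0, 1, 2, 3], "ab", delta, finals={3})
print(sorted(sorted(b) for b in blocks))  # [[0], [1, 2], [3]]
```

Because splitting never merges sets, comparing the number of sets before and after a pass is enough to detect "you didn't split".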
How to build your lexer in N easy steps
(1) Write your regular expressions
(2) Convert them to NFAs
(3) Merge all the NFAs together, augmenting their
final states with a notation as to symbol type.
(4) Convert it to a DFA
(5) Optimize the DFA
Hack when a final state might represent different
symbols
(6) Convert the DFA to working code or interpret
the DFA indirectly
OR
(1) Write your regular expressions
(2) Write a regular-expression compiler
(3) Run your regular-expression compiler on your regular
expressions
OR
(1) Write your regular expressions
(2) Use someone else's regular expression compiler
OR
(1) Find someone else's regular expressions
(2) Convert them to a lexer in an ad-hoc format
OR
(1) Find someone else's lexer
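As an illustration of the "use someone else's regular expression compiler" route, here is a small sketch built on Python's standard `re` module. The token names and patterns are hypothetical, loosely Pascal-flavored, and not taken from any particular language definition.

```python
import re

# Hypothetical token definitions: (token type, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"\s+"),
]
# One big alternation; each alternative is a named group, so the name of
# the group that matched tells us the token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token type, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x := y1 + 42"))
```

One caveat: Python's `re` tries alternatives left to right and takes the first one that matches, not the longest, so the order of entries in `TOKEN_SPEC` matters in a way it would not for a DFA-based lexer.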
Problem: Once we have this DFA, augmented with notations as
to the token type, how do we use it to tokenize rather than
to identify membership?
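One common answer is "longest match": run the DFA as far as it will go, remember the last final state you passed through, and when the DFA gets stuck, emit that token and restart from where it ended. The sketch below assumes the DFA is a dictionary from (state, character) to state and that `token_type` maps each final state to its annotation; the function name and the three-token DFA are hypothetical.

```python
def tokenize_with_dfa(text, delta, start, token_type):
    """Split text into (type, lexeme) pairs using longest match.

    delta:      dict mapping (state, char) -> state; a missing key means "stuck"
    token_type: dict mapping each final state to its token type
    """
    tokens = []
    pos = 0
    while pos < len(text):
        state = start
        last_end = None   # end of the longest token seen so far
        last_type = None
        i = pos
        while i < len(text) and (state, text[i]) in delta:
            state = delta[(state, text[i])]
            i += 1
            if state in token_type:
                last_end, last_type = i, token_type[state]
        if last_end is None:
            raise ValueError(f"no token starts at position {pos}")
        tokens.append((last_type, text[pos:last_end]))
        pos = last_end    # back up to the end of the last accepted token
    return tokens

# Hypothetical DFA: numbers, identifiers, and runs of spaces.
delta = {}
for c in "0123456789":
    delta[("start", c)] = "num"
    delta[("num", c)] = "num"
    delta[("id", c)] = "id"
for c in "abcdefghijklmnopqrstuvwxyz":
    delta[("start", c)] = "id"
    delta[("id", c)] = "id"
delta[("start", " ")] = "ws"
delta[("ws", " ")] = "ws"
token_type = {"num": "NUM", "id": "ID", "ws": "WS"}

print(tokenize_with_dfa("abc 12 x9", delta, "start", token_type))
```

Remembering the last final state matters whenever the DFA can read past the end of a valid token before getting stuck; without it, the lexer could not back up.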
CHALLENGE: IS THERE A PLACE IN PASCAL IN WHICH YOU CAN'T
TELL WHETHER YOU'VE ENDED ONE TOKEN AND STARTED ANOTHER
(EXCEPT WITH CONTEXT?)