Today in CS362 from RegExp to DFA, Continued Admin Q on HW1? Project, Phase 1 coming soon Visitor Remind you: Conversion from programmer/design friendly reg-exp to computer-friendly DFA Problem: The resultant DFA is big when you convert an NFA to a DFA. Solution: Look at ways to make DFA's smallera Example on board Observation: If qi and qj have the same edges with the same labels and the same finality, you can join them. Revised Solution: Go overboard on joining: All non-final states are the same All final states are the same Given that that's wrong, refine Whenever we notice a mistake (e.g., an edge that should go to two different sets), split a set // Initialization S0 = { } S1 = { } for each q in Q if q.isFinal, add q to S0 else add q to S1 repeat almost forever for each state, Si for all qj in Si for all qk in Si for all s in Sigma if delta(qj,s) is in a different S than delta(qk,s) split Si if you didn't split, exit repeat How to build your lexer in N easy steps (1) Write your regular expressions (2) Convert them to NFAs (3) Merge all the NFAs together, augmenting their final states with a notation as to symbol type. (4) Convert it to a DFA (5) Optimize the DFA Hack when a final state might represent different symbols (6) Convert the DFA to working codee or interpret the DFA indirectly OR (1) Write your regular expressions (2) Write a regular-expression compiler (3) Run your regular-expression compiler on your regular expressions OR (1) Write your regular expressions (2) Use someone else's regular expression compiler OR (1) Find someone else's regular expressions (2) Convert them to a lexer in an ad-hoc format OR (1) Find someone else's lexer Problem: Once we have this DFA, augmented with notations as to the token type, how do we use it to tokenize rather than to identify membership? CHALLENGE: IS THERE A PLACE IN PASCAL IN WHICH YOU CAN'T TELL WHETHER YOU'VE ENDED ONE TOKEN AND STARTED ANOTHER (EXCEPT WITH CONTEXT?)