CSC362 2004S, Class 9: From Regular Expression to DFA (1) Admin * Excused: Arjun * Late: Piper, Angela, Choed * Project, Phase 1 due NOW. * Please submit your project via email as a gzip'd tarball or as a bunch of attached files or any reasonable variant * We have hired a new Mathematician (Keri Kornelson (sp?)) * Talks at 2:15 and 4:30. (Finally, you can leave out the back :-) * New homework: Written Homework 1 * Fun questions * Regular expression for ab ("not ba") ba? * (([bcd]*[cd]+)?[acd]*)*b* * Does this include all the strings that it should? * Does it include any that it shouldn't? * How can you be *sure* that this is correct? * Convert to optimal DFA and compare to known correct optimal DFA * Prove that there are no cases for which he is incorrect: * induce on the number of time we apply each * or + * induce on the length of the strings produced * Proof by contradiction * Sometimes the scientific method suffices * When did you need extra lookahead for the tokenizer? * Suppose you see a digit (e.g., 3). If you peek at ... * A digit: Consume and append * A dot: Consume and append and EXPECT a digit * An e: It's some complicated exponentional form: Consume and continue * Almost anything else: Do not consume; we've finished the token * However, dots are problematic 3..4 * Need some variant of "extra buffer" Overview: * Review * Limitations * How to go from regexp to DFA * From regexp to NFA /Review/ * Three formal mechanisms for describing simple languages * Regular expressions * DFA * NFA * Regular expressions are nondeterministic Consider ab*b We're reading input. We see an a. We see a b. We see another b. We don't make the decision until we run out of string or decide. a(b*b|b*c) * Why are regular expressions preferable? * Concise * Somewhat readable * Why are DFA preferable? * Easier for some problems * Directly implementable /Going from RE to DFA/ * From RE to NFA * From NFA to DFA * From DFA to optimal DFA /From RE to NFA/ Recursive translation: * Translate the parts * Combine 'em together * At every stage, we will have an NFA with exactly one start state and exactly one final state /From NFA to DFA/ * Try to simulate 'every possible path' through the NFA simultaneously * States in DFA are sets of states in NFA How many states can you have in the DFA if the NFA has four states