CSC362 2004S, Class 19: A Sample Parser Admin: * Today's general goal: Help you move forward on your parsers. * Don't forget about the Math/CS journal club today at 4:30 in 2424. * Knuth's Musings on "The Anatomy of Rotations in Binary Trees" * Snacks in the lounge at 4:15 * Extra credit for attending and reflecting * Tomorrow's convo looks to be quite interesting. * The focus of today is answering questions about your parsers. * "Baby" is this weekend! Come find out why Paul has been hide-ing (or is the heid-ing) * No 2:15 office hours today (teaching CSC151 instead) Overview: * Building the parse functions. * An example. * Explaining expressions. How do we get from compound_statement ::= BEGIN statement SEMI { statement SEMI } END to private Node parseCompoundStatement(StupidTokenizer tokens) throws ParseException,Exception { // Sanity check: Compound statements begin with begin. if (tokens.peek() != StupidTokens.TBEGIN) return wrongToken(StupidNonterminals.COMPOUND_STATEMENT, tokens.peek()); // Skip over the begin. tokens.next(); // Prepare to build a list of statements. Vector statements = new Vector(); // Keep reading statements until we hit the end of the compound statement. while (tokens.peek() != StupidTokens.TEND) { statements.add(parseStatement(tokens)); consume(StupidTokens.TSEMI, tokens); } // while // Sanity check: Do we have the end symbol? if (tokens.peek() != StupidTokens.TEND) return wrongToken(StupidTokens.TEND, tokens.peek()); // Read over that symbol tokens.next(); // Build the nice new list return new Node(StupidNonterminals.COMPOUND_STATEMENT, statements); } // parseCompoundStatement(StupidTokenizer) The process is "nearly" automatable * First: Compute the three important tables: First, Follow, and Nullable * Expectation: Generate a parse function for each nonterminal the Node constructor expects * A Symbol (Nonterminal or Token) * A Vector Node parse_foo(TokenStream ts) throws ParseException,Exception { Vector children = new Vector(); Token tok = ts.peek(); ... return new Node(Nonterminals.foo, children); } // parse_nonterminal(TokenStream) The stuff in the middle (the ellipses) depends on the particular productions for the nonterminal you are using. foo ::= rhs foo ::= sym1 sym2 ... symk Look at first(rhs) = { S1 ... Sl } Add the following to the code above // If tok begins rhs then ... if ((tok == S1) || (tok == S2) || ... || (tok == Sl)) { // Build the tree for rhs ; Apply the appropriate rhs } If nullable(rhs), look at Follow[foo] = { T1 ... Tj } if ((tok == T1) || (tok == T2) || ... || (tok == Tj)) { ; Apply the appropriate rhs } Note that you do both things *for every rule* with foo as a lhs. All the conditionals go into the middle of parse_foo We will have a problem if two rhs's can begin with the same token. Fix the grammar so that it doesn't happen. (Left factor) Note that if you've generated both conditionals above for the same rhs, you might as well write // If tok begins rhs then or rhs is nullable and tok follows foo if ((tok == S1) || (tok == S2) || ... || (tok == Sl) || (tok == T1) || (tok == T2) || ... || (tok == Tj)) { // Build the tree for rhs ; Apply the appropriate rhs } WHOOPS! For some particular kinds of tokens (Identifiers, Strings, etc.) you need instanceof rather than ==. As good java programmers you know when to use instanceof. What does it mean to "apply the appropriate rhs" foo ::= sym1 sym2 ... symk if symi is a nonterminal, use children.add(parse_symi(ts)); if symi is a USEFUL token, use children.add(new Node(ts.next(), null)); if symi is SYNTACTIC SUGAR< use consume(SYMI, ts); compound_statement ::= BEGIN statement SEMI { statement SEMI } END constant_definition ::= IDENTIFIER = constant Node parse_constant_definition(TokenStream ts) throws ParseException,Exception { Vector children = new Vector(); Token tok = ts.peek(); if (tok instanceof Identifier) { children.add(new Node(ts.next(), null)); consume(Tokens.EQUALS, ts); children.add(parseConstant(ts)); } else { throw new ParseException("Trying to parse a , found " + tok.toString()); } return new Node(Nonterminals.constant_definition, children); } // parse_constant_definition(TokenStream) When should you create another nonterminal (as in the example from variables): * Whenever more than one rhs has the same number of meaningful symbols * Perhaps: Whenever more than one rhs has at least one meaningful symbol * Observation: What you parse doesn't ahve to be what you return * E.g., parseVariable can return Nonterminals.array_reference Oge observes that my parseVariable doesn't work because the grammar is left recursive * Make it right recursive!