The Accent Compiler Compiler
Algorithms
Although the Accent user is not required to be familiar
with parsing technology,
we provide a short overview.

Introduction

Parsing usually begins with processing the start symbol of the grammar.

A nonterminal can be processed as follows. One selects an alternative and processes it from left to right. If there is a token at the current position inside the alternative, it is compared with the current input token. If they match, the input token is consumed and the working point inside the rule is advanced to the next member. If there is a nonterminal at the current position, this nonterminal is processed and then the working point is advanced past it.

When more than one alternative can be applied at a given point during parsing, there are several approaches.

Our approach combines exhaustive and predictive parsing. Exhaustive parsing is used to achieve generality. Predictive parsing is used to improve efficiency.

Exhaustive Parsing

We use Earley's Algorithm [1] for exhaustive parsing.

Whereas in Accent a nonterminal is defined by one rule with several alternatives

    N : A_1 | ... | A_n

for this discussion it is more convenient to define a nonterminal by several rules

    N : A_1
    ...
    N : A_n

Assume that

    N : M_1 ... M_i ... M_n

is such a rule. When such a rule is processed, we use a "dot" (denoted by "*") to indicate the current position inside the rule. For example, in

    N : M_1 ... * M_i ... M_n

the next symbol to be processed is M_i.

Such a "dotted rule" is called an item. An item also has a "back-pointer" that is used to find the items that triggered it (this is not discussed here). Earley [1] proposes attaching dynamically computed look-ahead strings to items. Since we compute look-ahead sets statically, we do not use this component.

The algorithm constructs an item list for each input token. The kernel of the item list for a particular input token is constructed by a step called the scanner.
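
As a rough illustration of items and item lists, here is a minimal Python sketch of a textbook Earley recognizer. It is not Accent's implementation: the names `Item` and `recognize`, the dictionary encoding of the rules, and the sample grammar are assumptions made for this example. The closure steps (commonly called predictor and completer) are simply repeated until the item list stops growing, which is easy to follow but not efficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    lhs: str      # nonterminal N defined by the rule
    rhs: tuple    # members M_1 ... M_n of the rule
    dot: int      # number of members already processed
    back: int     # back-pointer: index of the item list where the rule was started

    def next_symbol(self):
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def recognize(rules, start, tokens):
    """Return True if tokens form a phrase for start.

    rules maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of rules is treated as a token.
    """
    tokens = list(tokens) + ["YYEOF"]
    lists = [set() for _ in range(len(tokens) + 1)]
    lists[0].add(Item("YYSTART", (start, "YYEOF"), 0, 0))

    for pos in range(len(tokens) + 1):
        # Closure: repeat predictor and completer until nothing new appears.
        changed = True
        while changed:
            changed = False
            for item in list(lists[pos]):
                sym = item.next_symbol()
                if sym in rules:
                    # Predictor: start every rule of the nonterminal after the dot.
                    new = {Item(sym, rhs, 0, pos) for rhs in rules[sym]}
                elif sym is None:
                    # Completer: advance the items that were waiting for item.lhs.
                    new = {Item(p.lhs, p.rhs, p.dot + 1, p.back)
                           for p in lists[item.back]
                           if p.next_symbol() == item.lhs}
                else:
                    continue  # a token after the dot is handled by the scanner
                if not new <= lists[pos]:
                    lists[pos] |= new
                    changed = True
        # Scanner: build the kernel of the next item list from the current token.
        if pos < len(tokens):
            for item in lists[pos]:
                if item.next_symbol() == tokens[pos]:
                    lists[pos + 1].add(
                        Item(item.lhs, item.rhs, item.dot + 1, item.back))

    return Item("YYSTART", (start, "YYEOF"), 2, 0) in lists[len(tokens)]


# Hypothetical left-recursive grammar  E : E '+' 'n' | 'n'
rules = {"E": [("E", "+", "n"), ("n",)]}
print(recognize(rules, "E", ["n", "+", "n"]))  # True
```

The sketch only recognizes input; it records no structure information and uses no look-ahead.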

Processing starts with the item

    YYSTART : * S YYEOF

where S is the start symbol of the grammar. The closure of this item determines the initial item list.

Predictive Parsing

Predictive parsing has been described by Lewis and Stearns [2].

In this approach, a set of director tokens is computed for each alternative of a nonterminal at compiler generation time. This set contains all tokens that are legal as the current input token when we start to process the nonterminal with this alternative. These are given by (1) the tokens that can start the alternative and (2), if the alternative can produce the empty string, the tokens that can follow a phrase for the nonterminal.

For example, consider this simple grammar

    S : N 'x' ;
    N : A | B ;
    A : 'a' ;
    B : ;

For the alternative

    N : A ;

the set of director tokens is given by 'a', because this is the (only) token that is valid for this alternative.

For the alternative

    N : B ;

the set is given by 'x': B can produce the empty string, so we can "look through" N in the rule for S and see the 'x' that follows N there.

When parsing a text we begin with the start symbol S and hence have to recognize an N. Assume that we are confronted with an 'a'. In this case we choose the first alternative for N, because 'a' is in its director set. If we are confronted with an 'x', we choose the second alternative, because 'x' is in its director set.

In general, if a choice must be made as to which alternative is used to parse a phrase for a particular nonterminal, the first token of the rest of the input is inspected. If it appears in the director set of an alternative, that alternative is selected.

For this to work, the director sets of the alternatives of a nonterminal must be mutually disjoint (otherwise it would not be clear which alternative should be selected). This restricts the class of grammars that can be processed with this approach.

Predictive parsers are often implemented by recursive descent.
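
The director sets of the example grammar can be reproduced with a short Python sketch. It uses the standard fixed-point computation of "can start" and "can follow" sets and is not taken from the Accent sources; the names `GRAMMAR`, `director_sets`, and `first_of`, and the use of `YYEOF` as the token that follows the start symbol, are assumptions made for this example.

```python
# The example grammar: each nonterminal maps to its alternatives.
GRAMMAR = {
    "S": [("N", "x")],
    "N": [("A",), ("B",)],
    "A": [("a",)],
    "B": [()],
}


def director_sets(grammar, start):
    """Return a dict mapping (nonterminal, alternative) to its director set."""
    nullable = set()                       # nonterminals that can produce the empty string
    first = {n: set() for n in grammar}    # tokens that can start a phrase
    follow = {n: set() for n in grammar}   # tokens that can follow a phrase
    follow[start].add("YYEOF")             # assumed end-of-input token

    def first_of(symbols):
        """Tokens that can start the sequence, and whether it can be empty."""
        tokens = set()
        for sym in symbols:
            if sym not in grammar:         # a token ends the look-through
                tokens.add(sym)
                return tokens, False
            tokens |= first[sym]
            if sym not in nullable:
                return tokens, False
        return tokens, True

    changed = True
    while changed:                         # iterate until no set grows any more
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                tokens, empty = first_of(rhs)
                if not tokens <= first[lhs]:
                    first[lhs] |= tokens
                    changed = True
                if empty and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym in grammar:
                        rest, rest_empty = first_of(rhs[i + 1:])
                        add = rest | (follow[lhs] if rest_empty else set())
                        if not add <= follow[sym]:
                            follow[sym] |= add
                            changed = True

    directors = {}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            tokens, empty = first_of(rhs)
            # (1) tokens that can start the alternative, plus
            # (2) tokens that can follow the nonterminal if the alternative can be empty
            directors[(lhs, rhs)] = tokens | (follow[lhs] if empty else set())
    return directors


sets = director_sets(GRAMMAR, "S")
print(sets[("N", ("A",))])  # {'a'}
print(sets[("N", ("B",))])  # {'x'}
```

The two printed sets match the director sets derived by hand above, and they are disjoint, so this grammar could be parsed purely predictively.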
In a recursive descent parser one writes a procedure for each nonterminal that inspects the current input token and uses it to select an alternative. It then processes the members of this alternative. For example, the nonterminal N from the above grammar could be implemented in this way:

    procedure N()
       if current_token in { 'a' } then
          A();
       else if current_token in { 'x' } then
          B();
       else
          Error();

Such a procedure can be used to add semantic processing to parsing. Just include the code at arbitrary places inside the code for the alternatives. This is possible because there is no backtracking and no processing of several rules in parallel. It would not work with exhaustive parsing.

Combined Parsing

Accent combines exhaustive and predictive parsing.

Exhaustive parsing is implemented as described above. We cannot use predictive parsing directly because it would narrow the class of grammars that we want to process. We cannot use director sets to deterministically select an alternative, but we can use them to exclude an alternative that would not be viable.

For example, if the input

    a x

is parsed using the grammar above, the item

    S : * N 'x'

would cause the predictor to generate the item

    N : * A

But it would also generate

    N : * B

which in turn would trigger

    B : *

This item would then cause the completer to create

    N : B *

and then

    S : N * 'x'

This item cannot be continued because the input starts with 'a'.

The director set of the alternative

    N : B

contains only the element 'x'. Because the current input token is 'a', this alternative is not viable and can be excluded before any of these items is created.

Predictive parsing requires that the director sets uniquely determine one alternative. In Accent, if the current token is in the director set of more than one alternative, all these alternatives are processed. Alternatives with a director set that does not contain the current token are excluded.

Accent also uses recursive descent to execute semantic actions. This cannot be done during parsing because several rules can be processed in parallel.
Hence there is a second pass. The generated procedures look like those presented above, but they do not inspect the director set; instead, they use the result of the parsing to select an alternative.

Structure Information

Accent parsers do not compute derivation trees after building the item sequence but attach structure information directly to items.

There is a "sub-pointer" that refers to the "subtree item" of the current item. If the item has the form

    M : alpha N * beta

the "sub-pointer" refers to an item of the form

    N : gamma *

i.e. the item that concluded the processing of N.

A "left-pointer" is used as a reference to "preceding items". If the item has the form

    M : alpha N * beta

then the "left-pointer" refers to an item

    M : alpha * N beta

i.e. the item that triggered the processing of N.

This information can be used to detect and resolve ambiguities at the earliest point.

References
accent.compilertools.net