The Accent Compiler Compiler

The ACCENT Grammar Language

Conventions

The Accent Grammar Language is described by rules of the form
   N :
      M11 M12 ...
   |  M21 M22 ...
      ...
   ;
which state that a phrase for N is composed of phrases for M11 and M12 and ..., or of phrases for M21 and M22 and ..., and so on.

Terminal symbols are enclosed in double quotes, e.g. "%out".

In addition, the terminal symbol identifier denotes a sequence of one or more letters, digits, and underscores ("_"), starting with a letter.

The terminal symbol number denotes a sequence of one or more digits.

The terminal symbol character_constant denotes a character constant as in the C language.

The terminal symbol c_code represents arbitrary C code (comments in this code must be closed and curly braces must match).
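The definitions of identifier and number above can be restated as two small C predicates. This is only a sketch: the function names is_identifier and is_number are made up for illustration, and "letter" is assumed to mean whatever isalpha accepts in the default C locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s is an identifier as described above: a letter followed
   by any number of letters, digits, or underscores; otherwise 0. */
int is_identifier(const char *s)
{
    size_t i;

    if (s == NULL || !isalpha((unsigned char)s[0]))
        return 0;
    for (i = 1; s[i] != '\0'; i++) {
        if (!isalnum((unsigned char)s[i]) && s[i] != '_')
            return 0;
    }
    return 1;
}

/* Returns 1 if s is a number: one or more decimal digits; otherwise 0. */
int is_number(const char *s)
{
    size_t i;

    if (s == NULL || s[0] == '\0')
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        if (!isdigit((unsigned char)s[i]))
            return 0;
    }
    return 1;
}
```

Under these assumptions symtab and a_1 are identifiers, while _x and 1x are not, because they do not start with a letter.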

Grammar

grammar :
   global_prelude_part token_declaration_part rule_part
;
The Accent Grammar Language is the set of phrases for the symbol grammar.

Global Prelude

global_prelude_part :
   global_prelude
|  empty
;

global_prelude :
   "%prelude" block
;

block :
   "{" c_code "}"
;

empty :
;
The optional global_prelude_part serves to introduce user-defined functions, global variables, and types. The text enclosed in curly braces is inserted verbatim at the beginning of the generated program file.

Token Declarations

token_declaration_part :
   "%token" token_declaration_list ";"
|  empty
;

token_declaration_list :
   token_declaration "," token_declaration_list
|  token_declaration
;

token_declaration :
   identifier
;
The optional token_declaration_part introduces symbolic names for terminal symbols (tokens). A name must not appear more than once in the list.

These names may be used as members of grammatical rules. The actual representation of the corresponding terminal symbols must be defined by lexical rules that are not part of the Accent specification.

As opposed to nonterminal symbols, terminal symbols are declared without parameters. Nevertheless, they have an implicit output parameter of type YYSTYPE, which (if used) must be defined in the corresponding lexical rule.

(See Using Lex with Accent for a discussion.)

Rule Part

rule_part :
   rule_list
;

rule_list :
   rule rule_list
|  rule
;

rule :
   left_hand_side ":" right_hand_side ";"
;
A nonterminal is defined by a rule that lists one or more alternatives for constructing a phrase of the nonterminal.

The first rule specifies the start symbol of the grammar. The language defined by the grammar is given by the phrases of the start symbol.

Left Hand Side

left_hand_side :
   nonterminal formal_parameter_spec
;

nonterminal :
   identifier
;
The left_hand_side of a rule introduces the nonterminal that is defined by the rule. It also specifies the parameters of the nonterminal, which represent its semantic attributes.

Example

   List<list>
Here the nonterminal List has an attribute list.

The values of these parameters must be defined by semantic actions in the alternatives of the body of the rule. When the nonterminal is used as a member in the body of a rule, actual parameters are attached. Using these parameters, the attributes of the corresponding nonterminal can be accessed.

Example

   List<list> :
      Item<head> List<tail> { *list = makelist(head, tail); }
   |                        { *list = emptylist();          }
   ;
The nonterminals of the first alternative, Item and List, have the parameters head and tail, respectively. These are used in the semantic action to compute the value of list, the parameter of the left hand side.
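The helpers makelist and emptylist used in the List example are not defined in this document; they would typically be supplied in a prelude. The following C sketch shows one way they could look. It assumes that untyped parameters are of type long and therefore represents a list as an index into a fixed pool of cells. The pool, the function list_length, and the function list_action are all hypothetical and only serve to show that the left-hand-side parameter is written through a pointer.

```c
#include <stddef.h>

#define MAX_CELLS 256

/* One list cell. Lists are referred to by their index in the pool so
   that a list fits into a long. Handle 0 is the empty list. */
struct cell {
    long head;
    long tail;
};

static struct cell pool[MAX_CELLS];
static long used = 0;   /* number of cells handed out so far */

/* Hypothetical helper from the example: returns the empty list. */
long emptylist(void)
{
    return 0;
}

/* Hypothetical helper from the example: prepends head to tail.
   Returns 0 (the empty list) if the pool is exhausted. */
long makelist(long head, long tail)
{
    if (used + 1 >= MAX_CELLS)
        return 0;
    used++;
    pool[used].head = head;
    pool[used].tail = tail;
    return used;
}

/* Number of elements in a list, used to inspect a result. */
long list_length(long list)
{
    long n = 0;

    while (list != 0) {
        n++;
        list = pool[list].tail;
    }
    return n;
}

/* Mirrors the action of the first alternative once head and tail are
   known: the out parameter list is reached through a pointer. */
void list_action(long *list, long head, long tail)
{
    *list = makelist(head, tail);
}
```

A real grammar would more likely use a pointer-based list together with a user-defined parameter type; the index representation is chosen here only to stay within long.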
formal_parameter_spec :
   empty
|  "<" parameter_spec_list ">"
|  "<" "%in" parameter_spec_list ">"
|  "<" "%out" parameter_spec_list ">"
|  "<" "%in" parameter_spec_list "%out" parameter_spec_list ">"
;

parameter_spec_list :
   parameter_spec "," parameter_spec_list
|  parameter_spec
;
Parameters may be of mode in or mode out. If no mode is specified, all parameters are of mode out. Otherwise, parameters are of mode in if they appear in a list preceded by %in; they are of mode out if the list is preceded by %out.

An in parameter (inherited attribute) passes a value from the application of a nonterminal to the right hand side defining the symbol. It is used to pass context information to a rule.

An out parameter (synthesized attribute) passes a value from the right hand side defining a symbol to the application of the symbol. It is used to pass the semantic value of a rule to the context.

Example

DeclarationPart<%out symtab>:
   /* ... */ ;
StatementPart<%in symtab %out code>:
   /* ... */ ;
Program<code>:
   DeclarationPart<symtab> StatementPart<symtab, code>;
In the rule for Program the output parameter of DeclarationPart is passed as an input parameter to StatementPart.
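The flow of values in this example can be pictured with ordinary C functions, where an in parameter is passed by value and an out parameter through a pointer. This is an analogy under stated assumptions, not a reproduction of the code Accent generates: the attribute type long and the stand-in computations are invented for illustration.

```c
/* DeclarationPart<%out symtab>: computes symtab for its context. */
void DeclarationPart(long *symtab)
{
    *symtab = 3;            /* stand-in for building a symbol table */
}

/* StatementPart<%in symtab %out code>: reads symtab, computes code. */
void StatementPart(long symtab, long *code)
{
    *code = symtab * 10;    /* stand-in for code generation */
}

/* Program<code>: DeclarationPart<symtab> StatementPart<symtab, code>; */
void Program(long *code)
{
    long symtab;            /* actual parameter, local to this rule */

    DeclarationPart(&symtab);       /* out: value flows up to Program */
    StatementPart(symtab, code);    /* in: value flows down again     */
}
```

The variable symtab plays the role of the actual parameter that connects the two members of the rule.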

parameter_spec :
   parameter_type_opt parameter_name
;

parameter_type_opt :
   parameter_type
|  empty
;

parameter_type :
   identifier
;

parameter_name :
   identifier
;

A parameter specification may be written in the form type name, in which case type is the type of the parameter name. If the type is missing, the parameter is of type YYSTYPE (which is also the type of tokens). YYSTYPE is equivalent to long if not defined by the user.

(See Using Lex with Accent for how to define YYSTYPE.)

The start symbol of the grammar must have no parameters.

Right Hand Side

right_hand_side :
   local_prelude_option alternative_list
;
The right hand side of a rule specifies a list of alternatives. This list may be preceded by a prelude that introduces common declarations and initialisation statements in C.
local_prelude_option :
   local_prelude
|  empty
;

local_prelude :
   "%prelude" block
;
In the generated program the content of block (without the enclosing curly braces) precedes the code generated for the alternatives of the rule. The items declared in the prelude are visible within all alternatives.
alternative_list :
   alternative "|" alternative_list
|  alternative
;

alternative :
   member_list alternative_annotation_option
;


member_list :
   member member_list
|  empty
;

alternative_annotation_option :
   alternative_annotation
|  empty
;

alternative_annotation :
   "%prio" number
;

member :
   member_annotation_option item
;

member_annotation_option :
   member_annotation
|  empty
;

member_annotation :
   "%short"
|  "%long"
;

item :
   symbol
|  literal
|  grouping
|  option
|  repetition
|  semantic_action
;
The alternatives appearing on the right hand side of a rule specify how to construct a phrase for the nonterminal of the left hand side. An alternative is a sequence of members. These members may be nonterminal symbols, token symbols, or literals (terminal symbols that appear verbatim in the grammar). The right hand side may be written as a regular expression constructed by grouping, option, and repetition. Semantic actions may be inserted at any place.

Ambiguities in the grammar may be resolved by annotating alternatives and members.

If two alternatives of a nonterminal can produce the same string then both alternatives must be postfixed by an annotation of the form

   %prio N
N defines the priority of the alternative. The alternative with the higher priority is selected.

If the same alternative can produce the same string in more than one way because members of that alternative can cover substrings of different lengths, the rightmost of these members must be prefixed with an annotation of the form

   %short
or
   %long
If the member is prefixed by "%short" (resp. "%long") the variant that produces the short (resp. long) substring is selected.
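The two selection rules can be summarised in a few lines of C. The sketch below only illustrates which candidate wins; it says nothing about how Accent finds the competing derivations. The names select_alternative, select_variant, and the enum constants are hypothetical. For equal values the sketch keeps the earlier candidate, a case the text above does not address.

```c
enum annotation { ANN_SHORT, ANN_LONG };

/* prio[i] is the %prio value of the i-th competing alternative.
   Returns the index of the alternative with the highest priority,
   or -1 if there are no candidates. */
int select_alternative(const int *prio, int count)
{
    int i, best;

    if (count <= 0)
        return -1;
    best = 0;
    for (i = 1; i < count; i++) {
        if (prio[i] > prio[best])
            best = i;
    }
    return best;
}

/* lengths[i] is the length of the substring that the annotated member
   covers in the i-th variant. Returns the index of the variant with
   the shortest substring for ANN_SHORT and the longest for ANN_LONG,
   or -1 if there are no candidates. */
int select_variant(const int *lengths, int count, enum annotation ann)
{
    int i, best;

    if (count <= 0)
        return -1;
    best = 0;
    for (i = 1; i < count; i++) {
        if (ann == ANN_SHORT ? lengths[i] < lengths[best]
                             : lengths[i] > lengths[best])
            best = i;
    }
    return best;
}
```

For instance, if the annotated member could cover 3, 1, or 2 characters, %short corresponds to the variant with length 1 and %long to the variant with length 3.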

Nonterminal and Terminal Symbols

symbol :
   symbol_name actual_parameters_option
;

symbol_name :
   identifier
;
The symbol name must be declared as a nonterminal (by specifying a rule for the identifier) or as a token (by listing the identifier in the token declaration part).
actual_parameters_option :
   actual_parameters
|  empty
;

actual_parameters :
   "<" actual_parameter_list ">"
;

actual_parameter_list :
   actual_parameter "," actual_parameter_list
|  actual_parameter
;

actual_parameter :
   identifier
;
For each formal parameter of the symbol there must be a corresponding actual parameter. A parameter must be an identifier.

In the generated C code, this identifier is declared as a variable of the type of the corresponding formal parameter. The same parameter name may be used at different places, but then the types at those positions must be identical.

literal :
   character_constant
;
Besides being declared as a token, a terminal symbol can also appear verbatim as a member of a rule.

Structured Members

grouping :
   "(" alternative_list ")"
;
A construct
   ( alt_1 | alt_2 | ... )
matches a phrase generated by one of the alternatives alt_i.
option :
   "(" alternative_list ")?"
;
A construct
   ( alt_1 | alt_2 | ... )?
matches the empty phrase or a phrase generated by one of the alternatives alt_i.
repetition :
   "(" alternative_list ")*"
;
A construct
   ( alt_1 | alt_2 | ... )*
matches the empty phrase or any sequence of phrases generated by the alternatives alt_i.

Semantic Actions

semantic_action :
   block
;
Semantic actions may be inserted as members of alternatives. They do not influence the parsing process.

Semantic actions can contain arbitrary C code enclosed in curly braces. This code is executed in a second phase after the parsing process. The semantic actions of selected alternatives are executed from left to right in the given order.

Output parameters of preceding symbols may be accessed in the semantic action. Input parameters of following symbols must be defined.

Parameters are accessed by specifying their names. The names of the output parameters of the left hand side must be preceded by the dereferencing operator ('*').

The curly braces enclosing the action do not appear in the generated program (hence a semantic action at the beginning of an alternative may contain declarations of variables that are local to the alternative).
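The statement that actions run in a second phase, after parsing, and from left to right can be pictured as follows. This C sketch is a conceptual model, not Accent's implementation: the functions record_action, run_actions, and action_trace, the fixed-size table, and the two demo actions are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_ACTIONS 64

typedef void (*action_fn)(void);

static action_fn recorded[MAX_ACTIONS];   /* actions of selected alternatives */
static size_t recorded_count = 0;

static char trace[MAX_ACTIONS + 1];       /* what the demo actions have done */
static size_t trace_len = 0;

/* Phase 1: while parsing, only note the action; nothing is executed.
   Returns 0 on success, -1 if the table is full. */
int record_action(action_fn fn)
{
    if (recorded_count >= MAX_ACTIONS)
        return -1;
    recorded[recorded_count++] = fn;
    return 0;
}

/* Phase 2: after parsing, execute the noted actions in the order in
   which they were recorded, that is, from left to right. */
void run_actions(void)
{
    size_t i;

    for (i = 0; i < recorded_count; i++)
        recorded[i]();
    recorded_count = 0;
}

/* Appends one character to the trace if there is room. */
static void note(char c)
{
    if (trace_len < MAX_ACTIONS) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

/* Two demo actions standing in for the C code of semantic actions. */
void action_a(void) { note('a'); }
void action_b(void) { note('b'); }

/* Returns the trace written so far. */
const char *action_trace(void)
{
    return trace;
}
```

In this model the trace stays empty while actions are being recorded and is filled only when run_actions is called, which is one way to see why semantic actions do not influence the parsing process.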

accent.compilertools.net