The Accent Compiler Compiler
Using LEX with ACCENT
The Scanner Function

The representation of terminal symbols (tokens) is not defined by the Accent specification. An Accent parser cooperates with a lexical scanner that converts the source text into a sequence of tokens. This scanner is implemented by a function yylex() that reads the next token and returns a value representing the kind of the token.

The Kind of a Token

The kind of a token is indicated by a number.

A terminal symbol denoted by a literal in the Accent specification, e.g. '+', is represented by the numerical value of the character. So yylex() returns this value if it has recognized this literal:

    return '+';

A terminal symbol denoted by a symbolic name declared in the token declaration part of the Accent specification, e.g. NUMBER, is represented by a constant with a symbolic name that is the same as the token name. So yylex() returns this constant:

    return NUMBER;

The definition of the constants is generated by Accent and is contained in the generated file yygrammar.h. Hence the file introducing yylex() should include this file:

    #include "yygrammar.h"

The Attribute of a Token

Besides having a kind (e.g. NUMBER), a token can also be augmented with a semantic attribute. The function yylex() assigns this attribute value to the variable yylval. For example

    yylval = atoi(yytext);

(here yytext is the actual token that has been recognized as a NUMBER; the function atoi() converts this string into a numerical value).

The variable yylval is declared in the generated file yygrammar.c. An external declaration for this variable is provided in the generated file yygrammar.h.

yylval is declared as of type YYSTYPE. This is defined by Accent in the file yygrammar.h as a macro standing for long:

    #ifndef YYSTYPE
    #define YYSTYPE long
    #endif

The user can define his or her own type before including the file yygrammar.h.
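To make the scanner contract concrete, here is a minimal hand-written sketch; it is not code generated by Accent or Lex. A small stand-in replaces yygrammar.h so that the fragment compiles on its own. The token code 257 for NUMBER, the definition of yylval in this file, the helper scan_string(), and the convention that yylex() returns 0 at end of input are all assumptions made for illustration. YYSTYPE is defined as double before the stand-in header, so the default of long is skipped.

```c
#include <ctype.h>
#include <stdlib.h>

/* User-chosen attribute type, defined BEFORE the stand-in header below. */
#define YYSTYPE double

/* --- stand-in for yygrammar.h (assumption: the real file is generated by Accent) --- */
#ifndef YYSTYPE
#define YYSTYPE long   /* default; skipped here because YYSTYPE is already defined */
#endif
#define NUMBER 257     /* hypothetical token code; the real value comes from Accent */
YYSTYPE yylval;        /* the real definition lives in yygrammar.c */
/* --- end of stand-in --- */

static const char *src = "";   /* hypothetical input buffer */

/* Hypothetical helper: select the text that yylex() will scan. */
void scan_string(const char *text) { src = text; }

/* Return the kind of the next token; store its attribute in yylval. */
int yylex(void)
{
    while (*src == ' ' || *src == '\t')
        src++;                       /* skip blanks */
    if (*src == '\0')
        return 0;                    /* assumption: 0 signals end of input */
    if (isdigit((unsigned char)*src)) {
        char *end;
        yylval = strtod(src, &end);  /* attribute: the numeric value */
        src = end;
        return NUMBER;               /* kind: the symbolic token code */
    }
    return (unsigned char)*src++;    /* literal token: its character code */
}
```

After scan_string("12+3"), successive calls to yylex() return NUMBER (with yylval set to 12.0), '+', NUMBER (with yylval set to 3.0), and finally 0.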
For example, a file yystype.h may define

    typedef union {
        int intval;
        float floatval;
    } ATTRIBUTE;
    #define YYSTYPE ATTRIBUTE

Now the file defining yylex() imports two header files:

    #include "yystype.h"
    #include "yygrammar.h"

and defines the semantic attribute by:

    yylval.intval = atoi(yytext);

The Lex Specification

The function yylex() can be generated by the scanner generator Lex (or the GNU implementation Flex). The Lex & Yacc Page has online documentation for Lex and Flex.

A Lex specification gives rules that define for each token how it is represented and how it is processed. A rule has the form

    pattern { action }

pattern is a regular expression that specifies the representation of the token. action is C code that specifies how the token is processed. This code sets the attribute value and returns the kind of the token.

For example, here is a rule for the token NUMBER:

    [0-9]+ { yylval.intval = atoi(yytext); return NUMBER; }

The Lex specification starts with a definition section which can be used to import header files and to declare variables. For example,

    %{
    #include <stdlib.h>     /* atoi() */
    #include "yystype.h"
    #include "yygrammar.h"
    %}
    %%

Here the section imports stdlib.h for atoi(), yystype.h to provide a user-specific definition of YYSTYPE, and yygrammar.h, which defines the token codes. The %% separates this section from the rules part.

The Accent Specification

In the Accent specification, tokens are introduced in the token declaration part. For example

    %token NUMBER;

introduces a token with name NUMBER. Inside a rule the token can be used with a parameter, for example

    NUMBER<x>

This parameter can then be used in actions to access the attribute of the token. It is of type YYSTYPE.

    Value : NUMBER<x> { printf("%d", x.intval); } ;

or simply

    Value : NUMBER<x> { printf("%ld", x); } ;

if there is no user-specific definition of YYSTYPE (the default type is long, hence %ld).

Unlike the Lex specification, the Accent specification does not import yygrammar.h. If the user specifies a custom YYSTYPE, this has to be done in the global prelude part, e.g.

    %prelude {
    #include "yystype.h"
    }

Tracking the Source Position

Like yylval, which holds the attribute of a token, there is a further variable, yypos, that holds the source position of the token.

yypos is declared in the Accent runtime as an external variable of type long. Its initial value is 1. This variable can be set in rules of the Lex specification. For example,

    \n { yypos++; /* adjust line number and skip newline */ }

If the newline character is seen, yypos is incremented and so holds the current line number.

The variable yypos is managed in such a way that it holds the correct value when yyerror is invoked to report a syntax error (although, due to lookahead, the next token has already been read). It also has the correct value when semantic actions are executed (note that this is done after lexical analysis and parsing). Hence it can be used inside semantic actions, for example

    value: NUMBER<n> { printf("value in line %ld is %ld\n", yypos, n); } ;

Both yypos and, with the default YYSTYPE, n are of type long, hence the %ld conversions.
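The scanner side of line tracking can be sketched in plain C without Lex. This is an illustration under stated assumptions, not the Accent runtime: yypos and yylval are defined locally as stand-ins, the token code 257 for NUMBER and the helper scan_string() are hypothetical, and returning 0 at end of input is an assumed convention. The sketch only shows how the scanner advances yypos; keeping the value consistent for yyerror and for semantic actions is done by Accent itself, as described above.

```c
#include <ctype.h>
#include <stdlib.h>

#define NUMBER 257   /* hypothetical token code; normally taken from yygrammar.h */

long yylval;         /* stand-in for the attribute variable (default YYSTYPE is long) */
long yypos = 1;      /* stand-in for the runtime variable; initial value 1 as in the text */

static const char *src = "";   /* hypothetical input buffer */

/* Hypothetical helper: select the text to scan and reset the line counter. */
void scan_string(const char *text)
{
    src = text;
    yypos = 1;
}

int yylex(void)
{
    /* Skip white space; every newline advances the line number. */
    for (;;) {
        if (*src == '\n') {
            yypos++;
            src++;
        } else if (*src == ' ' || *src == '\t') {
            src++;
        } else {
            break;
        }
    }
    if (*src == '\0')
        return 0;                         /* assumption: 0 signals end of input */
    if (isdigit((unsigned char)*src)) {
        char *end;
        yylval = strtol(src, &end, 10);   /* attribute: decimal value */
        src = end;
        return NUMBER;
    }
    return (unsigned char)*src++;         /* literal token: its character code */
}
```

For the input "1\n\n22", the first call returns NUMBER with yypos equal to 1, and the second call returns NUMBER with yypos equal to 3, because two newlines were skipped before the second token.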