parse()) is generated from the grammar specification, so the software engineer will hardly ever feel the need to override that function. All but a few of the remaining predefined members have very clear definitions and meanings as well, making it unlikely that they should ever require overriding.
It is likely that members like lex() and/or error() need dedicated definitions for different parsers generated by bisonc++. But then again: defining the associated support members is a natural extension of defining the grammar and can be done in parallel with it, in practice not requiring any virtual members. By not defining (requiring) virtual members the parser's class organization is simplified, and calling the non-virtual members will be just a trifle faster than if these member functions had been virtual.
In this chapter all available members and features of the generated parser class are discussed. Having read this chapter you should be able to use the generated parser class in your program (using its public members) and to use its facilities in the actions defined for the various production rules and/or use these facilities in additional class members that you might have defined yourself.
In the remainder of this chapter the class's public members are discussed first, followed by the class's private members. While constructing the grammar all private members are available in the action parts of the grammatical rules. Furthermore, any member (and so not just the action blocks) may generate errors (thus initiating error recovery procedures) and may flag the (un)successful parsing of the information given to the parser (terminating the parsing function parse()).
`Parser Class'::
prefixes are silently
implied):
%debug directive or --debug option was specified. When debugging code has been compiled into the parsing function, it is not active by default. To activate the debugging code, use setDebug(true).
    DEFAULT_RECOVERY_MODE__,
    UNEXPECTED_TOKEN__

DEFAULT_RECOVERY_MODE__ consists of terminating the parsing process. UNEXPECTED_TOKEN__ activates the recovery procedure whenever an error is encountered. The recovery procedure consists of looking for the first state on the state-stack having an error-production, and then skipping subsequent tokens until (in that state) a token is retrieved which may follow the error terminal token in that production rule. If this error recovery procedure fails (i.e., if no acceptable token is ever encountered) error recovery falls back to the default recovery mode, terminating the parsing process.
    PARSE_ACCEPT = 0,
    PARSE_ABORT = 1

The parse() member function will return one of these values.
lex() member, even if the current token has not yet been processed. It is a useful member when the parser should be reset to its initial state, e.g., between successive calls of parse(). In this situation the scanner will probably be reloaded with new information too (in the context of a flex-generated scanner by, e.g., calling the scanner's yyrestart() member).
lex(). As it is a member function it has access to all the parser's members, in particular d_token, the current token value, and d_loc__, the current token location information (if %lsp-needed, %ltype or %locationstruct has been specified).
int lex()
private member function is called by the parse()
member to obtain the next lexical token. By default it is not implemented, but
the %scanner
directive (see section 5.6.16) may be used to
pre-implement a standard interface to a lexical analyzer.
The lex() member function interfaces to the lexical scanner, and it is expected to return the next token produced by the lexical scanner. This token may either be a plain character or it may be one of the symbolic tokens defined in the Parser::Tokens enumeration. Any zero or negative token value is interpreted as `end of input', causing parse() to return.
The lex() member function may be implemented in various ways:

lex() may itself implement a lexical analyzer (a scanner). This may actually be a useful option when the input offered to the program using bisonc++'s parser class is not overly complex. This approach was used when implementing the earlier examples (see sections 4.1.3 and 4.4.4).
lex() may call an external function or a member function of a class implementing a lexical scanner, and return the information offered by this external function. When using a class, an object of that class could also be defined as an additional data member of the parser (see the next alternative). This approach can be followed when generating a lexical scanner with a scanner generator like lex(1) or flex(1). The latter program allows its users to generate a scanner class.
Scanner d_scanner and the parser's lex() member merely returns d_scanner.yylex().
parse()
function or since the last detected syntactic error. It is initialized
to d_requiredTokens__
to allow an early error to be detected as
well.
    d_scanner.setSLoc(&d_loc__);

Subsequently, the lexical scanner may assign a value to the parser's d_loc__ variable through the pointer to d_loc__ stored inside the lexical scanner.
parse()
. It is initialized by the
parser's base class initializer, and is updated while parse()
executes. When parse()
has returned it contains the total number
of errors counted by parse()
. Errors are not counted if suppressed
(i.e., if d_acceptedTokens__
is less than d_requiredTokens__
).
parse()
function must have processed before a syntactic error can be
generated.
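The interplay between the accepted-token counter, the required-token threshold and the error counter might be pictured with the following sketch. It is not bisonc++'s actual implementation: the class ErrorCounter and its member names are hypothetical, and resetting the accepted-token count after every error (counted or suppressed) is an assumption made for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the roles of d_acceptedTokens__,
// d_requiredTokens__ and d_nErrors__; not bisonc++'s actual code.
class ErrorCounter
{
    std::size_t d_requiredTokens;
    std::size_t d_acceptedTokens;
    std::size_t d_nErrors = 0;

public:
    explicit ErrorCounter(std::size_t requiredTokens)
    :
        d_requiredTokens(requiredTokens),
        d_acceptedTokens(requiredTokens)    // so an early error is counted
    {}

    // Called whenever a token has been processed without error.
    void tokenAccepted() { ++d_acceptedTokens; }

    // Returns true if the error was counted, false if it was suppressed.
    bool syntaxError()
    {
        bool counted = d_acceptedTokens >= d_requiredTokens;
        if (counted)
            ++d_nErrors;
        d_acceptedTokens = 0;   // assumption: restart counting after any error
        return counted;
    }

    std::size_t nErrors() const { return d_nErrors; }
};
```

With a threshold of three tokens, an error at the very start is counted, an error after only one more accepted token is suppressed, and an error after three further accepted tokens is counted again, giving a total of two errors.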
    d_scanner.setSval(&d_val__);

Subsequently, the lexical scanner may assign a value to the parser's d_val__ variable through the pointer to d_val__ stored inside the lexical scanner.
Note that in some cases this approach must be used to make available the correct semantic value to the parser. In particular, when a grammar state defines multiple reductions, depending on the next token, the reduction's action only takes place following the retrieval of the next token, thus losing the initially matched token text. As an example, consider the following little grammar:
    expr:
        name
    |
        ident '(' ')'
    |
        NR
    ;

    name:
        IDENT
    ;

    ident:
        IDENT
    ;

Having recognized IDENT two reductions are possible: to name and to ident. The reduction to ident is appropriate when the next token is (, otherwise the reduction to name will be performed. So, the parser asks for the next token, thereby destroying the text matching IDENT before ident's or name's actions are able to save the text themselves. To ensure the availability of the text matching IDENT in situations like these the scanner must assign the proper semantic value when it recognizes a token. Consequently the parser's d_val__ data member must be made available to the scanner.
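A scanner that stores the semantic value at the moment it recognizes a token could look like the following sketch. All of it is illustrative: the Scanner class replays a fixed list of (token, text) pairs instead of reading real input, STYPE is assumed to be std::string, only tokens carrying text assign a value, and the public lex() and val() members exist only to make the effect observable.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using STYPE = std::string;              // assumed semantic value type

// Hypothetical scanner replaying a fixed list of (token, text) pairs.
class Scanner
{
    std::vector<std::pair<int, std::string>> d_tokens;
    std::size_t d_next = 0;
    STYPE *d_sval = nullptr;            // points at the parser's d_val__

public:
    explicit Scanner(std::vector<std::pair<int, std::string>> tokens)
    :
        d_tokens(std::move(tokens))
    {}

    void setSval(STYPE *sval) { d_sval = sval; }

    int yylex()
    {
        if (d_next == d_tokens.size())
            return 0;                   // `end of input'
        auto const &tok = d_tokens[d_next++];
        if (d_sval && !tok.second.empty())
            *d_sval = tok.second;       // assign the value when recognized
        return tok.first;
    }
};

// Hypothetical stand-in for the generated parser class.
class Parser
{
    Scanner d_scanner;
    STYPE d_val__;

public:
    enum Tokens { IDENT = 257 };        // assumed token value

    explicit Parser(Scanner scanner)
    :
        d_scanner(std::move(scanner))
    {
        d_scanner.setSval(&d_val__);    // hand the scanner a pointer to d_val__
    }

    int lex() { return d_scanner.yylex(); }
    STYPE const &val() const { return d_val__; }
};
```

In this sketch the text belonging to IDENT is already stored in d_val__ when lex() returns, so it does not matter that the scanner moves on to the ( token before a reduction's action runs. Since the scanner keeps a pointer into the parser object, such a parser should not be copied or moved after construction.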
parse()
function (see section
5.6.22.5) the following types and variables are defined in the
anonymous namespace. These are mentioned here for the sake of completeness,
and are not normally accessible to other parts of the parser.
    PARSE_ACCEPT   = 0,
    _UNDETERMINED_ = -2,
    _EOF_          = -1,
    _error_        = 256,

These tokens are used by the parser to determine whether another token should be requested from the lexical scanner, and to handle error-conditions.
    NORMAL,
    ERR_ITEM,
    REQ_TOKEN,
    ERR_REQ,        // ERR_ITEM | REQ_TOKEN
    DEF_RED,        // state having default reduction
    ERR_DEF,        // ERR_ITEM | DEF_RED
    REQ_DEF,        // REQ_TOKEN | DEF_RED
    ERR_REQ_DEF     // ERR_ITEM | REQ_TOKEN | DEF_RED

These tokens are used by the parser to define the types of the various states of the analyzed grammar.
<nr>, where <nr> is a state number, are defined in the anonymous namespace as well. The SR__ elements consist of two unions, defining fields that are applicable to, respectively, the first, intermediate and the last array elements.
<nr> is a numerical value representing a state number. Used internally.
$$: This acts like a variable that contains the semantic value for the grouping made by the current rule. See section 5.5.4.
$n: This acts like a variable that contains the semantic value for the n-th component of the current rule. See section 5.5.4.
$<typealt>$: This is like $$, but it specifies alternative typealt in the union specified by the %union directive. See sections 5.5.1 and 5.5.2.
$<typealt>n: This is like $n, but it specifies alternative typealt in the union specified by the %union directive. See sections 5.5.1 and 5.5.2.
@n: This acts like a structure variable containing information on the line numbers and column numbers of the nth component of the current rule. The default structure is defined like this (see section 5.6.10):

    struct LTYPE__
    {
        int timestamp;
        int first_line;
        int first_column;
        int last_line;
        int last_column;
        char *text;
    };

Thus, to get the starting line number of the third component, you would use @3.first_line.
In order for the members of this structure to contain valid information, you must make sure the lexical scanner supplies this information about each token. If you need only certain fields, then the lexical scanner only has to provide those fields.
Be advised that using this structure or a corresponding custom-defined one (see sections 5.6.11 and 5.6.9) may slow down the parsing process noticeably.