PyBison - Python-based Parsing at the Speed of C
Getting PyBison
License: GPL
(but we will consider applications for licenses for
commercial non-open-source deployment)
Introduction
PyBison is a Python binding to the Bison (yacc) and Flex (lex) parser-generator
utilities.
It lets you develop parsers quickly and easily as Python class
declarations, and lets those parsers take advantage of the fast and
powerful C-based Bison/Flex.
Users write a subclass of a basic Parser object. The subclass contains methods
and attributes that specify the grammar and lexical analysis rules, plus
callbacks that supply parser input and receive parser target events.
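To make the "grammar as a class declaration" idea concrete, here is a minimal sketch in plain Python. It is not PyBison's actual implementation or API: the base class `SketchParser`, the `on_` method prefix, and the convention of keeping each rule in a method docstring are assumptions made for illustration only. It shows how a base class could collect grammar rules from a user-written subclass.

```python
import inspect


class SketchParser:
    """Hypothetical stand-in base class (not PyBison's real Parser)."""

    tokens = []   # token names the lexer may return
    start = None  # name of the start symbol

    @classmethod
    def grammar_rules(cls):
        """Map each target name to the rule text in its handler's docstring."""
        rules = {}
        for name, member in inspect.getmembers(cls, inspect.isfunction):
            # Assumed convention: a handler named on_<target> carries the rule.
            if name.startswith("on_") and member.__doc__:
                rules[name[3:]] = inspect.cleandoc(member.__doc__)
        return rules


class CalcParser(SketchParser):
    tokens = ["NUMBER", "PLUS"]
    start = "expr"

    def on_expr(self, values):
        """
        expr : NUMBER
             | expr PLUS NUMBER
        """
        # Callback that would fire when the 'expr' target is recognized.
        return values


for target, rule in CalcParser.grammar_rules().items():
    print(target)
    print(rule)
```

A real toolkit would go on to turn the collected rules into Bison and Flex input; the sketch stops at extraction.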
Presently, PyBison works only on Linux (and possibly *BSD-based) systems.
However, in time (or if someone volunteers to help out with probably 2 hours'
coding for a small shim layer), it's very possible PyBison will work on Windows
as well.
Features
- Runs at near the speed of C-based parsers, due to direct hooks into bison-generated C code
- Full LALR(1) grammar support
- Includes a utility to convert your legacy grammar (.y) and scanner (.l) scripts into
python modules compatible with PyBison
- Easy to understand - the walkthrough and the examples will have you writing your
own parsers in minutes
- Comfortable and intuitive callback mechanisms
- Can export parse tree to XML with a simple method call (New!)
- Can reconstitute a parse tree from XML (New!)
- Examples include working parsers for the languages:
Comparison to Other Python Parsers
This comparison is probably very biased, since it's written by the author
of PyBison. However, it should help you to decide whether PyBison is for you.
All the other Python-based parser-construction toolkits I've seen work
in pure Python. While this offers conveniences such as not requiring binary compilation,
and eliminating dependencies on third-party libraries and other software, it can
incur a savage performance penalty.
I've seen some Python parser frameworks which use an idiosyncratic syntax, which I
couldn't (or wouldn't) comfortably relate to, especially since I have a background
of developing large yacc-based packages. In particular, I wanted to build my
parser in genuine Python source files, rather than embedding Python code into
a different script language.
On the other hand, I found the PLY parser framework to be much more comfortably
Pythonic, in that targets are mapped to class methods. However, I ran into a
couple of problems with PLY, namely:
- speed - PLY could be very slow, both at generating the parse tables and
at parsing its input
- capacity - due to its use of named-match regular expressions (and the
underlying Python limitation), the lexical analysis is limited to a
vocabulary of 100 unique token types - insufficient for large modern
languages.
- model - PLY is limited to SLR parsing, whereas Bison does full LALR(1)
- streaming - PLY's lexer requires the full input string to be made
available in one chunk, making it unsuitable for parsing streams of
unpredictable length, whereas Bison and Flex can work from a continuous
stream.
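The streaming point can be illustrated with a small pull-style tokenizer. This is only a sketch in pure Python, not how Flex or PyBison work internally: the function `stream_tokens`, its `read` callback parameter, and the two-token vocabulary are hypothetical. The tokenizer asks for more input only when it needs it, so the full input never has to exist as one string.

```python
import io
import re

# Tiny illustrative vocabulary: integers and single-character operators.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[-+*/()]))")


def stream_tokens(read, chunk_size=8):
    """Yield (kind, text) pairs, pulling input through read(n) on demand."""
    buf, eof = "", False
    while True:
        m = TOKEN_RE.match(buf)
        # A match that touches the end of the buffer may continue in the
        # next chunk (e.g. a number split across reads), so fetch more first.
        if m is None or (m.end() == len(buf) and not eof):
            if eof:
                if buf.strip():
                    raise ValueError("unexpected input: %r" % buf)
                return
            chunk = read(chunk_size)
            if chunk:
                buf += chunk
            else:
                eof = True
            continue
        yield m.lastgroup, m.group(m.lastgroup)
        buf = buf[m.end():]


# Any callable returning text chunks works; StringIO.read stands in for a
# socket or pipe here.
source = io.StringIO("12 + 345*6")
for kind, text in stream_tokens(source.read, chunk_size=4):
    print(kind, text)
```

With a chunk size of 4 the number 345 arrives split across two reads, and the end-of-buffer check is what keeps it from being emitted as two tokens.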
With PyBison, I've opted for a system which:
- Extracts grammar and lexical analysis information from user-written
parser classes
- Generates bison and flex sources, and converts these into C sources
- Compiles these into a shared loadable library, which is re-usable with
subsequent parsing runs (and which gets automatically rebuilt if, by
virtue of hashing tests, any of the grammar, precedence, tokens or
lexing rules change)
- Uses Pyrex to interface with this library, calling its yyparse() routine
and taking callbacks for input requests and target fulfilment events
The result is a parser toolkit with a Python front end, offering Python's luxurious
comfort and ease of use, but with (most of) the speed and power of traditional
bison/yacc-based parsers.
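The hash-based rebuild check mentioned in the list above can be sketched as follows. The four inputs (tokens, precedences, grammar rules, lexing script) come from the description; everything else is an assumption, including the function names `spec_fingerprint` and `needs_rebuild`, the choice of SHA-1, and the way the parts are serialized. PyBison's actual scheme may differ.

```python
import hashlib


def spec_fingerprint(tokens, precedences, rules, lexscript):
    """Return a hex digest that changes whenever any part of the spec changes."""
    h = hashlib.sha1()  # hash choice is an assumption, not taken from PyBison
    for part in (tokens, precedences, sorted(rules.items()), lexscript):
        h.update(repr(part).encode("utf-8"))
        h.update(b"\0")  # separator so adjacent parts cannot run together
    return h.hexdigest()


def needs_rebuild(stored_fingerprint, current_fingerprint):
    """True when the compiled library no longer matches the parser class."""
    return stored_fingerprint != current_fingerprint


tokens = ["NUMBER", "PLUS"]
precedences = (("left", ("PLUS",)),)
rules = {"expr": "expr : NUMBER | expr PLUS NUMBER"}
lexscript = "[0-9]+ { return NUMBER; }"

built = spec_fingerprint(tokens, precedences, rules, lexscript)

# Unchanged spec: the existing shared library can be reused.
print(needs_rebuild(built, spec_fingerprint(tokens, precedences, rules, lexscript)))

# Adding a token changes the fingerprint, which would trigger a rebuild.
print(needs_rebuild(built, spec_fingerprint(tokens + ["MINUS"], precedences, rules, lexscript)))
```

Storing the fingerprint next to the compiled library lets later runs skip the Bison, Flex, and C compilation steps whenever nothing in the specification has changed.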