Tutorial

This section describes how to set up a fully functional project that uses Speedy Antlr. You can find the completed example here.

In this example, we will put together a fictional Python module called spam that implements an Antlr parser for a grammar called MyGrammar.

antlr4-cpp-runtime

src/spam/parser/cpp_src/antlr4-cpp-runtime

This directory contains a copy of Antlr’s C++ runtime source, which our extension is compiled against.

Newer releases of the runtime can be downloaded from Antlr’s download page. Use the runtime release that matches the version of the Antlr tool you generate targets with.

generate_parsers.sh

src/spam/parser/generate_parsers.sh

The developer runs this script to regenerate the Antlr targets as well as the Speedy Antlr accelerator files.

As usual, use Antlr to generate targets from your grammar file. Generate both the Python and the C++ targets, since the two are used together: the Python target backs the fall-back parser, and the C++ target is compiled into the accelerator.

After the targets are generated, invoke speedy-antlr-tool via Python to generate the accelerator files.

Note

The optional entry_rule_names argument lets you provide a reduced list of parse tree entry points: the names of the rules the parser will support as entry points when parse() is called from Python. Providing a reduced list simplifies the output and removes unnecessary code from the parse accelerator.

If this option is omitted, support for all entry rules is generated.
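For readers who prefer Python to shell, the two steps above can be sketched as follows. This is a hypothetical port, not the contents of generate_parsers.sh. It assumes an antlr4 launcher on PATH (for example from the antlr4-tools package), that MyGrammar.g4 lives in src/spam/parser, that the C++ target wants -visitor, and that speedy_antlr_tool.generate() accepts the py_parser_path, cpp_output_dir and entry_rule_names keywords shown. Check each of these against your Antlr setup and your installed speedy-antlr-tool version.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def antlr_commands(grammar_files: Sequence[str],
                   antlr: Sequence[str] = ("antlr4",)) -> List[List[str]]:
    """Build the two Antlr invocations (Python and C++ targets).

    Paths are relative: the commands are meant to run with the parser
    directory (src/spam/parser) as the working directory.
    """
    return [
        # Python target, written next to the generated sa_mygrammar.py
        [*antlr, "-Dlanguage=Python3", "-o", ".", *grammar_files],
        # C++ target, written into cpp_src/ next to the accelerator sources.
        # Generating the visitor is an assumption about what the translator needs.
        [*antlr, "-Dlanguage=Cpp", "-visitor", "-o", "cpp_src", *grammar_files],
    ]


def generate_parsers(parser_dir: Path,
                     grammar_files: Sequence[str] = ("MyGrammar.g4",),
                     entry_rule_names: Optional[List[str]] = None) -> None:
    """Regenerate the Antlr targets, then the Speedy Antlr accelerator files."""
    for cmd in antlr_commands(grammar_files):
        # check=True stops at the first Antlr error instead of carrying on
        subprocess.run(cmd, cwd=parser_dir, check=True)

    # Imported lazily so antlr_commands() works without the tool installed
    from speedy_antlr_tool import generate

    generate(
        py_parser_path=str(parser_dir / "MyGrammarParser.py"),
        cpp_output_dir=str(parser_dir / "cpp_src"),
        entry_rule_names=entry_rule_names,  # None: support every entry rule
    )
```

A call such as generate_parsers(Path("src/spam/parser"), entry_rule_names=["entry"]) would regenerate everything, where "entry" stands in for one of your grammar's rules. For a grammar split into lexer and parser files, pass both, lexer first, e.g. grammar_files=("MyGrammarLexer.g4", "MyGrammarParser.g4").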

You’ll notice this last step generates the following files:

sa_mygrammar.py
cpp_src/sa_mygrammar_cpp_parser.cpp
cpp_src/sa_mygrammar_translator.cpp/.h
cpp_src/speedy_antlr.cpp/.h

Note

If your language grammar is split into separate Lexer and Parser files, see the alternate src/spam/parser/generate_parsers_split.sh example script.

sa_mygrammar.py

src/spam/parser/sa_mygrammar.py

This module provides the entry point for the C++-based parser, as well as a pure Python fall-back implementation. When parse() is called, the fall-back implementation is used automatically if the C++ extension failed to build or install.

print_tree.py

src/spam/print_tree.py

Now let’s use the resulting parser.

The sa_mygrammar module provides a boolean flag that indicates whether the C++ accelerator extension will be used. Checking it is not required, but you can set it to False to force the fall-back Python parser.
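When you want to exercise both implementations, for example to check in a test suite that they produce the same tree, a small context manager can flip the flag and restore it afterwards. This is a hypothetical helper, not part of speedy-antlr-tool, and it assumes the flag is a module attribute named USE_CPP_IMPLEMENTATION that parse() reads at call time.

```python
import contextlib
from typing import Any, Iterator


@contextlib.contextmanager
def force_python_parser(sa_module: Any) -> Iterator[None]:
    """Temporarily route sa_module.parse() to the pure-Python fall-back.

    sa_module is the generated module (e.g. spam.parser.sa_mygrammar);
    the USE_CPP_IMPLEMENTATION attribute name is an assumption.
    """
    previous = sa_module.USE_CPP_IMPLEMENTATION
    sa_module.USE_CPP_IMPLEMENTATION = False
    try:
        yield
    finally:
        # Restore the original value even if parsing raised
        sa_module.USE_CPP_IMPLEMENTATION = previous


# Example use, assuming the package is installed:
#   from spam.parser import sa_mygrammar
#   print("C++ accelerator available:", sa_mygrammar.USE_CPP_IMPLEMENTATION)
#   with force_python_parser(sa_mygrammar):
#       py_tree = sa_mygrammar.parse(stream, "entry")  # "entry" is hypothetical
```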

In some applications it is useful to intercept lexer and parser syntax errors. The Antlr runtime provides a mechanism for this through the ErrorListener class. Unfortunately, because it is not practical to translate every Antlr C++ object back into Python, the usual error listener cannot be used. Instead, an equivalent SA_ErrorListener class is provided with a very similar interface.

Using the error listener is entirely optional. If it is omitted, Antlr’s default ConsoleErrorListener is used.

For this example, let’s define a pretty verbose listener:
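A verbose listener along these lines prints every detail it receives and also keeps a list of errors for later inspection. It is a sketch: the syntaxError() parameter list is assumed to match SA_ErrorListener in the generated sa_mygrammar.py (compare it against your generated file), and the stand-in base class exists only so the snippet runs without the spam package installed.

```python
import sys

try:
    # Base class emitted into the generated module (assumed location)
    from spam.parser.sa_mygrammar import SA_ErrorListener
except ImportError:
    class SA_ErrorListener:  # minimal stand-in for running this sketch alone
        def syntaxError(self, input_stream, offendingSymbol, char_index,
                        line, column, msg):
            pass


class VerboseErrorListener(SA_ErrorListener):
    """Report every lexer/parser syntax error and remember it."""

    def __init__(self, out=None):
        super().__init__()
        self.out = out      # None means "use sys.stderr at report time"
        self.errors = []    # (line, column, msg) tuples, in report order

    def syntaxError(self, input_stream, offendingSymbol, char_index,
                    line, column, msg):
        self.errors.append((line, column, msg))
        out = self.out if self.out is not None else sys.stderr
        print("Syntax Error!", file=out)
        print("    input_stream:", input_stream, file=out)
        # offendingSymbol is typically None for lexer errors
        print("    offendingSymbol:", offendingSymbol, file=out)
        print("    char_index:", char_index, file=out)
        print("    line:", line, file=out)
        print("    column:", column, file=out)
        print("    msg:", msg, file=out)
```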

And finally put everything together:
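A minimal print_tree.py entry point could look like the sketch below. It assumes the Antlr Python runtime (antlr4-python3-runtime) is installed, that MyGrammarParser.py was generated into src/spam/parser, that parse() accepts (stream, entry_rule_name) with an optional error listener as a third argument, that the flag is named USE_CPP_IMPLEMENTATION, and that "entry" is one of your grammar's entry rules; all of these are placeholders to adapt. The package imports happen inside main() so the argument handling can be tried without them.

```python
import sys
from typing import List, Optional


def load_source(path: str) -> str:
    """Read the whole input file as text."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def main(argv: Optional[List[str]] = None) -> int:
    argv = sys.argv[1:] if argv is None else argv
    if len(argv) != 1:
        print("usage: print_tree.py <input-file>", file=sys.stderr)
        return 2

    # Deferred imports: these need the Antlr runtime and the spam package.
    from antlr4 import InputStream
    from spam.parser import sa_mygrammar
    from spam.parser.MyGrammarParser import MyGrammarParser

    print("Using C++ accelerator:", sa_mygrammar.USE_CPP_IMPLEMENTATION)

    stream = InputStream(load_source(argv[0]))
    # "entry" is a placeholder rule name. An SA_ErrorListener subclass instance
    # can be passed as an optional third argument.
    tree = sa_mygrammar.parse(stream, "entry")

    # ruleNames lets Antlr print rule names instead of raw node text
    print(tree.toStringTree(ruleNames=MyGrammarParser.ruleNames))
    return 0
```

To run it as a script, end the file with the usual if __name__ == "__main__": sys.exit(main()) guard.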

setup.py

setup.py

This example setup script shows how to gracefully omit the C++ accelerator if it fails to build. Recall from earlier that if the extension is not available, the parse() wrapper function automatically falls back to the Python implementation.
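The core of this graceful omission can be sketched as follows. It is one common way to do it, not necessarily the exact code in the example's setup.py, and all names are hypothetical: a build_ext subclass converts any compile error into a dedicated BuildFailed exception, and a driver retries without the extension when that happens. run_setup() is a function you supply that calls setuptools.setup().

```python
class BuildFailed(Exception):
    """Raised when the optional C++ accelerator cannot be compiled.

    A dedicated type (rather than a distutils error) is used so that setup()
    lets it propagate instead of turning it into SystemExit.
    """


def make_tolerant_build_ext(build_ext_cls):
    """Subclass a build_ext command so any compile error becomes BuildFailed.

    Pass setuptools.command.build_ext.build_ext and register the result via
    setup(cmdclass={"build_ext": ...}).
    """

    class TolerantBuildExt(build_ext_cls):
        def run(self):
            try:
                super().run()
            except BuildFailed:
                raise
            except Exception as e:  # e.g. no compiler, missing headers
                raise BuildFailed(e) from e

        def build_extension(self, ext):
            try:
                super().build_extension(ext)
            except Exception as e:
                raise BuildFailed(e) from e

    return TolerantBuildExt


def setup_with_fallback(run_setup):
    """Build with the C++ extension first; retry as pure Python on failure.

    run_setup(with_binary) must call setuptools.setup(), including
    ext_modules and the tolerant cmdclass only when with_binary is True.
    Returns True if the accelerator was built.
    """
    try:
        run_setup(with_binary=True)
        return True
    except BuildFailed as e:
        print(f"WARNING: C++ accelerator failed to build ({e}); "
              "installing the pure-Python parser only.")
        run_setup(with_binary=False)
        return False
```

The broad except clauses are deliberate: any failure to compile should degrade to the slower Python parser rather than abort the installation.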

LICENSE-3RD-PARTY

LICENSE-3RD-PARTY

Since you’ll be bundling the Antlr C++ runtime in your package’s distribution (source and binary), be a good steward of open-source software and include a copy of Antlr’s BSD license.

.github/workflows/build.yml

.github/workflows/build.yml

If you’ve attempted to install this example by now, you’ve probably noticed that it takes a looong time. This is because all of the C++ files (and Antlr has many) have to be compiled.

If you plan to publish your package to PyPI, it is good practice to also publish binary distributions (wheels). This eliminates the need for end users to install a compiler and build everything from source.

Since you probably don’t have access to every variant of Windows, Linux and macOS, this is typically done with a continuous integration service such as GitHub Actions. This YAML file tells GitHub Actions how to run your project’s tests and how to deploy to PyPI. I’m also using cibuildwheel to automate building all of the different binary distribution variants.