bashlex - Python parser for bash

bashlex is a Python port of the parser used internally by GNU bash.

For the most part it's transliterated from C, the major differences are:

it does not execute anything
it is reentrant
it generates a complete AST

Installation:

$ pip install bashlex

Usage

$ python >>> import bashlex >>> parts = bashlex.parse('true && cat <(echo $(echo foo))') >>> for ast in parts: ... print ast.dump() ListNode(pos=(0, 31), parts=[ CommandNode(pos=(0, 4), parts=[ WordNode(pos=(0, 4), word='true'), ]), OperatorNode(op='&&', pos=(5, 7)), CommandNode(pos=(8, 31), parts=[ WordNode(pos=(8, 11), word='cat'), WordNode(pos=(12, 31), word='<(echo $(echo foo))', parts=[ ProcesssubstitutionNode(command= CommandNode(pos=(14, 30), parts=[ WordNode(pos=(14, 18), word='echo'), WordNode(pos=(19, 30), word='$(echo foo)', parts=[ CommandsubstitutionNode(command= CommandNode(pos=(21, 29), parts=[ WordNode(pos=(21, 25), word='echo'), WordNode(pos=(26, 29), word='foo'), ]), pos=(19, 30)), ]), ]), pos=(12, 31)), ]), ]), ])

It is also possible to only use the tokenizer and get similar behaviour to shlex.split, but bashlex understands more complex constructs such as command and process substitutions:

>>> list(bashlex.split('cat <(echo "a $(echo b)") | tee')) ['cat', '<(echo "a $(echo b)")', '|', 'tee']

..compared to shlex:

>>> shlex.split('cat <(echo "a $(echo b)") | tee') ['cat', '<(echo', 'a $(echo b))', '|', 'tee']

The examples/ directory contains a sample script that demonstrate how to traverse the ast to do more complicated things.

Limitations

Currently the parser has no support for:

arithmetic expressions $((..))
the more complicated parameter expansions such as ${parameter#word} are taken literally and do not produce child nodes

Debugging

It can be useful to debug bashlex in conjunction to GNU bash, since it's mostly a transliteration. Comments in the code sometimes contain line references to bash's source code, e.g. # bash/parse.y L2626.

$ git clone git://git.sv.gnu.org/bash.git $ cd bash $ git checkout df2c55de9c87c2ee8904280d26e80f5c48dd6434 # commit used in translating the code $ ./configure $ make CFLAGS=-g CFLAGS_FOR_BUILD=-g # debug info and don't optimize $ gdb --args ./bash -c 'echo foo'

Useful things to look at when debugging bash:

variables yylval, shell_input_line, shell_input_line_index
breakpoint at yylex (token numbers to names is in file parser-built)
breakpoint at read_token_word (corresponds to bashlex/tokenizer._readtokenword)
xparse_dolparen, expand_word_internal (called when parsing $())

Motivation

I wrote this library for another project of mine, explainshell which needed a new parsing backend to support complex constructs such as process/command substitutions.

Releasing a new version

Suggestion for making a release environment:

python3 -m venv venv source venv/bin/activate pip install -r requirements.txt

make tests
bump version in setup.py
git tag the new commit
run python -m build
run twine upload dist/*

License

The license for this is the same as that used by GNU bash, GNU GPL v3+.

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
.github/workflows		.github/workflows
bashlex		bashlex
examples		examples
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

bashlex - Python parser for bash

Installation:

Usage

Limitations

Debugging

Motivation

Releasing a new version

License

About

Uh oh!

Releases

Packages

Used by 643

Contributors 16

Languages

License

idank/bashlex

Folders and files

Latest commit

History

Repository files navigation

bashlex - Python parser for bash

Installation:

Usage

Limitations

Debugging

Motivation

Releasing a new version

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Used by 643

Contributors 16

Languages

Packages