A Python 3.13 AST parser and code generator written in pure Swift. Parse Python code without requiring a Python runtime!
PySwiftAST provides a comprehensive toolkit for parsing and generating Python code in Swift. It consists of:
- Tokenizer - Complete lexical analysis with full Python 3.13 token support
- Parser - Recursive descent parser handling complex real-world Python code
- AST Nodes - Complete Swift types matching Python's `ast` module (28 modular files)
- Code Generator - Generate Python source code from AST
A pure Swift implementation offers:
- Speed: No Python interpreter overhead, native performance
- Portability: Works anywhere Swift runs (macOS, Linux, iOS, etc.)
- Integration: Native Swift types and error handling
- Tooling: Use with Swift projects directly, great for IDEs and tools
PySwiftAST uses a hand-written recursive descent parser that efficiently handles Python 3.13 syntax:
```
Python Source → Tokenizer → Parser → Complete AST → Code Generator → Python Source
```

This approach provides:
- ✅ Performance: Efficient parsing with operator precedence climbing
- ✅ Maintainability: Clear, readable Swift code
- ✅ No Dependencies: Pure Swift, no Python runtime required
- ✅ Round-Trip: Parse and regenerate Python code
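
To make "operator precedence climbing" concrete, here is a minimal, self-contained sketch of the technique on integer arithmetic. It is not PySwiftAST's parser: `Tok`, `ExprParser`, and the three-operator precedence table are hypothetical names chosen for illustration, and the sketch evaluates the expression directly instead of building AST nodes.

```swift
// Precedence climbing on a flat token list (illustrative, not the library's code).
enum Tok {
    case num(Int)
    case op(Character)
}

// Binding power of each binary operator; higher binds tighter.
let precedence: [Character: Int] = ["+": 1, "-": 1, "*": 2]

struct ExprParser {
    let tokens: [Tok]
    var pos = 0

    init(_ tokens: [Tok]) { self.tokens = tokens }

    // Parse one operand, then fold in every operator whose precedence is >= minPrec.
    // Returns nil when an operand is missing.
    mutating func parse(minPrec: Int = 1) -> Int? {
        guard pos < tokens.count, case .num(let first) = tokens[pos] else { return nil }
        pos += 1
        var lhs = first
        while pos < tokens.count,
              case .op(let op) = tokens[pos],
              let prec = precedence[op], prec >= minPrec {
            pos += 1
            // Left-associative: the right operand must bind strictly tighter.
            guard let rhs = parse(minPrec: prec + 1) else { return nil }
            switch op {
            case "+": lhs += rhs
            case "-": lhs -= rhs
            default:  lhs *= rhs   // "*" is the only other operator in the table
            }
        }
        return lhs
    }
}

// 2 + 3 * 4: the multiplication is folded first because it binds tighter.
var parser = ExprParser([.num(2), .op("+"), .num(3), .op("*"), .num(4)])
if let value = parser.parse() {
    print(value)   // 14
}
```

A single loop driven by a precedence table replaces one grammar function per precedence level, which is the usual reason this technique is paired with a recursive descent parser.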
The implementation consists of:
Token.swift - All Python 3.13 token types (130+ tokens)
Tokenizer.swift - Complete lexical analysis with:
- Indentation-aware tokenization (INDENT/DEDENT)
- All string literal types (raw, f-strings, triple-quoted, bytes)
- All number formats (int, float, complex, hex, octal, binary, scientific)
- All Python operators and keywords
- Comments and type comments
- Proper line/column tracking
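
Indentation-aware tokenization is commonly implemented with a stack of open indentation widths. The sketch below shows that idea in isolation, under simplifying assumptions: spaces only, no comments, and no line continuation inside brackets. `LayoutToken`, `IndentationError`, and `layoutTokens(for:)` are hypothetical names, not PySwiftAST's API.

```swift
// INDENT/DEDENT emission from leading-space counts (illustrative sketch).
enum LayoutToken: Equatable {
    case indent
    case dedent
    case line(String)
}

struct IndentationError: Error {}

func layoutTokens(for source: String) throws -> [LayoutToken] {
    var stack = [0]                 // indentation widths of the currently open blocks
    var out: [LayoutToken] = []

    for rawLine in source.split(separator: "\n") {
        let width = rawLine.prefix(while: { $0 == " " }).count
        let text = rawLine.dropFirst(width)
        if text.isEmpty { continue }            // whitespace-only lines carry no layout

        if width > stack.last! {
            // Deeper than the current block: open a new one.
            stack.append(width)
            out.append(.indent)
        } else {
            // Shallower: close blocks until we reach an enclosing level.
            while width < stack.last! {
                stack.removeLast()
                out.append(.dedent)
            }
            // A dedent must land exactly on an enclosing level.
            guard width == stack.last! else { throw IndentationError() }
        }
        out.append(.line(String(text)))
    }
    // Close every block still open at end of input.
    while stack.count > 1 {
        stack.removeLast()
        out.append(.dedent)
    }
    return out
}

if let tokens = try? layoutTokens(for: "if x:\n    y = 1\nz = 2") {
    print(tokens)
}
```

The bottom `0` entry is never popped, so the force-unwrapped `stack.last!` is safe for any non-negative width.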
AST/ (28 files) - Complete AST node definitions:
- All statement types (if, for, while, def, class, try, with, match, etc.)
- All expression types (BinOp, Call, Lambda, comprehensions, etc.)
- Pattern matching (Python 3.10+)
- Type parameters (Python 3.12+)
- TreeDisplayable protocol for visualization
Parser.swift (3,200+ lines) - Recursive descent parser implementing:
- All statements (assignments, control flow, functions, classes, etc.)
- All expressions with proper operator precedence
- Comprehensions (list, dict, set, generator)
- Pattern matching
- Type annotations
- F-strings with embedded expressions and concatenation
- Complex real-world Python constructs
- Error recovery and reporting
CodeGen.swift - Python code generator with:
- AST to source code conversion
- Proper indentation and formatting
- Round-trip support (parse → generate → parse)
```swift
import PySwiftAST

// Parse Python code with full feature support
let source = """
def greet(name: str, age: int = 0) -> str:
    return f"Hello, {name}, age {age}!"

class Dog(Animal):
    def bark(self):
        print("Woof!")

# Ternary operator
result = 10 if x > 5 else 20

# Pattern matching
match value:
    case [x, y] if x > 0:
        print(f"Positive: {x}, {y}")
    case _:
        print("Other")

greet("World")
"""

let module = try parsePython(source)
print(module.display())  // Beautiful tree visualization

// Or just tokenize
let tokens = try tokenizePython(source)
for token in tokens {
    print(token.type)
}
```

- ✅ Variables, assignments, type annotations
- ✅ All operators (arithmetic, comparison, logical, bitwise)
- ✅ Assignment target validation
- ✅ Walrus operator (`:=`)
- ✅ Augmented assignments (`+=`, `-=`, etc.)
- ✅ If/elif/else statements
- ✅ If-expressions (ternary: `x if cond else y`)
- ✅ For/while loops with else
- ✅ Break, continue, pass
- ✅ Match/case statements (Python 3.10+)
- ✅ Pattern matching with guards
- ✅ Function definitions with decorators
- ✅ Async functions
- ✅ Lambda expressions
- ✅ Type annotations (parameters, return types)
- ✅ Default parameters
- ✅ `*args` and `**kwargs`
- ✅ Positional-only (`/`) and keyword-only (`*`) parameters
- ✅ Yield and yield from
- ✅ Class definitions with decorators
- ✅ Inheritance (single and multiple)
- ✅ Metaclass specification
- ✅ Methods and attributes
- ✅ Lists, tuples, dictionaries, sets
- ✅ List/dict/set comprehensions with conditions
- ✅ Generator expressions
- ✅ Starred expressions in comprehensions (`for *args, item in items`)
- ✅ Subscripting and slicing
- ✅ Integers (decimal, hex `0xFF`, binary `0b1010`, octal `0o777`)
- ✅ Floats, scientific notation (`1.5e10`)
- ✅ Complex numbers (`1+2j`)
- ✅ Strings (all quote styles, raw, bytes)
- ✅ F-strings with embedded expressions and concatenation
- ✅ None, True, False
- ✅ Ellipsis (`...`)
- ✅ Exception handling (try/except/finally/else)
- ✅ Context managers (with statements)
- ✅ Async/await (async def, await, async for, async with)
- ✅ Import statements (all forms, including dotted: `import urllib.request`)
- ✅ Global/nonlocal declarations
- ✅ Del statements
- ✅ Assert and raise
- ✅ Implicit tuple returns (`return a, b, c`)
- ✅ Comments in expressions and blocks
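
As a small illustration of the integer literal forms listed above (decimal, `0xFF`, `0b1010`, `0o777`), the following sketch converts a literal's text to a Swift `Int` using only the standard library. `pythonIntValue` is a hypothetical helper, not part of PySwiftAST. It assumes the literal carries no sign, does not check Python's rules for underscore placement, and returns `nil` on overflow, whereas Python integers are arbitrary precision.

```swift
// Convert the text of a Python integer literal to an Int (illustrative sketch).
func pythonIntValue(_ literal: String) -> Int? {
    // Underscores are digit separators in Python (1_000); drop them first.
    let text = literal.filter { $0 != "_" }.lowercased()
    let radixByPrefix = ["0x": 16, "0o": 8, "0b": 2]

    // No recognised prefix: treat the text as a decimal literal.
    guard let radix = radixByPrefix[String(text.prefix(2))] else {
        return Int(text)
    }
    let digits = text.dropFirst(2)
    // Reject an empty body ("0x") or a stray sign after the prefix ("0x-1").
    guard let first = digits.first, first.isHexDigit else { return nil }
    // Int(_:radix:) returns nil for digits that are invalid in this radix.
    return Int(digits, radix: radix)
}

print(pythonIntValue("0xFF") ?? -1)     // 255
print(pythonIntValue("0b1010") ?? -1)   // 10
```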
See FEATURES.md for a comprehensive feature list.
PySwiftAST successfully parses complex real-world Python code:
- Django query.py (2,886 lines, 111 KB) - Django ORM query module, full parse + round-trip
- Data Pipeline (311 lines, 1,994 tokens) - Complex data processing with pandas
- Web Framework (412 lines, 2,515 tokens) - FastAPI-style web framework
- ML Pipeline (482 lines, 3,112 tokens) - Machine learning with PyTorch patterns
- Pattern Matching (480 lines, 2,794 tokens) - Comprehensive match/case examples
PySwiftAST outperforms Python's built-in `ast` module on the benchmark below, with the largest end-to-end gain on round-trip operations:

```bash
python3 benchmark_vs_python.py
```

Benchmark Results (ML Pipeline, 482 lines, 14.5 KB):
| Metric | Python | PySwiftAST | Speedup |
|---|---|---|---|
| Tokenization | 1.54 ms | 0.28 ms | 5.4x faster ✨ |
| Parsing | 1.77 ms | 1.63 ms | 1.1x faster ✅ |
| Round-Trip | 6.34 ms | 2.22 ms | 2.85x faster 🚀 |
Key Optimizations:
- ✨ UTF-8 Tokenizer: 5.4x faster than Python's tokenize module
- ✨ Expression Fast Path: Bypasses precedence chain for simple expressions
- ✨ Precomputed Indentation: Avoids repeated string allocation in code generation
- ✨ Inlined Hot Functions: Eliminates call overhead in critical paths
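
The "precomputed indentation" idea can be sketched as a small lookup table: each indent string is built once and reused, so the generator does not allocate a new string for every emitted line. This is an assumption about the general shape of the optimization, not the code in CodeGen.swift; `IndentCache` and its members are hypothetical names.

```swift
// Cache of indent strings indexed by nesting depth (illustrative sketch).
struct IndentCache {
    private var levels: [String] = [""]   // levels[d] is the indent for depth d
    let unit: String

    init(unit: String = "    ") { self.unit = unit }

    // Returns the indent for `depth`, extending the table only on first use.
    mutating func indent(_ depth: Int) -> String {
        while levels.count <= depth {
            levels.append(levels[levels.count - 1] + unit)
        }
        return levels[depth]
    }
}

var cache = IndentCache()
let lines = [(0, "def f(x):"), (1, "if x:"), (2, "return 1"), (1, "return 0")]
let rendered = lines.map { cache.indent($0.0) + $0.1 }.joined(separator: "\n")
print(rendered)
```

After the first line at a given depth, every later line at that depth is a plain array lookup.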
Round-Trip Performance (parse → generate → reparse):
- 2.85x faster than Python - exceeds 1.5x target by 90%! 🎉
- Validates code generation correctness at scale
- Consistent performance with low variance (±7%)
Benchmark: 100 iterations, release build (-c release), macOS. See OPTIMIZATION_SUMMARY.md for detailed analysis.
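
For readers who want to time their own workloads in a similar way, here is a minimal timing harness that runs a closure a fixed number of times and reports the mean and standard deviation in milliseconds. It is a sketch, not the project's benchmark script: `benchmark(iterations:_:)` is a hypothetical helper, and the summing closure is a stand-in for a call such as `parsePython(source)`.

```swift
import Dispatch

// Time `body` over `iterations` runs; returns mean and standard deviation in ms.
func benchmark(iterations: Int = 100, _ body: () -> Void) -> (mean: Double, stddev: Double) {
    precondition(iterations > 0, "need at least one iteration")
    var samples: [Double] = []
    samples.reserveCapacity(iterations)

    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)   // nanoseconds to milliseconds
    }

    let mean = samples.reduce(0, +) / Double(samples.count)
    let variance = samples.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(samples.count)
    return (mean, variance.squareRoot())
}

// Placeholder workload; swap in the parsing call you want to measure.
let result = benchmark { _ = (0..<10_000).reduce(0, +) }
print("mean \(result.mean) ms, stddev \(result.stddev) ms")
```

As with the figures above, measurements are only meaningful from a release build (`-c release`).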
Comprehensive profiling identified and optimized key bottlenecks:
- Tokenization (12% of pipeline) - 5.4x speedup via UTF-8 byte processing
- Parsing (55% of pipeline) - Optimized with expression fast paths and inlining
- Code Generation (33% of pipeline) - Precomputed indentation strings
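
To show what "UTF-8 byte processing" can look like, the sketch below scans an identifier over raw `UInt8` values instead of Swift `Character`s, which avoids grapheme-cluster decoding in the common ASCII case. It is ASCII-only for brevity (Python also allows Unicode identifiers), and `scanIdentifier(in:from:)` is a hypothetical name, not the tokenizer's actual function.

```swift
// Scan an ASCII identifier starting at `start`; returns its byte range, or nil.
func scanIdentifier(in bytes: [UInt8], from start: Int) -> Range<Int>? {
    func isStart(_ b: UInt8) -> Bool {
        (b >= UInt8(ascii: "a") && b <= UInt8(ascii: "z")) ||
        (b >= UInt8(ascii: "A") && b <= UInt8(ascii: "Z")) ||
        b == UInt8(ascii: "_")
    }
    func isDigit(_ b: UInt8) -> Bool {
        b >= UInt8(ascii: "0") && b <= UInt8(ascii: "9")
    }

    // An identifier must begin with a letter or underscore.
    guard start < bytes.count, isStart(bytes[start]) else { return nil }
    var end = start + 1
    // Continue over letters, underscores, and digits.
    while end < bytes.count, isStart(bytes[end]) || isDigit(bytes[end]) {
        end += 1
    }
    return start..<end
}

let bytes = Array("total_2 = 5".utf8)
if let range = scanIdentifier(in: bytes, from: 0) {
    print(String(decoding: bytes[range], as: UTF8.self))   // total_2
}
```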
See PROFILING_RESULTS.md and OPTIMIZATION_SUMMARY.md for complete performance analysis and optimization techniques.
```bash
swift test
```

72 tests, all passing (100% success rate) 🎉
1. Core Functionality (7 tests)
- ✅ Tokenizer with indentation tracking
- ✅ Simple assignments and expressions
- ✅ Function definitions
- ✅ Control structures
- ✅ Multiple statements
- ✅ Indentation validation
- ✅ Dotted module imports (urllib.request, xml.etree.ElementTree)
2. Python Feature Coverage (50 tests)

Real-world Python files covering every feature:
- ✅ Functions (def, async def, decorators, type hints, f-strings)
- ✅ Classes (inheritance, metaclass, methods)
- ✅ Control flow (if/elif/else, for, while, match/case)
- ✅ Imports (all forms, including dotted modules)
- ✅ Exceptions (try/except/finally/else)
- ✅ Context managers (with, async with)
- ✅ Comprehensions (list, dict, set, generator)
- ✅ Async/await (async def, await, async for)
- ✅ Lambdas and closures
- ✅ Pattern matching (comprehensive)
- ✅ Type annotations
- ✅ Decorators
- ✅ F-strings with embedded expressions
- ✅ All operators
- ✅ All collections
- ✅ Complex real-world examples
3. Syntax Error Detection (10 tests)

Validates proper error reporting:
- ✅ Missing colons
- ✅ Invalid indentation
- ✅ Unclosed strings
- ✅ Mismatched parentheses
- ✅ Invalid assignment targets
- ✅ Unexpected indents/dedents
- ✅ Unexpected tokens
- ✅ Multiple errors with clear messages
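
One of the checks above, mismatched parentheses, can be illustrated with a stack that remembers where each bracket was opened so the error can report a line and column. This is a standalone sketch rather than PySwiftAST's error machinery: `BracketError` and `firstBracketError(in:)` are hypothetical names, and brackets inside strings or comments are not skipped here, which a real tokenizer would have to do.

```swift
// Report the first bracket mismatch with its position (illustrative sketch).
struct BracketError: Error, Equatable {
    let message: String
    let line: Int
    let column: Int
}

func firstBracketError(in source: String) -> BracketError? {
    let openerFor: [Character: Character] = [")": "(", "]": "[", "}": "{"]
    var open: [(char: Character, line: Int, column: Int)] = []
    var line = 1, column = 0

    for ch in source {
        if ch.isNewline { line += 1; column = 0; continue }
        column += 1
        if openerFor.values.contains(ch) {
            // Remember where each opener appeared for later reporting.
            open.append((ch, line, column))
        } else if let expected = openerFor[ch] {
            // A closer must match the most recent unclosed opener.
            guard let top = open.popLast(), top.char == expected else {
                return BracketError(message: "unmatched '\(ch)'", line: line, column: column)
            }
        }
    }
    // Anything left on the stack was opened but never closed.
    if let top = open.last {
        return BracketError(message: "'\(top.char)' was never closed",
                            line: top.line, column: top.column)
    }
    return nil
}

if let error = firstBracketError(in: "print((1 + 2)") {
    print("\(error.line):\(error.column): \(error.message)")   // 1:6: '(' was never closed
}
```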
```bash
# Run all tests
swift test

# Run specific test
swift test --filter testPatternMatching

# Verbose output
swift test 2>&1 | less
```

Potential enhancements include:
- Performance Optimization - Benchmark and optimize hot paths
- Visitor Pattern - AST traversal and transformation utilities
- Error Recovery - Better error messages, suggest fixes
- Source Maps - Preserve exact formatting information
- LSP Support - Language Server Protocol integration
- Additional Testing - More edge cases and Python constructs
- Documentation - More examples and use cases
```
PySwiftAST/
├── Sources/PySwiftAST/
│   ├── Token.swift (130+ token types)
│   ├── Tokenizer.swift (533 lines)
│   ├── Parser.swift (2,904 lines)
│   ├── PySwiftAST.swift (Public API)
│   └── AST/ (28 files)
│       ├── Module.swift
│       ├── Statement.swift
│       ├── Expression.swift
│       ├── Statements/ (9 files)
│       ├── Expressions/ (9 files)
│       └── Supporting/ (7 files)
├── Tests/
│   └── PySwiftASTTests/
│       ├── PySwiftASTTests.swift
│       └── Resources/
│           ├── test_files/ (24 Python test files)
│           └── syntax_errors/ (10 error test files)
├── Package.swift
├── README.md
└── FEATURES.md (Complete feature list)
```

Contributions are welcome! Areas for enhancement:
- Performance optimizations
- Additional visitor utilities
- More test cases
- Documentation improvements
- Example tools using the parser
MIT License
Inspired by:
- Ruff - Fast Python linter in Rust
- CPython - Python's AST module
- Tree-sitter Python - Incremental parser
Built with ❤️ in Swift for the Python community.