Py-Swift/PySwiftAST

PySwiftAST

A Python 3.13 AST parser and code generator written in pure Swift. Parse Python code without requiring a Python runtime!

Overview

PySwiftAST provides a comprehensive toolkit for parsing and generating Python code in Swift. It consists of:

  1. Tokenizer - Complete lexical analysis with full Python 3.13 token support
  2. Parser - Recursive descent parser handling complex real-world Python code
  3. AST Nodes - Complete Swift types matching Python's ast module (28 modular files)
  4. Code Generator - Generate Python source code from AST

Why Pure Swift?

A pure Swift implementation offers:

  • Speed: No Python interpreter overhead, native performance
  • Portability: Works anywhere Swift runs (macOS, Linux, iOS, etc.)
  • Integration: Native Swift types and error handling
  • Tooling: Use with Swift projects directly, great for IDEs and tools

Architecture

Hand-Written Recursive Descent Parser

PySwiftAST uses a hand-written recursive descent parser that efficiently handles Python 3.13 syntax:

Python Source → Tokenizer → Parser → Complete AST → Code Generator → Python Source 

This approach provides:

  • Performance: Efficient parsing with operator precedence climbing
  • Maintainability: Clear, readable Swift code
  • No Dependencies: Pure Swift, no Python runtime required
  • Round-Trip: Parse and regenerate Python code
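As a rough illustration of operator precedence climbing, here is a minimal, self-contained sketch for a toy arithmetic grammar. It is not taken from PySwiftAST's Parser.swift; the names `Tok`, `MiniParser`, `precedence`, and `evaluate` are hypothetical, and the sketch evaluates the expression directly instead of building AST nodes.

```swift
// Toy precedence-climbing evaluator (illustrative only, not PySwiftAST's parser).
enum Tok: Equatable {
    case num(Int)
    case op(Character)
}

// Binding power per operator: a higher value binds tighter.
// Every operator in this toy grammar is left-associative.
let precedence: [Character: Int] = ["+": 1, "-": 1, "*": 2]

struct MiniParser {
    let tokens: [Tok]
    var pos = 0

    init(_ tokens: [Tok]) { self.tokens = tokens }

    // A primary is just a number here; a missing operand is treated as 0.
    mutating func parsePrimary() -> Int {
        guard pos < tokens.count, case .num(let n) = tokens[pos] else { return 0 }
        pos += 1
        return n
    }

    // Consume operators whose precedence is at least `minPrec`.
    mutating func parseExpression(minPrec: Int = 1) -> Int {
        var lhs = parsePrimary()
        while pos < tokens.count,
              case .op(let c) = tokens[pos],
              let prec = precedence[c],
              prec >= minPrec {
            pos += 1
            // `prec + 1` makes the operator left-associative.
            let rhs = parseExpression(minPrec: prec + 1)
            switch c {
            case "+": lhs += rhs
            case "-": lhs -= rhs
            default: lhs *= rhs
            }
        }
        return lhs
    }
}

func evaluate(_ tokens: [Tok]) -> Int {
    var parser = MiniParser(tokens)
    return parser.parseExpression()
}
```

A single loop with a minimum-precedence argument replaces one function per precedence level, which is the main appeal of the technique for grammars with many operator tiers.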

Implementation Details

The implementation consists of:

  1. Token.swift - All Python 3.13 token types (130+ tokens)

  2. Tokenizer.swift - Complete lexical analysis with:

    • Indentation-aware tokenization (INDENT/DEDENT)
    • All string literal types (raw, f-strings, triple-quoted, bytes)
    • All number formats (int, float, complex, hex, octal, binary, scientific)
    • All Python operators and keywords
    • Comments and type comments
    • Proper line/column tracking
  3. AST/ (28 files) - Complete AST node definitions:

    • All statement types (if, for, while, def, class, try, with, match, etc.)
    • All expression types (BinOp, Call, Lambda, comprehensions, etc.)
    • Pattern matching (Python 3.10+)
    • Type parameters (Python 3.12+)
    • TreeDisplayable protocol for visualization
  4. Parser.swift (3,200+ lines) - Recursive descent parser implementing:

    • All statements (assignments, control flow, functions, classes, etc.)
    • All expressions with proper operator precedence
    • Comprehensions (list, dict, set, generator)
    • Pattern matching
    • Type annotations
    • F-strings with embedded expressions and concatenation
    • Complex real-world Python constructs
    • Error recovery and reporting
  5. CodeGen.swift - Python code generator with:

    • AST to source code conversion
    • Proper indentation and formatting
    • Round-trip support (parse → generate → parse)
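The indentation-aware tokenization mentioned above can be sketched with an indentation stack. The following is a simplified stand-in, not the code in Tokenizer.swift: `LayoutToken`, `LayoutError`, and `layoutTokens` are hypothetical names, and the sketch assumes space-only indentation and ignores line continuations and brackets.

```swift
// Simplified INDENT/DEDENT generation (illustrative; assumes spaces only).
enum LayoutToken: Equatable {
    case indent
    case dedent
    case line(String)
}

enum LayoutError: Error {
    case inconsistentDedent(line: Int)
}

func layoutTokens(_ source: String) throws -> [LayoutToken] {
    var stack = [0]                 // indentation widths of the enclosing blocks
    var out: [LayoutToken] = []

    let lines = source.split(separator: "\n", omittingEmptySubsequences: false)
    for (index, rawLine) in lines.enumerated() {
        let text = rawLine.drop(while: { $0 == " " })
        // Blank and comment-only lines do not affect block structure.
        if text.isEmpty || text.first == "#" { continue }
        let width = rawLine.count - text.count

        if width > stack.last! {
            stack.append(width)
            out.append(.indent)
        } else {
            // Pop every block that this line closes.
            while width < stack.last! {
                stack.removeLast()
                out.append(.dedent)
            }
            // The line must land exactly on an enclosing indentation level.
            if width != stack.last! {
                throw LayoutError.inconsistentDedent(line: index + 1)
            }
        }
        out.append(.line(String(text)))
    }

    // Close any blocks still open at end of input.
    while stack.count > 1 {
        stack.removeLast()
        out.append(.dedent)
    }
    return out
}
```

For example, `try layoutTokens("if x:\n    y = 1\nz = 2")` yields a line, an indent, a line, a dedent, and a final line, which is the shape a block-structured parser expects.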

Usage

import PySwiftAST

// Parse Python code with full feature support
let source = """
def greet(name: str, age: int = 0) -> str:
    return f"Hello, {name}, age {age}!"

class Dog(Animal):
    def bark(self):
        print("Woof!")

# Ternary operator
result = 10 if x > 5 else 20

# Pattern matching
match value:
    case [x, y] if x > 0:
        print(f"Positive: {x}, {y}")
    case _:
        print("Other")

greet("World")
"""

let module = try parsePython(source)
print(module.display())  // Beautiful tree visualization

// Or just tokenize
let tokens = try tokenizePython(source)
for token in tokens {
    print(token.type)
}

Features

Core Language

  • ✅ Variables, assignments, type annotations
  • ✅ All operators (arithmetic, comparison, logical, bitwise)
  • ✅ Assignment target validation
  • ✅ Walrus operator (:=)
  • ✅ Augmented assignments (+=, -=, etc.)

Control Flow

  • ✅ If/elif/else statements
  • ✅ If-expressions (ternary: x if cond else y)
  • ✅ For/while loops with else
  • ✅ Break, continue, pass
  • ✅ Match/case statements (Python 3.10+)
  • ✅ Pattern matching with guards

Functions

  • ✅ Function definitions with decorators
  • ✅ Async functions
  • ✅ Lambda expressions
  • ✅ Type annotations (parameters, return types)
  • ✅ Default parameters
  • ✅ *args and **kwargs
  • ✅ Positional-only (/) and keyword-only (*) parameters
  • ✅ Yield and yield from

Classes

  • ✅ Class definitions with decorators
  • ✅ Inheritance (single and multiple)
  • ✅ Metaclass specification
  • ✅ Methods and attributes

Data Structures

  • ✅ Lists, tuples, dictionaries, sets
  • ✅ List/dict/set comprehensions with conditions
  • ✅ Generator expressions
  • ✅ Starred expressions in comprehensions (for *args, item in items)
  • ✅ Subscripting and slicing

Literals

  • ✅ Integers (decimal, hex 0xFF, binary 0b1010, octal 0o777)
  • ✅ Floats, scientific notation (1.5e10)
  • ✅ Complex numbers (1+2j)
  • ✅ Strings (all quote styles, raw, bytes)
  • ✅ F-strings with embedded expressions and concatenation
  • ✅ None, True, False
  • ✅ Ellipsis (...)

Advanced Features

  • ✅ Exception handling (try/except/finally/else)
  • ✅ Context managers (with statements)
  • ✅ Async/await (async def, await, async for, async with)
  • ✅ Import statements (all forms, including dotted: import urllib.request)
  • ✅ Global/nonlocal declarations
  • ✅ Del statements
  • ✅ Assert and raise
  • ✅ Implicit tuple returns (return a, b, c)
  • ✅ Comments in expressions and blocks

See FEATURES.md for a comprehensive feature list.

Real-World Testing

PySwiftAST successfully parses complex real-world Python code:

  • Django query.py (2,886 lines, 111 KB) - Django ORM query module, full parse + round-trip
  • Data Pipeline (311 lines, 1,994 tokens) - Complex data processing with pandas
  • Web Framework (412 lines, 2,515 tokens) - FastAPI-style web framework
  • ML Pipeline (482 lines, 3,112 tokens) - Machine learning with PyTorch patterns
  • Pattern Matching (480 lines, 2,794 tokens) - Comprehensive match/case examples

⚡ Performance

In the benchmark below, PySwiftAST completes a full round trip about 2.85x faster than Python's built-in ast module; parsing alone is roughly on par (1.1x). Run the benchmark with:

python3 benchmark_vs_python.py

Benchmark Results (ML Pipeline, 482 lines, 14.5 KB):

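| Metric       | Python  | PySwiftAST | Speedup         |
|--------------|---------|------------|-----------------|
| Tokenization | 1.54 ms | 0.28 ms    | 5.4x faster     |
| Parsing      | 1.77 ms | 1.63 ms    | 1.1x faster     |
| Round-Trip   | 6.34 ms | 2.22 ms    | 2.85x faster 🚀 |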

Key Optimizations:

  • UTF-8 Tokenizer: 5.4x faster than Python's tokenize module
  • Expression Fast Path: Bypasses precedence chain for simple expressions
  • Precomputed Indentation: Avoids repeated string allocation in code generation
  • Inlined Hot Functions: Eliminates call overhead in critical paths
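To make the "Precomputed Indentation" idea concrete, here is a small sketch of caching one prefix string per nesting depth so that each emitted line reuses it. This is an assumption about how such a cache could look, not the code in CodeGen.swift; `IndentCache` and `render` are hypothetical names, and depths are assumed to be non-negative.

```swift
// Caches one indentation string per nesting depth (illustrative sketch).
struct IndentCache {
    private var levels: [String] = [""]
    let unit: String

    init(unit: String = "    ") { self.unit = unit }

    // Returns the prefix for `depth`, building each missing level only once.
    mutating func prefix(for depth: Int) -> String {
        while levels.count <= depth {
            levels.append(levels[levels.count - 1] + unit)
        }
        return levels[depth]
    }
}

// Renders (depth, text) pairs into indented source text.
func render(_ lines: [(depth: Int, text: String)]) -> String {
    var cache = IndentCache()
    var out = ""
    for line in lines {
        out += cache.prefix(for: line.depth)
        out += line.text
        out += "\n"
    }
    return out
}
```

The alternative, building `String(repeating: " ", count: depth * 4)` for every line, allocates a fresh string each time; the cache pays that cost once per depth.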

Round-Trip Performance (parse → generate → reparse):

  • 2.85x faster than Python - exceeds 1.5x target by 90%! 🎉
  • Validates code generation correctness at scale
  • Consistent performance with low variance (±7%)
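One way to phrase the parse → generate → reparse check is as a stability test: generating from the reparsed output should reproduce the first generated text. The helper below is a generic sketch under that assumption; `isRoundTripStable`, `toyParse`, and `toyGenerate` are hypothetical, and the toy functions stand in for a real parser and code generator.

```swift
// True when a second parse/generate pass reproduces the first generated output.
func isRoundTripStable<Tree>(
    _ source: String,
    parse: (String) throws -> Tree,
    generate: (Tree) -> String
) rethrows -> Bool {
    let first = try generate(parse(source))
    let second = try generate(parse(first))
    return first == second
}

// Stand-in "parser": splits the input into whitespace-separated words.
func toyParse(_ s: String) -> [String] {
    return s.split(whereSeparator: { $0 == " " || $0 == "\n" }).map { String($0) }
}

// Stand-in "generator": joins the words with single spaces.
func toyGenerate(_ words: [String]) -> String {
    return words.joined(separator: " ")
}
```

Comparing generated text against generated text, instead of against the original source, tolerates formatting differences such as whitespace that a generator is free to normalize.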

Benchmark: 100 iterations, release build (-c release), macOS. See OPTIMIZATION_SUMMARY.md for detailed analysis.

Performance Deep Dive

Comprehensive profiling identified and optimized key bottlenecks:

  1. Tokenization (12% of pipeline) - 5.4x speedup via UTF-8 byte processing
  2. Parsing (55% of pipeline) - Optimized with expression fast paths and inlining
  3. Code Generation (33% of pipeline) - Precomputed indentation strings
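The UTF-8 byte processing mentioned in step 1 can be illustrated with a scanner that walks raw bytes instead of `Character` values. This is a simplified sketch, not the code in Tokenizer.swift: the function names are hypothetical, only ASCII identifiers are recognized, and digits are skipped byte by byte without forming number tokens.

```swift
// ASCII classification on raw bytes avoids grapheme-cluster decoding.
@inline(__always)
func isIdentifierStart(_ b: UInt8) -> Bool {
    return (b >= UInt8(ascii: "a") && b <= UInt8(ascii: "z"))
        || (b >= UInt8(ascii: "A") && b <= UInt8(ascii: "Z"))
        || b == UInt8(ascii: "_")
}

@inline(__always)
func isIdentifierContinue(_ b: UInt8) -> Bool {
    return isIdentifierStart(b) || (b >= UInt8(ascii: "0") && b <= UInt8(ascii: "9"))
}

// Returns the ASCII identifiers found in `source`, scanning its UTF-8 bytes.
func asciiIdentifiers(in source: String) -> [String] {
    let bytes = Array(source.utf8)
    var result: [String] = []
    var i = 0
    while i < bytes.count {
        if isIdentifierStart(bytes[i]) {
            let start = i
            i += 1
            while i < bytes.count, isIdentifierContinue(bytes[i]) { i += 1 }
            // Decode only the matched slice back into a String.
            result.append(String(decoding: bytes[start..<i], as: UTF8.self))
        } else {
            i += 1
        }
    }
    return result
}
```

Iterating a Swift `String` by `Character` performs grapheme breaking on every step, so comparing `UInt8` values in a flat array is typically much cheaper for the ASCII-heavy text that makes up most Python source.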

See PROFILING_RESULTS.md and OPTIMIZATION_SUMMARY.md for complete performance analysis and optimization techniques.

Testing

swift test

Test Results

72 tests, all passing (100% success rate) 🎉

Test Categories:

1. Core Functionality (7 tests)

  • ✅ Tokenizer with indentation tracking
  • ✅ Simple assignments and expressions
  • ✅ Function definitions
  • ✅ Control structures
  • ✅ Multiple statements
  • ✅ Indentation validation
  • ✅ Dotted module imports (urllib.request, xml.etree.ElementTree)

2. Python Feature Coverage (50 tests)

Real-world Python files covering every feature:

  • ✅ Functions (def, async def, decorators, type hints, f-strings)
  • ✅ Classes (inheritance, metaclass, methods)
  • ✅ Control flow (if/elif/else, for, while, match/case)
  • ✅ Imports (all forms, including dotted modules)
  • ✅ Exceptions (try/except/finally/else)
  • ✅ Context managers (with, async with)
  • ✅ Comprehensions (list, dict, set, generator)
  • ✅ Async/await (async def, await, async for)
  • ✅ Lambdas and closures
  • ✅ Pattern matching (comprehensive)
  • ✅ Type annotations
  • ✅ Decorators
  • ✅ F-strings with embedded expressions
  • ✅ All operators
  • ✅ All collections
  • ✅ Complex real-world examples

3. Syntax Error Detection (10 tests)

Validates proper error reporting:

  • ✅ Missing colons
  • ✅ Invalid indentation
  • ✅ Unclosed strings
  • ✅ Mismatched parentheses
  • ✅ Invalid assignment targets
  • ✅ Unexpected indents/dedents
  • ✅ Unexpected tokens
  • ✅ Multiple errors with clear messages

Running Tests

# Run all tests
swift test

# Run specific test
swift test --filter testPatternMatching

# Verbose output
swift test 2>&1 | less

🎯 Future Work

Potential enhancements include:

  1. Performance Optimization - Benchmark and optimize hot paths
  2. Visitor Pattern - AST traversal and transformation utilities
  3. Error Recovery - Better error messages, suggest fixes
  4. Source Maps - Preserve exact formatting information
  5. LSP Support - Language Server Protocol integration
  6. Additional Testing - More edge cases and Python constructs
  7. Documentation - More examples and use cases

Project Structure

PySwiftAST/
├── Sources/PySwiftAST/
│   ├── Token.swift            (130+ token types)
│   ├── Tokenizer.swift        (533 lines)
│   ├── Parser.swift           (2,904 lines)
│   ├── PySwiftAST.swift       (Public API)
│   └── AST/                   (28 files)
│       ├── Module.swift
│       ├── Statement.swift
│       ├── Expression.swift
│       ├── Statements/        (9 files)
│       ├── Expressions/       (9 files)
│       └── Supporting/        (7 files)
├── Tests/
│   └── PySwiftASTTests/
│       ├── PySwiftASTTests.swift
│       └── Resources/
│           ├── test_files/    (24 Python test files)
│           └── syntax_errors/ (10 error test files)
├── Package.swift
├── README.md
└── FEATURES.md                (Complete feature list)

Contributing

Contributions are welcome! Areas for enhancement:

  • Performance optimizations
  • Additional visitor utilities
  • More test cases
  • Documentation improvements
  • Example tools using the parser

License

MIT License

Acknowledgments

Inspired by:

Built with ❤️ in Swift for the Python community.
