FauxFoundry enables teams to generate unique synthetic datasets from human-readable YAML specifications. It leverages local AI models (e.g., Ollama) to produce realistic, domain-aware data that respects schema constraints while ensuring exactly N unique records are delivered through efficient streaming with minimal validation overhead.
Created by copyleftdev - Building tools for developers, by developers.
- YAML-Driven: Simple, human-readable specifications
- LLM-Powered: Uses local models (Ollama) for realistic data generation
- Streaming: Constant memory usage, handles large datasets efficiently
- Rich TUI: Interactive terminal interface for guided workflows
- CLI-First: Automation-friendly command-line interface
- Privacy-First: All processing happens locally, no data leaves your machine
- Real-time Monitoring: Live progress tracking and statistics
- Validation: Built-in specification validation and error handling
- Healthcare Ready: EDI, FHIR, HL7, and medical claims support
- Intelligent Retry: Advanced timeout handling with adaptive strategies
- Deduplication: Ensures 100% unique records with canonical hashing
- Production Scale: Generate millions of records with constant memory usage
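
The deduplication feature listed above relies on canonical hashing. The following is a minimal sketch of that idea, assuming records are JSON objects; `canonicalHash` and `Deduper` are hypothetical names for illustration and are not taken from FauxFoundry's code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// canonicalHash returns a stable SHA-256 digest for a record.
// encoding/json writes map keys in sorted order, so two records with the
// same fields and values hash identically regardless of insertion order.
func canonicalHash(record map[string]any) (string, error) {
	b, err := json.Marshal(record)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// Deduper (hypothetical) remembers the hashes of records already accepted.
type Deduper struct {
	seen map[string]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// Add reports whether the record is new, and remembers it if so.
func (d *Deduper) Add(record map[string]any) (bool, error) {
	h, err := canonicalHash(record)
	if err != nil {
		return false, err
	}
	if _, dup := d.seen[h]; dup {
		return false, nil
	}
	d.seen[h] = struct{}{}
	return true, nil
}

func main() {
	d := NewDeduper()
	a := map[string]any{"email": "a@example.com", "age": 34}
	b := map[string]any{"age": 34, "email": "a@example.com"} // same record, different key order
	first, _ := d.Add(a)
	second, _ := d.Add(b)
	fmt.Println(first, second) // true false
}
```

A generator built this way would keep requesting batches until the number of accepted records reaches the requested count. Note that this sketch stores one hash per accepted record; how FauxFoundry bounds that state is not described here.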
Prerequisites:
- Go 1.21 or later
- Ollama running locally with a model (e.g., `llama3.1:8b`)
```bash
# Clone the repository
git clone https://github.com/copyleftdev/faux-foundry
cd faux-foundry

# Build the binary
go build -o bin/fauxfoundry ./cmd/fauxfoundry

# Or install directly
go install ./cmd/fauxfoundry

# Check installation
./bin/fauxfoundry --version
```

- Create a specification:

```bash
fauxfoundry init customer.yaml --template ecommerce
```

- Validate the specification:

```bash
fauxfoundry validate customer.yaml
```

- Generate synthetic data:

```bash
fauxfoundry generate --spec customer.yaml --output outputs/data.jsonl
```

- Launch interactive TUI:

```bash
fauxfoundry tui
```

FauxFoundry uses YAML specifications to define your data generation requirements:
```yaml
model:
  endpoint: "http://localhost:11434"
  name: "llama3.1:8b"
  batch_size: 32
  temperature: 0.7

dataset:
  count: 1000
  domain: "E-commerce customer data"
  fields:
    - name: "email"
      type: "email"
      required: true
      pattern: "@(gmail|yahoo|outlook)\\.com$"
    - name: "age"
      type: "integer"
      required: true
      range: [18, 80]
    - name: "status"
      type: "enum"
      required: true
      values: ["active", "inactive", "pending"]
    - name: "created_at"
      type: "datetime"
      required: true
      description: "Account creation date"
    - name: "preferences"
      type: "object"
      description: "Customer preferences and settings"
```

Supported field types:
- `string` - Text strings
- `text` - Longer text content
- `integer` - Whole numbers
- `float` - Decimal numbers
- `boolean` - True/false values
- `datetime` - ISO 8601 timestamps
- `date` - Date values
- `time` - Time values
- `email` - Email addresses
- `url` - URLs
- `uuid` - UUID values
- `phone` - Phone numbers
- `enum` - Predefined values
- `object` - Nested objects
- `array` - Arrays of values

Field constraints:
- `required` - Field must be present
- `pattern` - Regex pattern for validation
- `range` - Min/max values for numbers
- `values` - Allowed values for enums
- `description` - Field description for LLM context
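
To make the constraints concrete, here is a minimal sketch of how a generated record might be checked against `required`, `pattern`, `range`, and `values`. The `Field` struct and `Check` method are hypothetical and written for illustration; treating `range` as inclusive on both ends is also an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// Field mirrors a subset of the spec's field options (hypothetical struct).
type Field struct {
	Name     string
	Type     string
	Required bool
	Pattern  string    // regex, as in `pattern`
	Range    []float64 // [min, max], as in `range` (assumed inclusive)
	Values   []string  // allowed values, as in `values`
}

// Check validates one record against this field's constraints.
func (f Field) Check(record map[string]any) error {
	v, ok := record[f.Name]
	if !ok {
		if f.Required {
			return fmt.Errorf("%s: required field is missing", f.Name)
		}
		return nil
	}
	if f.Pattern != "" {
		s, isStr := v.(string)
		if !isStr {
			return fmt.Errorf("%s: expected a string", f.Name)
		}
		matched, err := regexp.MatchString(f.Pattern, s)
		if err != nil {
			return fmt.Errorf("%s: invalid pattern: %w", f.Name, err)
		}
		if !matched {
			return fmt.Errorf("%s: %q does not match %s", f.Name, s, f.Pattern)
		}
	}
	if len(f.Range) == 2 {
		n, isNum := v.(float64) // encoding/json decodes numbers as float64
		if !isNum {
			return fmt.Errorf("%s: expected a number", f.Name)
		}
		if n < f.Range[0] || n > f.Range[1] {
			return fmt.Errorf("%s: %v is outside [%v, %v]", f.Name, n, f.Range[0], f.Range[1])
		}
	}
	if len(f.Values) > 0 {
		s, _ := v.(string)
		allowed := false
		for _, want := range f.Values {
			if s == want {
				allowed = true
				break
			}
		}
		if !allowed {
			return fmt.Errorf("%s: %q is not an allowed value", f.Name, s)
		}
	}
	return nil
}

func main() {
	fields := []Field{
		{Name: "email", Type: "email", Required: true, Pattern: `@(gmail|yahoo|outlook)\.com$`},
		{Name: "age", Type: "integer", Required: true, Range: []float64{18, 80}},
		{Name: "status", Type: "enum", Required: true, Values: []string{"active", "inactive", "pending"}},
	}
	record := map[string]any{"email": "jane@gmail.com", "age": float64(17), "status": "active"}
	for _, f := range fields {
		if err := f.Check(record); err != nil {
			fmt.Println(err) // only the age field fails here
		}
	}
}
```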
Generate synthetic data from YAML specifications with advanced options:
```bash
# Basic generation
fauxfoundry generate --spec customer.yaml

# Override count and specify output
fauxfoundry generate --spec customer.yaml --count 5000 --output outputs/data.jsonl.gz

# Dry run validation
fauxfoundry generate --spec customer.yaml --dry-run

# Interactive mode
fauxfoundry generate --interactive

# Advanced timeout handling
fauxfoundry generate --spec complex-edi.yaml --max-retries 5 --min-batch-size 1

# Custom timeout and seed
fauxfoundry generate --spec customer.yaml --timeout 30m --seed 12345
```

Flags:
- `-s, --spec string` - Path to YAML specification file (required)
- `-o, --output string` - Output file path (stdout if not specified)
- `-n, --count int` - Override record count from specification
- `-t, --timeout string` - Maximum execution time (default "2h")
- `--seed int` - Random seed for reproducibility
- `--dry-run` - Validate specification without generating data
- `-i, --interactive` - Launch interactive TUI mode
- `--max-retries int` - Maximum retry attempts on timeout (default 3)
- `--min-batch-size int` - Minimum batch size before giving up (default 1)
Validate YAML specifications for syntax and semantic correctness:
```bash
# Validate single file
fauxfoundry validate customer.yaml

# Validate multiple files
fauxfoundry validate *.yaml

# Verbose validation with detailed output
fauxfoundry validate customer.yaml --verbose

# Quiet validation (errors only)
fauxfoundry validate customer.yaml --quiet
```

Flags:
- `--dry-run` - Same as validate (included for consistency)
- `-v, --verbose` - Enable detailed validation output
- `-q, --quiet` - Show only errors
Create new YAML specifications from templates or interactively:
```bash
# Interactive creation
fauxfoundry init customer.yaml

# From template
fauxfoundry init --template ecommerce customer.yaml

# Available templates
fauxfoundry init --list-templates

# Force overwrite existing file
fauxfoundry init --force customer.yaml --template medical
```

Available Templates:
- `ecommerce` - E-commerce customer data
- `user` - User profiles and authentication
- `product` - Product catalog with pricing
- `medical` - Healthcare and medical records
- `financial` - Financial transactions and accounts

Flags:
- `--template string` - Use predefined template
- `--list-templates` - Show available templates
- `--force` - Overwrite existing files
Launch the rich Terminal User Interface for guided workflows:
```bash
# Launch TUI
fauxfoundry tui

# Launch with specific spec
fauxfoundry tui --spec customer.yaml

# Launch in specific mode
fauxfoundry tui --mode generate
```

Flags:
- `--spec string` - Load specific specification file
- `--mode string` - Start in specific mode (browse, edit, generate, monitor)
Diagnose system health and Ollama connectivity:
```bash
# Full system check
fauxfoundry doctor

# Check specific endpoint
fauxfoundry doctor --endpoint http://localhost:11434

# Verbose diagnostics
fauxfoundry doctor --verbose
```

Flags:
- `--endpoint string` - Ollama endpoint to check
- `--fix` - Attempt to fix common issues
- `--models` - List available models
The TUI provides a rich, interactive experience with:
- Specification Editor: Visual YAML editing with validation
- Generation Monitor: Real-time progress and statistics
- File Browser: Manage specifications and outputs
- Settings Panel: Configure models and preferences
Keyboard shortcuts:
- `F1` - Help
- `F2` - Specification Browser
- `F3` - Generate Data
- `F4` - Monitor Generation
- `F10` - Quit
- `Ctrl+N` - New Specification
- `Ctrl+S` - Save
- `Tab`/`Shift+Tab` - Navigate components

Global flags:
- `--config` - Configuration file path
- `--verbose` - Enable verbose logging
- `--quiet` - Suppress non-essential output
- `--no-color` - Disable colored output
Configure your LLM backend in the specification:
```yaml
model:
  endpoint: "http://localhost:11434"  # Ollama endpoint
  name: "llama3.1:8b"                 # Model name
  batch_size: 32                      # Records per batch
  temperature: 0.7                    # Creativity (0-2)
  timeout: "30s"                      # Request timeout
```

FauxFoundry generates data in JSON Lines (JSONL) format:

```json
{"email": "[email protected]", "age": 34, "status": "active", "created_at": "2023-05-15T10:30:00Z", "preferences": {"newsletter": true}}
{"email": "[email protected]", "age": 28, "status": "pending", "created_at": "2023-06-20T14:45:00Z", "preferences": {"newsletter": false}}
```

Output can be:
- Streamed to stdout
- Saved to files (`.jsonl` or `.jsonl.gz`)
- Piped to other tools (`jq`, databases, etc.)
```
┌───────────────────────────────────────────────────────────┐
│                   FauxFoundry Interface                   │
├───────────────────┬───────────────────┬───────────────────┤
│  CLI Layer        │  TUI Layer        │  Shared Core      │
│  ┌─────────────┐  │  ┌─────────────┐  │  ┌─────────────┐  │
│  │ Cobra CLI   │  │  │ Bubble Tea  │  │  │ Spec Parser │  │
│  │ Commands    │  │  │ Components  │  │  │ LLM Client  │  │
│  │ Flags       │  │  │ Views       │  │  │ Dedup Logic │  │
│  │ Validation  │  │  │ Models      │  │  │ Output      │  │
│  └─────────────┘  │  └─────────────┘  │  └─────────────┘  │
└───────────────────┴───────────────────┴───────────────────┘
```

```
faux-foundry/
├── cmd/fauxfoundry/     # Main application entry point
├── internal/            # Internal packages
│   ├── cli/             # CLI commands and logic
│   ├── tui/             # Terminal UI components
│   ├── llm/             # LLM client and Ollama integration
│   ├── spec/            # YAML specification parsing
│   ├── dedup/           # Record deduplication logic
│   └── output/          # Output writers (JSONL, compression)
├── pkg/types/           # Shared type definitions
├── examples/            # Sample YAML specifications
├── outputs/             # Generated data files (gitignored)
└── docs/                # Documentation (PRD, design specs)
```

FauxFoundry includes comprehensive example specifications for various domains:
General:
- `customer.yaml` - E-commerce customer data with demographics
- `product.yaml` - Product catalog with pricing and inventory
- `user.yaml` - User profiles and authentication data

Healthcare and EDI:
- `medical-demo.yaml` - Basic medical insurance verification
- `medical-insurance.yaml` - Comprehensive 46-field insurance data
- `edi-270-271.yaml` - EDI X12 healthcare eligibility transactions (53 fields)
- `rx-claims-edi.yaml` - NCPDP D.0 pharmacy claims (75+ fields)
- `x12-837-core.yaml` - X12 837 Professional Claims (66 fields)

Finance and operations:
- `financial-transactions.yaml` - Banking and payment data
- `api-logs.yaml` - Application logs and metrics
- `inventory-management.yaml` - Supply chain and logistics
Healthcare Systems:
```bash
# Generate 1000 medical insurance records
fauxfoundry generate --spec examples/medical-insurance.yaml --count 1000 --output outputs/insurance-test-data.jsonl

# Create EDI test transactions
fauxfoundry generate --spec examples/edi-270-271.yaml --count 100 --output outputs/edi-test.jsonl.gz
```

Development & Testing:

```bash
# Generate customer test data for QA
fauxfoundry generate --spec examples/customer.yaml --count 50000 --output outputs/qa-customers.jsonl

# Create reproducible test datasets
fauxfoundry generate --spec examples/user.yaml --seed 12345 --count 1000
```

Performance Testing:

```bash
# Generate large datasets with streaming
fauxfoundry generate --spec examples/product.yaml --count 1000000 --output outputs/products.jsonl.gz

# Stress test with complex specifications
fauxfoundry generate --spec examples/x12-837-core.yaml --count 10000 --max-retries 5
```

We welcome contributions from the community! Here's how to get started:
- Fork the repository on GitHub
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes with proper tests and documentation
- Run tests (`go test ./...`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request with a clear description
```bash
# Clone your fork
git clone https://github.com/yourusername/faux-foundry
cd faux-foundry

# Install dependencies
go mod download

# Run tests
go test ./...

# Build and test locally
go build -o bin/fauxfoundry ./cmd/fauxfoundry
./bin/fauxfoundry doctor
```

- Follow Go best practices and `gofmt` formatting
- Add tests for new functionality
- Update documentation for user-facing changes
- Use conventional commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
FauxFoundry is committed to being a truly open-source project:
- No vendor lock-in or proprietary dependencies
- Local-first processing (your data never leaves your machine)
- Community-driven development and feature requests
- Transparent development process
Created by copyleftdev with ❤️ for the developer community.
- Ollama - Local LLM infrastructure and model management
- Cobra - Powerful CLI framework for Go
- Bubble Tea - Terminal UI framework
- Lip Gloss - Terminal styling and layout
- Go - Systems programming language
- ANSI X12 - EDI transaction standards for healthcare
- NCPDP - Pharmacy claims processing standards
- HL7 FHIR - Healthcare interoperability standards
- ICD-10 - International disease classification
- CPT - Current Procedural Terminology codes
Special thanks to the open-source community and all contributors who help make FauxFoundry better!
```bash
# Quick start - generate your first synthetic dataset
git clone https://github.com/copyleftdev/faux-foundry
cd faux-foundry
go build -o bin/fauxfoundry ./cmd/fauxfoundry
./bin/fauxfoundry init my-data.yaml --template ecommerce
./bin/fauxfoundry generate --spec my-data.yaml --count 100
```

FauxFoundry - Generate synthetic data with confidence
Built by developers, for developers. Privacy-first. Open source. Production ready.
