A modern, open-source data platform built entirely with Python tools, demonstrating a complete end-to-end data pipeline for Star Wars film data.
- Data Ingestion: dlt for extracting data from the Star Wars API
- Data Warehouse: DuckDB for fast, embedded analytics
- Data Transformation: dbt for SQL-based data modelling
- Data Orchestration: Dagster for pipeline management
- Data Visualization: Streamlit for interactive dashboards
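
To make the ingestion step concrete, here is a minimal sketch of what extracting film data from SWAPI involves. It uses only the standard library rather than dlt, so it is not the code this repository runs. The endpoint URL and the `fetch_json` and `iter_films` helpers are assumptions for illustration. The sketch relies on SWAPI's paginated responses carrying `results` and `next` keys.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen

# Assumed endpoint; the repository may configure a different base URL.
FILMS_URL = "https://swapi.dev/api/films/"


def fetch_json(url: str) -> dict:
    """Fetch one page of JSON over HTTP (hypothetical helper)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_films(
    start_url: str = FILMS_URL,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield film records, following `next` links until they run out."""
    url = start_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")  # None on the last page
```

In this project that loop is delegated to dlt, which also takes care of loading the records into DuckDB. Passing `fetch` as a parameter keeps the sketch testable without network access.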
- Python 3.8+
- just command runner (optional)
- Clone the repository:
    git clone https://github.com/your-username/pystack.git
    cd pystack
- Create and activate a virtual environment:
    uv sync
    just bi  # Or manually: cd src && streamlit run visualisation/app.py
    just orchestrate  # Or manually: dagster dev -f src/orchestration/definitions.py
    just duck  # Or manually: duckdb src/pystack.duckdb
    just dbt-docs
    pystack/
    ├── src/
    │   ├── orchestration/    # Dagster pipeline definitions
    │   ├── transformation/   # dbt models and configurations
    │   └── visualisation/    # Streamlit dashboard
    ├── justfile              # Command shortcuts
    └── README.md
- Financials: View film budgets, box office revenue, and ROI
- Attributes: Analyze species, characters, planets, and starships per film
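
The two dashboard views boil down to simple per-film calculations, sketched below. This is an illustration, not the project's dbt or Streamlit code. It assumes ROI is defined as (revenue - budget) / budget, which the README does not spell out, and that each film record carries SWAPI-style list fields. The `roi` and `attribute_counts` helpers and the example record are hypothetical.

```python
from typing import Optional


def roi(budget: float, box_office: float) -> Optional[float]:
    """Return ROI as a fraction: (revenue - budget) / budget."""
    if budget <= 0:
        return None  # ROI is undefined without a positive budget
    return (box_office - budget) / budget


def attribute_counts(film: dict) -> dict:
    """Count linked species, characters, planets, and starships for one film."""
    keys = ("species", "characters", "planets", "starships")
    return {key: len(film.get(key, [])) for key in keys}


# Made-up record for illustration only; not real film data.
example_film = {
    "title": "Example Film",
    "budget": 10_000_000,
    "box_office": 25_000_000,
    "species": ["s1", "s2"],
    "characters": ["c1", "c2", "c3"],
    "planets": ["p1"],
    "starships": [],
}

print(roi(example_film["budget"], example_film["box_office"]))  # 1.5
print(attribute_counts(example_film))
```

Returning `None` for a missing or zero budget avoids a division error and lets the dashboard decide how to display films without financial data.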
This project is open source and available under the MIT License.
- Star Wars API (SWAPI) for providing the data
- PyCon DE and PyData 2025 for inspiration
Built using Python for PyCon DE and PyData 2025