QuantPolars

A Python package for quantitative finance analysis using Polars, providing blazingly fast tools for data summarization and option pricing.

Installation

pip3 install git+https://github.com/matthewgson/quantpolars.git

Requirements: Python 3.8+, Polars

Data Summary Function (`sm`)

Generate summary statistics for every column in your DataFrame with a single function call. `sm` returns a Polars DataFrame of summary statistics, which can optionally be converted to a styled Great Tables (GT) table.

Features

Blazingly Fast: Single-pass computation using Polars expressions
Type-Aware: Different statistics based on data type (numeric, date, categorical)
Missing Data: Includes percentage of missing values for each column
Simple API: Returns DataFrame directly, convert to GT styling when needed
Styled Output: Optional Great Tables formatting for beautiful HTML tables
LazyFrame Support: Works with both eager and lazy evaluation

Basic Usage

    import polars as pl
    from datetime import date
    from quantpolars import sm

    # Create sample data
    df = pl.DataFrame({
        'revenue': [1000, 2500, 1800, 3200, 2900, None, 2100, 1750],
        'profit_margin': [0.15, 0.22, 0.18, 0.25, 0.20, 0.17, 0.19, 0.16],
        'transaction_date': [
            date(2024, 1, 15), date(2024, 2, 20), date(2024, 3, 10),
            date(2024, 4, 5), date(2024, 5, 12), date(2024, 6, 8),
            date(2024, 7, 22), None
        ],
        'customer_segment': ['Enterprise', 'SMB', 'Enterprise', 'SMB',
                             'Enterprise', 'SMB', 'Enterprise', 'SMB'],
        'active': [True, True, False, True, False, True, True, False]
    })

    print("Sample Data:")
    df

    # Generate summary statistics
    summary = sm(df)

    print("Summary Statistics with % Missing:")
    summary  # This is now a Polars DataFrame directly

Output:

    shape: (5, 16)
    ┌──────────────────┬─────────────┬──────┬─────────────┬───┬────────┬────────┬────────┬──────────┐
    │ variable         ┆ type        ┆ nobs ┆ pct_missing ┆ … ┆ p75    ┆ p95    ┆ p99    ┆ n_unique │
    │ ---              ┆ ---         ┆ ---  ┆ ---         ┆   ┆ ---    ┆ ---    ┆ ---    ┆ ---      │
    │ str              ┆ str         ┆ i64  ┆ f64         ┆   ┆ f64    ┆ f64    ┆ f64    ┆ i64      │
    ╞══════════════════╪═════════════╪══════╪═════════════╪═══╪════════╪════════╪════════╪══════════╡
    │ transaction_date ┆ date        ┆ 7    ┆ 12.5        ┆ … ┆ null   ┆ null   ┆ null   ┆ 7        │
    │ customer_segment ┆ categorical ┆ 8    ┆ 0.0         ┆ … ┆ null   ┆ null   ┆ null   ┆ 2        │
    │ active           ┆ categorical ┆ 8    ┆ 0.0         ┆ … ┆ null   ┆ null   ┆ null   ┆ 2        │
    │ revenue          ┆ numeric     ┆ 7    ┆ 12.5        ┆ … ┆ 2900.0 ┆ 3200.0 ┆ 3200.0 ┆ 7        │
    │ profit_margin    ┆ numeric     ┆ 8    ┆ 0.0         ┆ … ┆ 0.2    ┆ 0.25   ┆ 0.25   ┆ 8        │
    └──────────────────┴─────────────┴──────┴─────────────┴───┴────────┴────────┴────────┴──────────┘
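The percentile columns in this output can be reproduced by hand. The sketch below assumes a nearest-rank rule on the sorted non-null values, which is consistent with the `p75` and `p95` values shown above; the helper `quantile_nearest` is hypothetical, and whether `sm` uses exactly this rule internally is an assumption.

```python
import math

def quantile_nearest(values, q):
    """Quantile by nearest rank on the sorted non-null values (assumed method)."""
    present = sorted(v for v in values if v is not None)
    # Fractional position in the sorted data, rounded half up to an index.
    idx = math.floor((len(present) - 1) * q + 0.5)
    return present[idx]

revenue = [1000, 2500, 1800, 3200, 2900, None, 2100, 1750]
profit_margin = [0.15, 0.22, 0.18, 0.25, 0.20, 0.17, 0.19, 0.16]

print(quantile_nearest(revenue, 0.75), quantile_nearest(revenue, 0.95))              # 2900 3200
print(quantile_nearest(profit_margin, 0.75), quantile_nearest(profit_margin, 0.95))  # 0.2 0.25
```

With only seven or eight observations, the tail percentiles (`p95`, `p99`) collapse onto the maximum, which is why they repeat in the output.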

Column Reference

Column	Description
`variable`	Column name
`type`	Data type category (`numeric`, `date`, `categorical`)
`nobs`	Number of non-null observations
`pct_missing`	Percentage of missing values
`mean`	Mean value (numeric columns only)
`sd`	Standard deviation (numeric columns only)
`min`	Minimum value (numeric and date columns only)
`max`	Maximum value (numeric and date columns only)
`p1`, `p5`, `p25`, `p50`, `p75`, `p95`, `p99`	1st, 5th, 25th, 50th, 75th, 95th, and 99th percentiles (numeric columns only)
`n_unique`	Number of unique values
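To make these definitions concrete, the sketch below recomputes several of the columns for the `revenue` data from Basic Usage using only the Python standard library. It illustrates what each statistic means and is not the package's implementation; the helper `describe_numeric` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
import statistics

def describe_numeric(values):
    """Recompute a few summary columns for one numeric column (hypothetical helper)."""
    present = [v for v in values if v is not None]  # drop missing values
    total = len(values)
    return {
        "nobs": len(present),                                   # non-null count
        "pct_missing": 100.0 * (total - len(present)) / total,  # share of nulls, in percent
        "mean": statistics.fmean(present),
        "sd": statistics.stdev(present),                        # assumption: sample sd (n - 1)
        "min": min(present),
        "max": max(present),
        "n_unique": len(set(present)),                          # distinct non-null values
    }

revenue = [1000, 2500, 1800, 3200, 2900, None, 2100, 1750]
stats = describe_numeric(revenue)
print(stats["nobs"], stats["pct_missing"], stats["n_unique"])  # 7 12.5 7
```

Note that `nobs` and `n_unique` both ignore nulls here, matching the sample output where `revenue` has one missing value and reports 7 for each.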

Styled Output

For beautiful formatted tables with proper date formatting:

    from quantpolars import to_gt

    # Requires: pip3 install great-tables
    styled_summary = to_gt(summary)  # Convert DataFrame to styled GT table
    styled_summary  # In Jupyter, displays as formatted HTML table

Rendered Output Example: The `to_gt()` function returns a Great Tables (GT) object that renders as a formatted HTML table in Jupyter notebooks with:

Table Header: "Data Summary Statistics" with subtitle showing variable count
Formatted Numbers: Statistics rounded to 2 decimal places
Percentage Formatting: Missing values shown as percentages (e.g., "12.5%")
Date Formatting: Min/max dates formatted as MM/DD/YYYY (e.g., "1/1/2023")
Professional Styling: Clean borders, alternating row colors, proper alignment
Column Labels: User-friendly names ("Std Dev" instead of "sd", "N Obs" instead of "nobs")

Example of what the styled table displays:

Variable	Type	N Obs	% Missing	Mean	Std Dev	Min	Max	1%	5%	25%	50%	75%	95%	99%	N Unique
transaction_date	date	7	12.5%	—	—	Jan 15, 2024	Jul 22, 2024	—	—	—	—	—	—	—	7
customer_segment	categorical	8	0.0%	—	—	—	—	—	—	—	—	—	—	—	2
active	categorical	8	0.0%	—	—	—	—	—	—	—	—	—	—	—	2
revenue	numeric	7	12.5%	2,178.57	751.59	1,000.00	3,200.00	1,000.00	1,000.00	1,800.00	2,100.00	2,900.00	3,200.00	3,200.00	7
profit_margin	numeric	8	0.0%	0.19	0.03	0.15	0.25	0.15	0.15	0.17	0.19	0.20	0.25	0.25	8

Data Type Handling

Numeric: Full statistics including percentiles
Date: Min/max dates only (percentiles not supported by Polars)
Categorical: Unique counts only
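The type-aware behaviour described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how `sm` is implemented: the helpers `classify` and `summarize` are hypothetical, and treating booleans as categorical follows the sample output, where the `active` column is reported as `categorical`.

```python
from datetime import date
from numbers import Real

def classify(values):
    """Map a column's non-null values to 'numeric', 'date', or 'categorical'."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so test for it before the numeric check.
    if present and all(isinstance(v, bool) for v in present):
        return "categorical"
    if present and all(isinstance(v, Real) for v in present):
        return "numeric"
    if present and all(isinstance(v, date) for v in present):
        return "date"
    return "categorical"

def summarize(values):
    """Return only the statistics that make sense for the column's type."""
    present = [v for v in values if v is not None]
    kind = classify(values)
    out = {"type": kind, "n_unique": len(set(present))}
    if kind in ("numeric", "date"):
        out["min"], out["max"] = min(present), max(present)
    if kind == "numeric":
        out["mean"] = sum(present) / len(present)
    return out

print(summarize([True, True, False])["type"])                 # categorical
print(summarize([date(2024, 1, 15), None, date(2024, 7, 22)])["type"])  # date
print(summarize([0.15, 0.22, 0.18])["type"])                  # numeric
```

Statistics that do not apply to a type are simply left out of the result, which corresponds to the `null` cells in the summary table.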

Out-of-Core Example

    import polars as pl
    import quantpolars as qp

    # Batch price 1M options
    df = pl.scan_csv("options_data.csv")  # Out-of-core
    df = df.with_columns(
        price=qp.black_scholes(df, 'S', 'K', 'T', 'r', 'sigma', 'call')['price']
    )
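For reference, the per-row quantity that a Black-Scholes pricer computes can be sketched for a single option in plain Python. The argument names mirror the column names in the example above (`S`, `K`, `T`, `r`, `sigma`); the function `bs_price` is hypothetical and is not part of the QuantPolars API. It assumes a European option on an asset that pays no dividends.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_price(S, K, T, r, sigma, kind="call"):
    """Black-Scholes price of a European option with no dividends (hypothetical helper)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if kind == "call":
        return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    if kind == "put":
        return K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)
    raise ValueError("kind must be 'call' or 'put'")

call = bs_price(100, 100, 1.0, 0.05, 0.2, "call")
put = bs_price(100, 100, 1.0, 0.05, 0.2, "put")
# Put-call parity: call - put should equal S - K * exp(-r * T).
print(abs((call - put) - (100 - 100 * math.exp(-0.05))) < 1e-9)  # True
```

A DataFrame-based pricer applies the same arithmetic to whole columns at once instead of one option at a time.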

Features

Data Summary Tools: Out-of-core data summarization for big data
Option Pricing: Black-Scholes, Cox-Ross-Rubinstein (CRR), Barone-Adesi-Whaley (BAW) models
Implied Volatility: Calculation of implied volatility
Greeks: Delta, Gamma, Theta, Vega, Rho calculators

Key Optimizations

Vectorized DataFrame API: Functions operate on Polars DataFrames for batch processing of multiple options
Fast Norm CDF Approximation: Implemented Abramowitz & Stegun approximation using Polars expressions
Lazy Evaluation: All operations are lazy, enabling out-of-core processing for big data
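The normal CDF approximation mentioned above can be illustrated in scalar Python. The sketch uses Abramowitz & Stegun formula 26.2.17, a common choice built only from `exp`, multiplication, and addition, which is what makes it easy to express as column expressions. Which A&S formula the package actually uses is an assumption, and `norm_cdf_as` is a hypothetical name.

```python
import math

# Abramowitz & Stegun formula 26.2.17 coefficients.
_P = 0.2316419
_B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def norm_cdf_as(x):
    """Approximate the standard normal CDF (absolute error below about 7.5e-8)."""
    t = 1.0 / (1.0 + _P * abs(x))
    # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5.
    poly = 0.0
    for b in reversed(_B):
        poly = (poly + b) * t
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    upper_tail = pdf * poly  # approximates 1 - Phi(|x|)
    return 1.0 - upper_tail if x >= 0 else upper_tail

def norm_cdf_exact(x):
    """Reference value via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Compare against the reference on a grid from -6 to 6.
worst = max(abs(norm_cdf_as(i / 100) - norm_cdf_exact(i / 100)) for i in range(-600, 601))
print(worst < 1e-7)  # True
```

The approximation avoids any call to `erf`, so every step maps onto elementwise arithmetic that a columnar engine can evaluate lazily.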

Updated API

The functions now work on Polars DataFrames, allowing for:

Batch Processing: Price thousands of options in a single operation
Big Data Ready: Handles datasets larger than memory with Polars' streaming
Extreme Speed: Vectorized operations on columnar data

Performance Benefits

No Loops: All vectorized in Polars/Rust
Memory Efficient: Columnar storage and lazy evaluation
Scalable: Handles billions of rows with minimal memory
Parallel: Automatic parallelization where possible
