Skip to content

Serving any JSON/CSN/Parquet/Arrow files like Postgres tables with Datafusion

License

Notifications You must be signed in to change notification settings

monoscope-tech/datafusion-postgres

datafusion-postgres

Crates.io Version

A PostgreSQL-compatible server for Apache DataFusion, supporting authentication, role-based access control, and SSL/TLS encryption. Available as both a library and CLI tool.

Built on pgwire to provide PostgreSQL wire protocol compatibility for analytical workloads. It was originally an example of the pgwire project.

Scope of the Project

  • datafusion-postgres: Postgres frontend for datafusion, as a library.
    • Serving Datafusion SessionContext with pgwire library
    • Customizible authentication
    • Permission control
    • Built-in pg_catalog tables
    • Built-in postgres functions for common meta queries
  • datafusion-postgres-cli: A cli tool starts a postgres compatible server for datafusion supported file formats, just like python's SimpleHTTPServer.
  • arrow-pg: A data type mapping, encoding/decoding library for arrow and postgres(pgwire) data types.

See auth.rs for complete implementation examples using DfAuthSource.

Quick Start

The Library datafusion-postgres

The high-level entrypoint of datafusion-postgres library is the serve function which takes a datafusion SessionContext and some server configuration options.

use std::sync::Arc;use datafusion::prelude::SessionContext;use datafusion_postgres::{serve,ServerOptions};// Create datafusion SessionContextlet session_context = Arc::new(SessionContext::new());// Configure your `session_context`// ...// Start the Postgres compatible server with SSL/TLSlet server_options = ServerOptions::new().with_host("127.0.0.1".to_string()).with_port(5432).with_tls_cert_path(Some("server.crt".to_string())).with_tls_key_path(Some("server.key".to_string()));serve(session_context,&server_options).await

Security Features

The server automatically includes:

  • User authentication (default postgres superuser)
  • Role-based access control with predefined roles:
    • readonly: SELECT permissions
    • readwrite: SELECT, INSERT, UPDATE, DELETE permissions
    • dbadmin: Full administrative permissions
  • SSL/TLS encryption when certificates are provided
  • Query-level permission checking

The CLI datafusion-postgres-cli

Command-line tool to serve JSON/CSV/Arrow/Parquet/Avro files as PostgreSQL-compatible tables.

datafusion-postgres-cli 0.6.1 A PostgreSQL interface for DataFusion. Serve CSV/JSON/Arrow/Parquet files as tables. USAGE: datafusion-postgres-cli [OPTIONS] FLAGS: -h, --help Prints help information -V, --version Prints version information OPTIONS: --arrow <arrow-tables>... Arrow files to register as table, using syntax `table_name:file_path` --avro <avro-tables>... Avro files to register as table, using syntax `table_name:file_path` --csv <csv-tables>... CSV files to register as table, using syntax `table_name:file_path` -d, --dir <directory> Directory to serve, all supported files will be registered as tables --host <host> Host address the server listens to [default: 127.0.0.1] --json <json-tables>... JSON files to register as table, using syntax `table_name:file_path` --parquet <parquet-tables>... Parquet files to register as table, using syntax `table_name:file_path` -p <port> Port the server listens to [default: 5432] --tls-cert <tls-cert> Path to TLS certificate file for SSL/TLS encryption --tls-key <tls-key> Path to TLS private key file for SSL/TLS encryption 

🔒 Security Options

# Run with SSL/TLS encryption datafusion-postgres-cli \ --csv data:sample.csv \ --tls-cert server.crt \ --tls-key server.key # Run without encryption (development only) datafusion-postgres-cli --csv data:sample.csv

📋 Example Usage

Basic Example

Host a CSV dataset as a PostgreSQL-compatible table:

datafusion-postgres-cli --csv climate:delhiclimate.csv
Loaded delhiclimate.csv as table climate TLS not configured. Running without encryption. Listening on 127.0.0.1:5432 (unencrypted) 

Connect with psql

🔐 Authentication: The default setup allows connections without authentication for development. For secure deployments, use DfAuthSource with standard pgwire authentication handlers (cleartext, MD5, or SCRAM). See auth.rs for implementation examples.

psql -h 127.0.0.1 -p 5432 -U postgres
postgres=>SELECTCOUNT(*) FROM climate; count -------1462 (1 row) postgres=>SELECTdate, meantemp FROM climate WHERE meantemp >35LIMIT5; date | meantemp ------------+----------2017-05-15 | 36.92017-05-16 | 37.92017-05-17 | 38.62017-05-18 | 37.42017-05-19 | 35.4 (5 rows) postgres=>BEGIN; BEGIN postgres=>SELECTAVG(meantemp) FROM climate; avg ------------------25.4955206557617 (1 row) postgres=>COMMIT; COMMIT

🔐 Production Setup with SSL/TLS

# Generate SSL certificates openssl req -x509 -newkey rsa:4096 -keyout server.key -out server.crt \ -days 365 -nodes -subj "/C=US/ST=CA/L=SF/O=MyOrg/CN=localhost"# Start secure server datafusion-postgres-cli \ --csv climate:delhiclimate.csv \ --tls-cert server.crt \ --tls-key server.key
Loaded delhiclimate.csv as table climate TLS enabled using cert: server.crt and key: server.key Listening on 127.0.0.1:5432 with TLS encryption 

Community

Developer Mailing List

If you like the idea of pgwire, datafusion-postgres and want to join the development of the library, or its ecosystem integrations, extensions, you are welcomed to join our developer mailing list: https://groups.io/g/pgwire-dev/

License

This library is released under Apache license.

About

Serving any JSON/CSN/Parquet/Arrow files like Postgres tables with Datafusion

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Rust83.8%
  • Python13.6%
  • Shell2.0%
  • Nix0.6%