sdpython/pandas-streaming

pandas-streaming: streaming API over pandas

pandas-streaming processes files with pandas that are too big to hold in memory but too small for parallelization to bring a significant gain. The module replicates a subset of the pandas API and adds other functionality for machine learning.

    from pandas_streaming.df import StreamingDataFrame

    sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)
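To illustrate the idea behind chunk-by-chunk processing, here is a minimal sketch that uses only plain pandas (`pandas.read_csv` with `chunksize`) rather than pandas-streaming itself. The sample data, the helper name `streamed_sum`, and the chunk size are assumptions made for the example; an in-memory buffer stands in for a large file.

```python
import io

import pandas

# Small in-memory stand-in for a large tab-separated file (hypothetical data).
raw = "cf\tcint\tcstr\n0.0\t0\ta\n1.0\t1\tb\n3.0\t3\tc\n4.0\t4\td\n5.0\t5\te\n"


def streamed_sum(buffer, column, chunksize=2):
    """Sum one column while holding only one chunk in memory at a time."""
    total = 0.0
    n_chunks = 0
    # With chunksize set, pandas.read_csv returns an iterator of DataFrames,
    # each holding at most `chunksize` rows.
    for chunk in pandas.read_csv(buffer, sep="\t", chunksize=chunksize):
        total += chunk[column].sum()
        n_chunks += 1
    return total, n_chunks


total, n_chunks = streamed_sum(io.StringIO(raw), "cf")
print(total, n_chunks)
```

Only aggregates that can be updated incrementally (sums, counts, minima, maxima) fit this pattern directly; anything that needs all rows at once requires a different strategy.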

The module can also stream an existing dataframe.

    import pandas
    from pandas_streaming.df import StreamingDataFrame

    df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                           dict(cf=1, cint=1, cstr="1"),
                           dict(cf=3, cint=3, cstr="3")])

    sdf = StreamingDataFrame.read_df(df)

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)

It also contains helpers to split datasets into train and test sets under some unusual constraints.
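The simplest case, a random split with no extra constraint, can be sketched on a stream of chunks as follows. This is not the module's own helper: the function name `streaming_train_test_split`, its parameters, and the sample data are hypothetical, and the sketch relies only on plain pandas and numpy.

```python
import numpy
import pandas


def streaming_train_test_split(chunks, test_size=0.25, seed=0):
    """Assign each row of each chunk to train or test, one chunk at a time."""
    rng = numpy.random.default_rng(seed)
    train_parts, test_parts = [], []
    for chunk in chunks:
        # One uniform draw per row decides which side the row goes to.
        mask = rng.random(len(chunk)) < test_size
        test_parts.append(chunk[mask])
        train_parts.append(chunk[~mask])
    # Concatenated here for brevity; with data that does not fit in memory,
    # each part would be appended to a file instead.
    return pandas.concat(train_parts), pandas.concat(test_parts)


df = pandas.DataFrame({"cint": range(100), "cstr": [str(i) for i in range(100)]})
# Simulate a stream by yielding the dataframe ten rows at a time.
chunks = (df.iloc[i:i + 10] for i in range(0, len(df), 10))
train, test = streaming_train_test_split(chunks)
print(len(train), len(test))
```

Because rows are assigned independently, the test share is only approximately `test_size`; constraints such as keeping related rows on the same side need more bookkeeping than this sketch provides.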
