Elegant, production-ready extensions for Scikit-learn pipelines
Save time, build faster, scale better
scikitelearn-collections is a curated collection of robust utilities, transformers, wrappers, and experiment tools built on top of the Scikit-learn ecosystem. It helps you streamline model development, experiment tracking, and pipeline customization — all with full Scikit-learn compatibility.
- Plug-and-play `Pipeline` and `ColumnTransformer` components
- Drop-in feature generators (dates, text, outliers, etc.)
- Advanced custom transformers and meta-estimators
- Support for nested cross-validation and custom scorers
- Compatible with `GridSearchCV` and `RandomizedSearchCV`
- Simple model evaluation wrappers with logging
- Utility functions for feature selection, data cleaning, and split strategies
- Modular design for experimentation & reproducibility
- Clean, tested, and production-grade Python code
- 100% compatible with Scikit-learn’s API & best practices
- Python 3.8+
- scikit-learn >= 1.0
- numpy, pandas, joblib
```bash
pip install scikitelearn-collections
```

Until then, you can clone manually:

```bash
git clone https://github.com/your-username/scikitelearn-collections.git
cd scikitelearn-collections
pip install -e .
```

```python
from sklearn.pipeline import Pipeline
from scikitelearn_collections.transformers import DateFeatureGenerator, OutlierRemover
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ("date_features", DateFeatureGenerator(columns=["signup_date"])),
    ("remove_outliers", OutlierRemover(method="zscore", threshold=3.0)),
    ("classifier", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
```

| Module | Description |
|---|---|
| `transformers/` | Custom transformers (dates, outliers, encodings, etc.) |
| `pipelines/` | Reusable ML pipelines with preprocessing and modeling |
| `wrappers/` | Model wrappers for enhanced evaluation, prediction, and logging |
| `validators/` | Custom cross-validation strategies and metric calculators |
| `utils/` | Helper utilities for splits, selection, diagnostics |
| `examples/` | Real-world usage examples in Jupyter notebooks |
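
To give a feel for what a Scikit-learn-compatible custom transformer involves, here is a minimal sketch of a date feature generator. `SimpleDateFeatures` is a hypothetical stand-in written for this illustration; it is not the library's `DateFeatureGenerator`, whose actual implementation and options may differ. It relies only on `BaseEstimator`, `TransformerMixin`, and pandas.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SimpleDateFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical example transformer (not part of scikitelearn-collections)."""

    def __init__(self, columns):
        # Store constructor arguments unchanged so the estimator can be cloned.
        self.columns = columns

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the training data.
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            dates = pd.to_datetime(X[col])
            X[f"{col}_year"] = dates.dt.year
            X[f"{col}_month"] = dates.dt.month
            X[f"{col}_dayofweek"] = dates.dt.dayofweek  # Monday is 0
            X = X.drop(columns=col)  # drop the raw date column
        return X


df = pd.DataFrame({"signup_date": ["2024-01-15", "2024-03-02"], "age": [31, 45]})
features = SimpleDateFeatures(columns=["signup_date"]).fit_transform(df)
print(features.columns.tolist())
```

Because the class follows the `fit`/`transform` contract and keeps its constructor arguments as plain attributes, it can be placed in a `Pipeline` or cloned by `GridSearchCV` like any built-in transformer.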
```
scikitelearn-collections/
│
├── transformers/   # Custom transformers
├── pipelines/      # Ready-to-use ML pipelines
├── wrappers/       # Model and metric wrappers
├── utils/          # Helper functions and classes
├── validators/     # Scoring & validation strategies
├── examples/       # Example notebooks and scripts
├── tests/          # Unit tests
└── README.md       # You're here!
```

Explore the `examples/` directory for practical Jupyter notebooks:
- Binary classification with preprocessing
- Regression with feature engineering
- Outlier detection & removal
- Cross-validation with custom scoring
- Hyperparameter tuning with pipeline integration
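
As a rough illustration of the last two topics, the sketch below combines a custom scorer with nested cross-validation using only plain Scikit-learn. It is not taken from the notebooks; the synthetic dataset, the F2 metric, and the parameter grid are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used here only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Custom scorer: F-beta with beta=2 weights recall more heavily than precision.
f2_scorer = make_scorer(fbeta_score, beta=2)

# Inner loop: tune the classifier's C through the pipeline step name.
search = GridSearchCV(
    pipeline,
    param_grid={"classifier__C": [0.1, 1.0, 10.0]},
    scoring=f2_scorer,
    cv=3,
)

# Outer loop: estimate how well the whole tuning procedure generalizes.
outer_scores = cross_val_score(search, X, y, scoring=f2_scorer, cv=5)
print(outer_scores.mean())
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the evaluation nested: hyperparameters are re-tuned inside every outer fold, so the reported score is not biased by the tuning.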
We welcome contributions! To contribute:
- Fork this repository
- Create a new branch: `git checkout -b feature/your-feature`
- Write clean, tested code
- Ensure all tests pass with `pytest`
- Submit a pull request
All modules include unit tests in the `tests/` directory. Run:

```bash
pytest
```

We use Black for code formatting and expect all code to follow PEP8 guidelines.
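
For contributors unsure what a unit test might look like, here is a small pytest-style sketch. It exercises a plain Scikit-learn pipeline rather than any module from this project; the test name and data are made up for illustration, and the real tests in `tests/` may be organized differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def test_pipeline_predicts_one_label_per_row():
    # Small deterministic dataset: the label depends on the sign of feature 0.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    y = (X[:, 0] > 0).astype(int)

    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("classifier", LogisticRegression()),
    ])
    predictions = pipeline.fit(X, y).predict(X)

    # One prediction per input row, and only known class labels.
    assert predictions.shape == (40,)
    assert set(predictions) <= {0, 1}
```

pytest collects any function whose name starts with `test_` in files named `test_*.py`, so a file like this placed under `tests/` would be picked up by the `pytest` command above.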
This project is licensed under the MIT License.
- Built with Scikit-learn
- Inspired by real-world ML use-cases in research & production
- Thanks to open-source contributors and community ideas
Have questions or suggestions? Open an issue or start a discussion!
Let your pipelines be elegant, reusable, and powerful. —
scikitelearn-collections