Category Encoders accepted into scikit-learn-contrib

I've posted a few times before about a library I'm working on called category encoders.  It aims to provide a complete toolbox of scikit-learn compatible transformers for encoding categorical variables in different ways. If that sounds interesting, you can check out much more in-depth posts here and here.
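To give a rough sense of what "scikit-learn compatible transformer" means for categorical data, here is a minimal sketch in plain Python. It is not the category encoders API: the class name SimpleOrdinalEncoder and its unknown_value parameter are made up for illustration. It only mirrors the fit/transform convention that scikit-learn transformers follow, learning a category-to-integer mapping per column in fit and applying it in transform.

```python
class SimpleOrdinalEncoder:
    """Hypothetical encoder: maps each category in each column to an integer."""

    def __init__(self, unknown_value=-1):
        # Code used for categories that were not seen during fit (an assumption).
        self.unknown_value = unknown_value

    def fit(self, X, y=None):
        # X is a list of rows; each row is a list of category values.
        n_cols = len(X[0]) if X else 0
        self.mappings_ = []
        for j in range(n_cols):
            mapping = {}
            for row in X:
                # Assign codes in order of first appearance within the column.
                mapping.setdefault(row[j], len(mapping))
            self.mappings_.append(mapping)
        return self

    def transform(self, X):
        # Replace each value with its learned code, or unknown_value if unseen.
        return [
            [self.mappings_[j].get(v, self.unknown_value) for j, v in enumerate(row)]
            for row in X
        ]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


train = [["red", "small"], ["blue", "large"], ["red", "large"]]
encoder = SimpleOrdinalEncoder()
encoded_train = encoder.fit_transform(train)           # [[0, 0], [1, 1], [0, 1]]
encoded_new = encoder.transform([["green", "small"]])  # [[-1, 0]]
```

Because the mapping is learned in fit and only applied in transform, the same encoding can be reused on new data, which is what lets a transformer like this slot into a larger modeling pipeline.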

Scikit-learn is an extremely popular Python package that builds on NumPy and SciPy to provide rich machine learning functionality.  It's one of the most active Python open source projects and has a reputation for being extremely high quality.

In the past year or so, some of the core scikit-learn developers started a project called scikit-learn-contrib, which focuses on providing a collection of scikit-learn compatible libraries that are both easy to use and easy to install.  Unlike scikit-learn itself, the contrib libraries may implement algorithms that are experimental or less mature.

The projects currently in scikit-learn-contrib are:


Large-scale linear classification, regression and ranking.

Maintained by Mathieu Blondel and Fabian Pedregosa.


A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines.

Maintained by Jason Rudy and Mehdi.


Python module to perform under sampling and over sampling with various techniques.

Maintained by Guillaume Lemaitre, Fernando Nogueira, Dayvid Oliveira and Christos Aridas.


Factorization machines and polynomial networks for classification and regression in Python.

Maintained by Vlad Niculae.


Confidence intervals for scikit-learn forest algorithms.

Maintained by Ariel Rokem, Kivan Polimis and Bryna Hazelton.


A high performance implementation of HDBSCAN clustering.

Maintained by Leland McInnes, jc-healy, c-north and Steve Astels.

And now:

Category Encoders!

Check it out in its new home, look at the other great projects, and if you want to help continue to push it forward, let me know.




Will has a background in Mechanical Engineering from Auburn, but mostly just writes software now. He was the first employee at Predikto, and is currently building out the premier platform for predictive maintenance in heavy industry there. When not working on that, he is generally working on something related to Python, data science or cycling.