I've just cut a fresh release of the scikit-learn-contrib library category_encoders. This one includes a lot of great contributions from the broader community. A few selected features now available:
- Leave-one-out encoding: a new encoder, based on a popular Kaggle post by Owen Zhang, detailed here and here. (proposal)
- Maintenance fixes to keep up with upstream libraries (you should see fewer pandas warnings, issue)
- Bugfix for calling fit repeatedly on the same encoder (issue)
- Consistent category ordering (proposal)
- Consistent output shape for datasets with inconsistent category appearances (issue)
- Missing value and unknown category handling made consistent across all encoders.
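To give a rough idea of what leave-one-out encoding does, here's a minimal pure-Python sketch. It is not the library's implementation: the function name `leave_one_out_encode`, the toy data, and the fallback to the global mean for categories that appear only once are all assumptions made for illustration, and the random noise that is sometimes added during training is left out. Each row's category is replaced by the mean target of the other rows sharing that category.

```python
from collections import defaultdict


def leave_one_out_encode(categories, targets):
    """Replace each category with the mean target of the *other* rows in it.

    Illustrative sketch only; not the category_encoders implementation.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not categories:
        return []

    # First pass: per-category target sums and row counts.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    global_mean = sum(targets) / len(targets)

    # Second pass: subtract the row's own target before averaging.
    encoded = []
    for cat, y in zip(categories, targets):
        n = counts[cat]
        if n > 1:
            encoded.append((sums[cat] - y) / (n - 1))
        else:
            # Assumption: a category seen once has no "other" rows,
            # so fall back to the global target mean.
            encoded.append(global_mean)
    return encoded


# Hypothetical toy data: a categorical column and a binary target.
cities = ["a", "a", "a", "b", "b", "c"]
clicked = [1, 0, 1, 1, 1, 0]
encoded = leave_one_out_encode(cities, clicked)
print(encoded)
```

Because a row's own target never feeds into its encoded value, the encoding leaks less of the label than a plain per-category mean would.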
Install or upgrade using the command:
pip install -U category_encoders
All in all, a fairly large release by our standards, and there are still some open issues to work on. So upgrade, try it out, and let me know what you think. If you'd like to get involved, find us on GitHub here.