Finding a good structure for number-crunching code can be a problem. This applies especially to the routines that precede the core algorithms: transformations such as data processing and cleanup, as well as feature construction. Such code easily turns into a sequence of highly interdependent operations that are hard to separate. This kind of "Data Science Spaghetti code" is challenging to test, maintain and reuse.
Data scientists face these problems on a day-to-day basis when writing machine learning pipelines, and the problems matter even more when the models are meant to run in a production environment. Scikit-Learn offers a simple yet powerful interface for data science algorithms: the estimator and its composite classes (called meta-estimators). By example, I show how clever usage of meta-estimators can encapsulate elaborate machine learning models into a maintainable tree of objects that is both handy to use and simple to test. Looking at examples, I will show how this approach simplifies model development, testing and validation, and how it brings together best practices from software engineering and data science.
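To make the idea of a "tree of objects" concrete, here is a minimal sketch, not taken from the talk itself. It assumes scikit-learn and NumPy are installed. The toy data, the column layout and the step names (`impute`, `scale`, `numeric`, `categorical`, `preprocess`, `classify`) are all made up for illustration. The meta-estimators `Pipeline` and `ColumnTransformer` wrap ordinary estimators, so the whole model is one object with a single `fit`/`predict` interface.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data, invented for this sketch:
# columns 0 and 1 are numeric, column 2 holds a categorical code.
X = np.array([
    [1.0, 200.0, 0],
    [2.0, np.nan, 1],
    [3.0, 180.0, 0],
    [4.0, 220.0, 2],
    [5.0, 210.0, 1],
    [6.0, 190.0, 2],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Leaf-level branch: cleanup and scaling of the numeric columns.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Inner node: route each group of columns to its own transformer.
preprocess = ColumnTransformer([
    ("numeric", numeric, [0, 1]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), [2]),
])

# Root node: preprocessing followed by the core algorithm.
model = Pipeline([
    ("preprocess", preprocess),
    ("classify", LogisticRegression()),
])

# The whole tree is fitted and used like a single estimator.
model.fit(X, y)
predictions = model.predict(X)

# Each branch is itself an estimator, so it can be tested in isolation.
numeric_out = numeric.fit_transform(X[:, :2])
```

Because every node follows the same estimator interface, a unit test can target `numeric` alone (for example, checking that no missing values survive the imputation step), while the same `model` object can be handed to tools such as `cross_val_score` for validation.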