Data Pre-processing and Feature Engineering for Machine Learning • Data Science Festival

Data Pre-processing and Feature Engineering for Machine Learning by Soledad Galli

Data in its raw format is almost never suitable for training machine learning models. In fact, data scientists devote a large part of their time to cleaning and pre-processing data. Feature engineering refers to the processes and techniques used to pre-process variables for machine learning modelling. It includes transformations such as filling in missing values, encoding categorical variables, transforming variables mathematically, and creating new variables from existing ones, to name a few. There are many feature engineering techniques we can use to extract the most value from features. When should we use each technique, and why? What are their advantages, assumptions and limitations? Are they suitable for every algorithm? In this video, I will discuss various feature engineering techniques and compare their implementations in open-source Python libraries.
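
To make the four transformations named above concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the toy records, the column names and the `engineer` helper are all hypothetical, and the choices made (median imputation, one-hot encoding, a log transform, an income-to-age ratio) are just one common option for each step. It uses only the standard library so that each step stays visible; open-source libraries such as pandas and scikit-learn provide equivalents of these operations.

```python
import math
from statistics import median

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "city": "London", "income": 30000.0},
    {"age": None, "city": "Paris", "income": 45000.0},
    {"age": 40, "city": "London", "income": 120000.0},
    {"age": 31, "city": "Leeds", "income": 52000.0},
]


def engineer(rows):
    """Return one feature dict per input row (illustrative helper)."""
    # 1. Filling missing values: learn the median of the observed ages.
    age_median = median(r["age"] for r in rows if r["age"] is not None)
    # 2. Encoding categorical variables: collect the categories to one-hot encode.
    cities = sorted({r["city"] for r in rows})

    features = []
    for r in rows:
        age = r["age"] if r["age"] is not None else age_median
        # Keep a flag so a model can still see that the value was imputed.
        feat = {"age": age, "age_was_missing": int(r["age"] is None)}
        for city in cities:
            feat[f"city_{city}"] = int(r["city"] == city)
        # 3. Mathematical transformation: log of a right-skewed variable.
        feat["log_income"] = math.log(r["income"])
        # 4. New variable created from two existing ones.
        feat["income_per_year_of_age"] = r["income"] / age
        features.append(feat)
    return features


features = engineer(rows)
```

In a real project the imputation value and the category list would be learned from the training set only and then reused on new data, which is one of the details that the library implementations handle differently.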
