In this session Richard will introduce Tubular, a Python package for feature engineering that was originally developed within LV= and has since been open sourced. We’ll work through building a feature engineering pipeline with Tubular to see some of the key transformers it offers and how it fits into a data scientist’s workflow. No previous experience with feature engineering is required for this session.
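For anyone new to the idea, the sketch below illustrates what "transformers in a pipeline" means, assuming the common fit/transform convention. It deliberately does not use Tubular's own API: the `MedianImputer` and `Capper` classes, the `run_pipeline` helper and the `age` column are all hypothetical names made up for illustration, and the data is a toy example rather than the session dataset.

```python
from statistics import median


class MedianImputer:
    """Hypothetical transformer: fills missing values with the median learned in fit."""

    def __init__(self, column):
        self.column = column
        self.median_ = None

    def fit(self, rows):
        # Learn the median from the values that are present.
        observed = [r[self.column] for r in rows if r[self.column] is not None]
        self.median_ = median(observed)
        return self

    def transform(self, rows):
        # Return new rows; the input rows are left unchanged.
        return [
            {**r, self.column: self.median_ if r[self.column] is None else r[self.column]}
            for r in rows
        ]


class Capper:
    """Hypothetical transformer: caps a column at a fixed upper limit."""

    def __init__(self, column, upper):
        self.column = column
        self.upper = upper

    def fit(self, rows):
        # Nothing to learn; the limit is supplied by the user.
        return self

    def transform(self, rows):
        return [{**r, self.column: min(r[self.column], self.upper)} for r in rows]


def run_pipeline(steps, rows):
    """Fit each step on the output of the previous one, then apply it."""
    for step in steps:
        rows = step.fit(rows).transform(rows)
    return rows


rows = [{"age": 30}, {"age": None}, {"age": 50}, {"age": 120}]
result = run_pipeline([MedianImputer("age"), Capper("age", upper=100)], rows)
print([r["age"] for r in result])  # [30, 50, 50, 100]
```

Each step learns what it needs in `fit` and applies it in `transform`, so the steps can be chained in any order that makes sense for the data. The session covers the transformers that Tubular itself provides.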


For the event we will be working from this repo:

All the material will be in the folder, and we will be working on this open dataset:

Python Environment

To set up the Python environment on their own machines, participants should:

Clone the repository using git.

Get conda by downloading and installing either Anaconda or Miniconda (a smaller download).

Create the conda environment using the environment file in the repository.

Instructions for this can be found here, but we will also cover it at the start of the session.

Alternatively, participants can click the launch binder shield on the front page of the repository to start a Binder session with the required packages already installed.


The demo notebook in the repository has code to download the dataset we will be using; we will also cover this at the start of the session.