Efficient labelling of large datasets for NLP tasks • Data Science Festival

Creating a good-quality labelled dataset is a challenge faced by most data scientists, particularly those working in NLP. Raw text is rarely in short supply, but most tasks require that text to be labelled, and the labelling often has to be done laboriously by hand. This talk follows the process of growing a large, fully labelled dataset from only a small number of initial labelled examples, automating as much of the process as possible. It explores several methods: older, tried-and-tested ones such as Support Vector Machines, along with newer, cutting-edge ones such as few-shot and zero-shot learning.
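
As a rough illustration of growing a labelled set from a handful of seed examples, the sketch below shows one common approach, self-training (pseudo-labelling): fit a classifier on the labelled data, label the unlabelled texts it is confident about, add them to the training set, and repeat. It is not taken from the talk. To stay dependency-free it uses a simple nearest-centroid bag-of-words classifier as a stand-in for a Support Vector Machine, and the function names, the `threshold` value and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(labelled):
    """Sum the term counts of all examples sharing a label."""
    sums = {}
    for text, label in labelled:
        sums.setdefault(label, Counter()).update(vectorize(text))
    return sums


def predict(text, cents):
    """Return (best_label, margin); margin is best score minus runner-up."""
    vec = vectorize(text)
    scores = sorted(
        ((cosine(vec, c), label) for label, c in cents.items()), reverse=True
    )
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best_label, best_score - runner_up


def self_train(seed, unlabelled, threshold=0.2, max_rounds=10):
    """Grow the labelled set by repeatedly adopting confident predictions.

    `threshold` is a hypothetical confidence cut-off on the margin; texts
    that never clear it stay in the pool for manual labelling.
    """
    labelled = list(seed)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        cents = centroids(labelled)
        confident, remaining = [], []
        for text in pool:
            label, margin = predict(text, cents)
            if margin >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:  # nothing new was labelled, so stop early
            break
        labelled.extend(confident)
        pool = remaining
    return labelled, pool


# Toy data, invented for this example.
seed = [
    ("great film loved it", "pos"),
    ("terrible film hated it", "neg"),
]
unlabelled = [
    "loved the acting",
    "hated the plot",
    "great acting",
    "the plot was terrible",
    "quantum chromodynamics lecture",
]

labelled, pool = self_train(seed, unlabelled)
```

On this toy data the loop labels the four sentiment-bearing sentences and leaves the off-topic one in `pool`, which is the part a human annotator would still need to review. In practice the stand-in classifier could be swapped for an SVM or a few-shot or zero-shot model, with the confidence measure adapted to whatever scores that model exposes.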
