It’s often hard to get started with a supervised learning problem without much/any data. But the way you collect data can be annoying and intrusive for users. Collecting data carelessly can therefore drag user metrics down, and reduce data quality, because the people providing it simply don’t have an incentive to provide high quality data. At Cleo, we’ve come across this problem a few times, and each time, we’ve dug deep, and thought about how we could collect data in a way that both:

  1. fit with the overall product user experience
  2. delivered user value immediately, entirely independent of the machine learning down the road, such that the data collection alone was net positive for our users and our metrics

After we started using this framework, we realised that active learning could accelerate everything good about this approach. It allowed us fix more mistakes through the data collection itself, and feed more valuable data to our classifier. Now we always place user experience at the heart of data collection, and it’s been paying off ever since.