Data Curation for Open Source LLM Fine-Tuning • Data Science Festival

Everyone wants to fine-tune open-source LLMs, but a lack of high-quality data makes this hard. Even the data that companies do have is difficult to understand, which makes it challenging to iterate towards a high-quality dataset that produces good fine-tuning results. Clemens will share his experience curating datasets to fine-tune models such as Mistral 7B and discuss some of the challenges to take into consideration.

Technical level: Technical practitioner

Session length: 15 minutes
