Mastering Synthetic Data Generation in Sports Analytics • Data Science Festival

Being unable to find sufficient data is nothing new for data scientists, but it is a particular pain point in the world of Sports Analytics.

This is because in my industry, data collection involves physically attaching equipment to athletes and recording data as they perform, whether in professional football, rugby or any other sport we serve.

To bridge this gap, synthetic data augmentation is often the solution.

This session looks at practical solutions to this common problem through the lens of synthetic data generation techniques. Using real-world examples from my time at StatSports, I will explore various approaches for creating and validating synthetic sports data, from basic augmentation to advanced generative models. Attendees will learn how to identify scenarios where synthetic data can be beneficial, understand different generation techniques suitable for sports-specific data, and gain practical insights into validating synthetic datasets while keeping the generated athlete data physically realistic.

The session will include concrete examples of implementing these techniques in production environments, common pitfalls to avoid, and best practices for ensuring the reliability of synthetic data in sports analytics applications.
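
To give a flavour of the "basic augmentation" end of that spectrum, here is a minimal sketch in Python. It is not taken from the session or from any StatSports system. It assumes a speed trace sampled at 10 Hz in metres per second, and the function names, the 12.5 m/s speed cap and the 10 m/s² acceleration limit are all illustrative assumptions. The idea is to jitter and rescale a real trace, then reject any synthetic trace that breaks simple physical limits.

```python
import random

# Assumed plausibility limits, chosen for illustration only.
MAX_SPEED_MS = 12.5    # assumed upper bound on sprint speed (m/s)
MAX_ACCEL_MS2 = 10.0   # assumed upper bound on |acceleration| (m/s^2)


def augment_speed_trace(speeds, noise_std=0.1, scale_range=(0.95, 1.05), rng=None):
    """Return a jittered, rescaled copy of a speed trace (m/s).

    One random scale factor is applied to the whole trace, then Gaussian
    noise is added per sample. Values are clipped to [0, MAX_SPEED_MS].
    """
    rng = rng or random.Random()
    scale = rng.uniform(*scale_range)
    augmented = []
    for v in speeds:
        noisy = v * scale + rng.gauss(0.0, noise_std)
        augmented.append(min(max(noisy, 0.0), MAX_SPEED_MS))
    return augmented


def is_physically_plausible(speeds, sample_rate_hz=10.0):
    """Check the speed range and the sample-to-sample acceleration."""
    dt = 1.0 / sample_rate_hz
    if any(v < 0.0 or v > MAX_SPEED_MS for v in speeds):
        return False
    return all(abs(b - a) / dt <= MAX_ACCEL_MS2
               for a, b in zip(speeds, speeds[1:]))


if __name__ == "__main__":
    real_trace = [0.0, 0.6, 1.3, 2.1, 2.9, 3.6, 4.2, 4.7]  # made-up example values
    rng = random.Random(42)
    candidates = [augment_speed_trace(real_trace, rng=rng) for _ in range(100)]
    kept = [c for c in candidates if is_physically_plausible(c)]
    print(f"kept {len(kept)} of {len(candidates)} synthetic traces")
```

Filtering after generation keeps the augmentation step simple and puts the domain knowledge in one place, the plausibility check, which can be tightened as more is known about the sport and the sensor.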

Technical level: Introductory level/students (some technical knowledge needed)

Session Length: 40 minutes
