Numbers and Art: Statistical pitfalls and their consequences • Data Science Festival

Today, we have incredible computing muscle to crunch numbers. But hold on – there’s a catch. It’s not about doing the math; it’s about using the right math.

In this talk, I will touch on a few topics in statistics that seem simple but hide common pitfalls. These are the topics that data scientists and statisticians discuss and disagree on most often, and getting them wrong can have dangerous real-world consequences. These are the topics that make people say statistics is an “art”.

To make the point, we will start with a concrete example: Simpson’s Paradox, a statistical phenomenon in which a trend that appears within each subgroup of the data reverses or disappears when the subgroups are combined, so a misspecified model leads to counterintuitive and misleading conclusions. It is a great example of what happens when we oversimplify the world, and of the risks of doing so.
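
As a rough illustration of the reversal, here is a minimal sketch in Python. The counts are the ones commonly quoted for the kidney-stone treatment illustration of Simpson’s Paradox; treat them as illustrative figures rather than material from the talk. The names `counts`, `rate`, and `pooled_rate` are hypothetical.

```python
# Illustrative (successes, total) counts per treatment and stone size.
# These are assumed example figures, not data presented in the talk.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, total):
    """Success rate within a single subgroup."""
    return successes / total


def pooled_rate(groups):
    """Success rate after ignoring the subgroup split entirely."""
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return successes / total


# Within each stone size, treatment A has the higher success rate...
for size in ("small", "large"):
    a = rate(*counts["A"][size])
    b = rate(*counts["B"][size])
    print(f"{size:>5} stones: A={a:.3f}  B={b:.3f}")

# ...but once the sizes are pooled, treatment B looks better.
print(f"pooled: A={pooled_rate(counts['A']):.3f}  B={pooled_rate(counts['B']):.3f}")
```

The reversal arises because treatment A is applied mostly to the harder (large-stone) cases, so a model that omits stone size compares unlike groups.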

Next, we will discuss a statistical technique that is popular with this audience and where oversimplification is common: hypothesis testing. Here we go under the hood to discuss why p-values have become a central point of debate amongst statisticians, and how Bayesian statistics offers an alternative (and ‘more complicated’) framework for statistical inference.
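
To give a flavour of the contrast, the sketch below analyses one hypothetical coin-flip experiment (15 heads in 20 flips, a made-up number) in two ways: an exact two-sided binomial p-value against the null of a fair coin, and a Bayesian posterior probability that the coin favours heads. The uniform prior and the function names `two_sided_p_value` and `posterior_prob_above_half` are assumptions chosen for illustration, not the speaker’s method.

```python
from math import comb

n, k = 20, 15  # hypothetical data: 15 heads in 20 flips


def two_sided_p_value(k, n):
    """Exact binomial test of H0: p = 0.5.

    Sums the probability of every outcome at least as far from n/2 as k.
    """
    distance = abs(k - n / 2)
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= distance)
    return extreme / 2 ** n


def posterior_prob_above_half(k, n, grid=10_000):
    """P(p > 0.5 | data) under an assumed uniform prior on p.

    Uses midpoint-rule integration of the unnormalised posterior
    p^k * (1 - p)^(n - k) over a grid on (0, 1).
    """
    thetas = [(i + 0.5) / grid for i in range(grid)]
    weights = [t ** k * (1 - t) ** (n - k) for t in thetas]
    above = sum(w for t, w in zip(thetas, weights) if t > 0.5)
    return above / sum(weights)


print(f"two-sided p-value:      {two_sided_p_value(k, n):.4f}")
print(f"P(p > 0.5 | data):      {posterior_prob_above_half(k, n):.4f}")
```

The two numbers answer different questions: the p-value is the probability of data this extreme if the coin were fair, while the posterior is a probability statement about the coin itself, and it depends on the chosen prior.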

Finally, we give the audience the tools to generalize this discussion by talking about model performance metrics, expanding their vocabulary from “accuracy” to “precision/recall” and beyond. The audience will realize that they unconsciously use these concepts when they assess day-to-day outcomes in their lives, and that these concepts are not only used in machine learning to measure model performance but also have important political ramifications: would you rather jail an innocent person, or let a guilty person go free?
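
The courtroom question maps directly onto these metrics. In the sketch below, a “positive” is a guilty verdict, so a false positive is an innocent person jailed and a false negative is a guilty person set free. The two courts and all their counts are hypothetical, invented only to show that similar accuracy can hide very different trade-offs.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of those convicted, share truly guilty
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of the truly guilty, share convicted
    accuracy = (tp + tn) / (tp + fp + fn + tn)        # share of all verdicts that are correct
    return precision, recall, accuracy


# Hypothetical courts, each trying 1000 defendants of whom 100 are guilty.
# Court A is cautious: few innocents jailed, many guilty go free.
court_a = precision_recall_accuracy(tp=55, fp=5, fn=45, tn=895)
# Court B is aggressive: most guilty convicted, more innocents jailed.
court_b = precision_recall_accuracy(tp=90, fp=50, fn=10, tn=850)

for name, (p, r, a) in (("A", court_a), ("B", court_b)):
    print(f"court {name}: precision={p:.3f}  recall={r:.3f}  accuracy={a:.3f}")
```

Both courts are right about 94 to 95 percent of the time, yet they embody opposite answers to the question above, which is why accuracy alone is not enough.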

Join me for this talk, which helps demystify the art of statistics.

Technical Level: Technical practitioner
