Sensitive information about individuals can be recovered from many types of data releases. This presentation explores the privacy risks of publishing data in different formats and introduces techniques to defend against them. From low-dimensional microdata files and raw location traces to aggregate statistics and machine learning models, we will look at real-world examples of unintended information disclosure, highlight different attack models, and discuss principles and techniques for protecting the privacy of the individuals present in the data. Key takeaways from the session include:

– Common pitfalls of anonymising datasets

– How linkage attacks can be used to re-identify individuals using quasi-identifiers

– Privacy attacks on machine learning models and how they can be used to recover sensitive information about individuals in the training data

– An introduction to the differential privacy framework and how it can be used to mitigate privacy risks
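
To make the linkage attack from the takeaways concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not material from the session: the records, the names, and the choice of quasi-identifiers (`zip_code`, `birth_year`, `sex`) are all made up. The sketch joins an "anonymised" release against a public auxiliary dataset and reports only the people whose quasi-identifier combination matches exactly one released record.

```python
from collections import defaultdict

# Assumed quasi-identifiers: attributes that are not names or IDs,
# but that in combination can single out a person.
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "sex")

# Hypothetical "anonymised" release: names removed, sensitive column kept.
released = [
    {"zip_code": "10001", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip_code": "10001", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip_code": "10002", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
    {"zip_code": "10002", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public auxiliary data that includes names.
public = [
    {"name": "Alice", "zip_code": "10001", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip_code": "10001", "birth_year": 1975, "sex": "M"},
    {"name": "Carol", "zip_code": "10002", "birth_year": 1990, "sex": "F"},
]


def qi_key(record):
    """Return the record's quasi-identifier values as a hashable tuple."""
    return tuple(record[column] for column in QUASI_IDENTIFIERS)


def link(released_rows, public_rows):
    """Map each name to a sensitive value when the match is unique."""
    # Group released rows by their quasi-identifier combination.
    by_key = defaultdict(list)
    for row in released_rows:
        by_key[qi_key(row)].append(row)

    matches = {}
    for person in public_rows:
        candidates = by_key.get(qi_key(person), [])
        # Exactly one candidate means the person is re-identified.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = link(released, public)
print(reidentified)  # {'Alice': 'asthma', 'Bob': 'diabetes'}
```

In this toy data, Carol is not re-identified because two released records share her quasi-identifiers, so the attacker cannot tell which one is hers. That ambiguity is what generalisation-based anonymisation tries to enforce across every record.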