In many experiments, the unit of randomisation differs from the unit of analysis. A simple example is an A/B test in which users are randomly assigned to treatment or control, but the metric of interest is a session-level click-through rate. Another example is an experiment randomised at the city level, where the metric of interest is at the user level. I will discuss several methods for estimating average treatment effects in such experiments and, in particular, how to estimate their variance accurately, which is required for reliable hypothesis testing. The focus will be on a flexible approach that gives unbiased estimates of the average treatment effect and reliable inference even when cluster sizes are highly skewed.
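To make the variance problem concrete, here is a minimal simulation sketch in Python (NumPy only). It assumes a hypothetical user-randomised test with a session-level click-through rate, where each user has a skewed number of sessions and a personal click propensity, so sessions from the same user are correlated. It compares two standard errors for the difference in click-through rates: a naive one that treats every session as an independent trial, and a delta-method one that treats users (the randomisation unit) as the independent units. The delta method is used here only as one well-known cluster-aware approach; it is not necessarily the method this post develops. The function names, the data generator and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_arm(rng, n_users, base_ctr):
    """Hypothetical data generator: one row per user (the randomisation unit)."""
    # Skewed cluster sizes: most users have a few sessions, some have many (mean 10).
    sessions = rng.geometric(p=0.1, size=n_users)
    # A per-user click propensity (mean base_ctr) correlates sessions of the same user.
    a = 0.5
    b = a * (1 - base_ctr) / base_ctr
    propensity = rng.beta(a, b, size=n_users)
    clicks = rng.binomial(sessions, propensity)
    return clicks, sessions


def naive_se(clicks, sessions):
    """Treats every session as an independent Bernoulli trial (ignores clustering)."""
    p = clicks.sum() / sessions.sum()
    return np.sqrt(p * (1 - p) / sessions.sum())


def delta_method_se(clicks, sessions):
    """Delta-method SE of sum(clicks) / sum(sessions), with users as i.i.d. units."""
    n = len(clicks)
    ratio = clicks.sum() / sessions.sum()
    mean_n = sessions.mean()
    var_y = clicks.var(ddof=1)
    var_n = sessions.var(ddof=1)
    cov_yn = np.cov(clicks, sessions, ddof=1)[0, 1]
    var_ratio = (var_y - 2 * ratio * cov_yn + ratio**2 * var_n) / (n * mean_n**2)
    return np.sqrt(var_ratio)


rng = np.random.default_rng(seed=42)
clicks_c, sessions_c = simulate_arm(rng, n_users=5000, base_ctr=0.10)
clicks_t, sessions_t = simulate_arm(rng, n_users=5000, base_ctr=0.11)

# Difference in session-level click-through rates between the two arms.
effect = clicks_t.sum() / sessions_t.sum() - clicks_c.sum() / sessions_c.sum()
# Arms are independent, so their variances add: SE = sqrt(se_t**2 + se_c**2).
se_naive = np.hypot(naive_se(clicks_t, sessions_t), naive_se(clicks_c, sessions_c))
se_delta = np.hypot(
    delta_method_se(clicks_t, sessions_t), delta_method_se(clicks_c, sessions_c)
)
print(f"effect={effect:.4f}  naive SE={se_naive:.4f}  delta-method SE={se_delta:.4f}")
```

Under these assumptions the naive standard error comes out noticeably smaller than the delta-method one, because it ignores the correlation between sessions of the same user. A test built on the naive value would therefore reject the null hypothesis more often than its nominal level suggests.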
