Understanding the value of incremental improvements to data science models is challenging without experimentation, and experiments are not always possible because of legal restrictions or, for multivariate tests, limited traffic volume. Testing numerous variants is often impractical: splitting traffic across many variants raises the minimum detectable effect, while incremental improvements tend to produce relatively small gains. Traditionally, data scientists rely on exploratory data analysis, such as correlation analysis, summary statistics, and regression analysis, to identify the most promising model modification. However, this approach can lead to biased conclusions and sub-optimal solutions, wasting valuable time.
To address this issue, I will discuss causal inference methods that de-bias observational data and approximate the value of incremental improvements. I will also explain how we use and democratize propensity score matching and regression discontinuity designs to accelerate insights. Finally, I will highlight challenges related to omitted variable bias and sensitivity analysis.
Technical level: Technical practitioner