The talk will cover how large organisations automate data quality and reliability checks on big datasets in real time, using data profiling and machine learning techniques. The demo will use the open-source library Deequ, the Spark framework, and reporting and notification tools to detect and report data issues proactively. I will walk through an example of a framework, based on statistical methods, that I developed at Amazon and Visa to validate customer-facing data, and show how it integrates with notification tools.
Technical level: Technical Practitioner
Session Length: 40 minutes