Blog Archives

19 Sep 2023

Acing the technical DS Interview

This session covers a framework for understanding and passing technical Data Science interviews, including what companies want to see and common mistakes to avoid.

Technical Level: Non-technical/soft skills

19 Sep 2023

Combine geospatial data in 3D & beyond with TileDB arrays

Have you ever struggled with large amounts of geospatial data, huge volumes of files and many custom formats? Have you spent hours — even days — converting and wrangling disparate data formats and wondering how to combine data from different sources? Then this talk is for you! The solution? Forget about files or force-fitting geospatial […]

19 Sep 2023

Forecasting time-series with Arrow and WASM

This session presents a novel time-series processing pipeline for MLOps: Apache Arrow and S3 in the backend to process a large number of large files; Polars or DuckDB for data manipulation; a super-fast task runner written in Rust; and WASM tasks in which we do the forecasting. Focus on current state […]

18 Sep 2023

Gone phishing: Lessons learned training a phishing email classifier with highly imbalanced biased labels

Much of the literature on machine learning with imbalanced data ignores the elephant in the room: the need for high-quality labelled training and test data. But in the real world, labels can be generated by processes that introduce selection bias. How do you train a model when the labels have been generated by […]

18 Sep 2023

Lessons from launching an LLM Chatbot

In this speaker session, Stelios Constantinidis, Head of Research at MQube, and Jake Atkinson, a Data Scientist in the Automation team, will share insights from their recent launch of CriteriaGPT, a cutting-edge RAG-based chatbot powered by GPT-4 and tailored for mortgage brokers. CriteriaGPT was designed to address complex lending criteria questions related to MPowered Mortgages, […]

18 Sep 2023

Building a minimal data science platform

Data science teams often support diverse projects and requests within their organization. As the team grows and tackles more complex projects, software and infrastructure challenges arise in delivering value efficiently and maintaining it in the long run. We will walk through building a data science platform step by step, from the ground up, starting with a single […]

18 Sep 2023

Numbers and Art: Statistical pitfalls and their consequences

Today, we have incredible computing muscle to crunch numbers. But hold on: there’s a catch. It’s not about doing the math; it’s about using the right math. In this talk, I will touch on a few topics in statistics that seem simple but have common pitfalls. These are the topics that data scientists […]

11 Sep 2023

Oktoberfest Warm Up Event – Causal Inference in Python: Theory to Practice

Most data scientists know that ‘association does not imply causation’. However, traditional data science and machine learning methods are about association, not causation. At the same time, causal questions are central to many data science problems across sectors, e.g. questions about measuring effects, drivers, incrementality, or about why a change in a certain KPI took […]

06 Sep 2023

How We Scale Paid Marketing with Automation and Predictive Models

Earlier this year, we faced a scaling issue: marketing efficiency decreased as we spent more on paid marketing. We’ll share the story of how we turned things around through a better budget planning and optimisation process, powered by predictive models and automation.

Technical Level: High level/overview