In November 2022, ChatGPT took the world by storm, and Large Language Models (LLMs) have been a hot topic ever since. However, their limitations, such as outdated training data, restricted context windows, latency, and API rate limits, have become clear. Retrieval-Augmented Generation (RAG) has grown popular as an approach to circumvent some of these challenges, but RAG systems are complex and their user experience is hard to test offline, making production deployments risky.

In this talk, you’ll learn how to tackle these problems and achieve safe, lightning-fast deployment iterations of LLM-based applications by deploying to “shadow” and feature flagging beta versions using AWS Lambda aliases. We’ll dive into how to leverage the unique data this gives us to evaluate our system in production. The target audience includes ML/software engineers as well as data scientists. Participants would benefit from having some coding experience with LLMs, RAG, and AWS Lambda (or an equivalent platform).
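To make the shadow-deployment idea concrete, here is a minimal sketch of the pattern the talk describes: every request is served by the stable version, while a sample of traffic is also mirrored to a beta version whose responses are only logged for offline comparison, never shown to the user. The function names, the injected `invoke_live`/`invoke_shadow` callables (stand-ins for invoking a Lambda function through its “live” and “shadow” aliases), and the `sample_rate` parameter are all illustrative assumptions, not the speaker’s actual implementation.

```python
import random


def invoke_with_shadow(payload, invoke_live, invoke_shadow, log,
                       sample_rate=0.1, rng=random.random):
    """Serve the request from the live alias; mirror a sample to shadow.

    invoke_live / invoke_shadow are hypothetical stand-ins for calling a
    Lambda function through two aliases (e.g. Qualifier="live" vs
    Qualifier="shadow"). The shadow response is logged for offline
    evaluation but never returned to the caller.
    """
    live_response = invoke_live(payload)
    # Mirror only a fraction of traffic to keep shadow costs bounded.
    if rng() < sample_rate:
        try:
            shadow_response = invoke_shadow(payload)
            log({"payload": payload,
                 "live": live_response,
                 "shadow": shadow_response})
        except Exception as err:
            # A failing beta version must never affect the user-facing path.
            log({"payload": payload, "shadow_error": str(err)})
    return live_response


# Usage with stubbed invokers (rng forced to 0.0 so the shadow path runs):
records = []
answer = invoke_with_shadow(
    {"question": "What is RAG?"},
    invoke_live=lambda p: "stable answer",
    invoke_shadow=lambda p: "beta answer",
    log=records.append,
    rng=lambda: 0.0,
)
# The user sees only the stable answer; both answers land in the log.
```

Pairing each logged record of (payload, live, shadow) is what yields the “unique data” mentioned above: real production inputs with side-by-side outputs from both versions, ready for offline evaluation before the beta is promoted.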

Technical level: Technical practitioner

Session Length: 40 minutes