This technical session gives a broad overview of the decisions that data scientists and developers face when productionising generative models. Topics include serving various GenAI models, handling concurrency bottlenecks in AI workflows, implementing real-time model streaming, and securing, optimising, testing, containerising and deploying these services.

The session starts with a high-level introduction to what Generative AI is and to various use cases of GenAI models.
We then focus on what it takes to put a model into a product (e.g., a web server to serve it). Attendees will then learn about various challenges of working with GenAI, including handling multiple concurrent requests, batch processing and streaming model outputs to the browser.
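
As a rough illustration of the concurrency and streaming challenges mentioned above, the sketch below streams tokens for several requests on one event loop and frames each token as a Server-Sent Events message. It is only a minimal sketch: `fake_model_stream`, `to_sse`, `handle_request` and `serve_concurrently` are hypothetical names, and the "model" simply echoes the prompt rather than calling a real GenAI model or web framework.

```python
import asyncio
from typing import AsyncIterator


async def fake_model_stream(prompt: str, delay: float = 0.01) -> AsyncIterator[str]:
    """Hypothetical stand-in for a GenAI model: yields one token at a time."""
    for token in f"echo: {prompt}".split():
        await asyncio.sleep(delay)  # simulates per-token generation latency
        yield token


def to_sse(token: str) -> str:
    """Format one token as a Server-Sent Events message."""
    return f"data: {token}\n\n"


async def handle_request(prompt: str) -> list[str]:
    """Collect the SSE frames a web server would write to the response."""
    return [to_sse(tok) async for tok in fake_model_stream(prompt)]


async def serve_concurrently(prompts: list[str]) -> list[list[str]]:
    # gather() interleaves the requests on a single event loop, so one slow
    # generation does not block the others while it awaits its next token.
    return await asyncio.gather(*(handle_request(p) for p in prompts))


if __name__ == "__main__":
    for frames in asyncio.run(serve_concurrently(["hello world", "hi"])):
        print("".join(frames), end="")
```

In a real service the frames would be written to the HTTP response as they are produced instead of being collected into a list; collecting them here just keeps the sketch easy to inspect.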

Next, the audience will learn how to implement an authentication and authorisation layer to secure their services, how to protect against attacks such as model jailbreaking using Guardrails, and how to optimise their GenAI models with techniques like quantisation so that their services are more performant. Finally, they’ll be introduced to the CheckList framework for testing probabilistic GenAI models using Minimum Functionality Tests, Invariance tests and Directional Expectation tests, before learning about deployment patterns for GenAI services.
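
The three CheckList-style test types can be sketched in plain Python as below. This is an illustration under stated assumptions, not the CheckList library's API: `toy_sentiment` is a hypothetical keyword-based stand-in for a model, and the three test functions are illustrative names.

```python
def toy_sentiment(text: str) -> float:
    """Hypothetical stand-in for a model: returns a score in [-1, 1]."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = text.lower().replace(".", "").replace("!", "").split()
    raw = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(-1.0, min(1.0, raw / 2))


def minimum_functionality_test(model) -> bool:
    # MFT: simple, unambiguous cases the model must get right.
    return model("I love this") > 0 and model("I hate this") < 0


def invariance_test(model) -> bool:
    # INV: swapping a person's name should not change the prediction.
    return model("Alice had a great flight") == model("Bob had a great flight")


def directional_expectation_test(model) -> bool:
    # DIR: appending a negative clause should not raise the score.
    base = "The service was good"
    return model(base + " but the food was awful") <= model(base)


if __name__ == "__main__":
    for test in (minimum_functionality_test, invariance_test,
                 directional_expectation_test):
        print(test.__name__, "passed" if test(toy_sentiment) else "failed")
```

With a genuinely probabilistic GenAI model, exact equality in the invariance check would usually be relaxed, for example by comparing scores within a tolerance or over several samples.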

This will be a high-level introductory overview that uses plenty of diagrams to explain the concepts, rather than a detailed talk on each one. The aim is to keep the session engaging and to show attendees the core challenges and approaches for building generative AI services.

Technical level: Introductory level/students (some technical knowledge needed)

Session Length: 40 minutes