Multimodal generative AI has advanced rapidly in recent years, making it possible to generate realistic images, videos, and audio from text or other inputs. However, these models are complex, so understanding how they work and how to apply them in practice can be challenging. In this talk, Ekaterina will explain the inner workings of multimodal generative AI models by walking through the key concepts and techniques used to build them. She will also survey applications and use cases of this technology. The talk is intended for anyone interested in the current state of AI and its potential to produce realistic, immersive multimedia experiences.

Technical level: Technical practitioner

Session length: 40 minutes