Beyond Benchmarks 2.0: A Practical Framework for Measuring Multimodal and Agentic AI Success • Data Science Festival

While most enterprise AI projects start with excitement, only 20% survive the move from demo to production. This updated session evolves the “Beyond Benchmarks” framework for 2026, moving beyond text-only RAG to address the complexities of multimodal and agentic systems. We will explore how to measure success when AI interacts with images and complex documents, and how to evaluate autonomous agents performing multi-step reasoning.

The Session Agenda:
New Application Space: A look at how AI applications have evolved beyond text and simple RAG to focus on multimodal capabilities (including images and complex visual data) and agentic AI.
The Three-Tiered Metric Plan: A breakdown of application-specific metrics for these new multimodal use cases, business outcome metrics that satisfy stakeholders, and universal metrics for cost and safety.
A Phased Roadmap: A practical implementation guide to move your 2026 AI projects from a “cool demo” to a robust, production-ready enterprise solution.

Technical Level of Session: Introductory level/students (some technical knowledge needed)
