With the number of Large Language Models (LLMs) growing all the time, teams and organisations often face a tricky question: which one do we select for the job?
While public leaderboards like Hugging Face’s MTEB and model-specific performance results provided by AI vendors can offer a detailed level of comparison, sometimes these aren’t enough!
In this presentation, Emma will walk through the evaluation process the FT Accelerate AI team used to choose between Generative AI models for summarisation tasks. The evaluation combines industry methods, AI vendor methods and a series of custom tests. This framework of tailored tests can easily be replicated for any model, and potentially for a broader range of LLM use cases, providing a reusable way to stay on top of which model is best for the task.
To give a sense of the technical detail involved, the presentation will include some discussion of article summarisation, vectorisation, cosine similarity calculation, minor tweaking of LLM parameters and distributions of results.
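As a rough illustration of the kind of comparison referred to above (not the FT team's actual code), the sketch below embeds a reference summary and two model-generated summaries and scores each candidate with cosine similarity; the sentence-transformers embedding model named here is an arbitrary choice for the example.

```python
# Illustrative sketch only: score candidate summaries against a reference
# summary via cosine similarity of their embeddings. The embedding model
# (all-MiniLM-L6-v2) and the example texts are assumptions for this sketch,
# not details from the talk.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

reference = "The central bank held interest rates steady, citing easing inflation."
candidates = {
    "model_a": "Rates were left unchanged as inflation pressures eased.",
    "model_b": "The article discusses various economic topics.",
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ref_vec = model.encode(reference)
for name, summary in candidates.items():
    score = cosine_similarity(ref_vec, model.encode(summary))
    print(f"{name}: {score:.3f}")  # higher = closer to the reference summary
```

Running such a comparison across many articles yields a distribution of scores per model, which is one way the per-model results mentioned above can be compared.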
Technical Level: Technical practitioner