The realities of building domain-specific language models for production by Matt Harding & Stanimir Vichev

LSEG Labs have built a set of domain-specific language models, based on Google’s BERT architecture and trained on LSEG’s proprietary financial data. In this talk, we will discuss our journey taking these models from inception to production, covering the pain points along the way. Focusing on our Financial News NLP model, we will look at pre-processing financial news, training on GCP Preemptible TPUs, and running inference via AWS Batch Transform. We will discuss how we benchmark our model using a downstream classification task. Finally, we will look at the pros and cons of the different ways we can serve these models to customers.