Blog Archives

21 Sep 2022

Word embeddings: Beyond word2vec

Word embeddings are a convenient and efficient way to extract semantic information from large collections of textual or text-like data. We compare the performance of embedding techniques including word2vec, GloVe, fastText and StarSpace on NLP-related problems such as metaphor and sarcasm detection.

21 Sep 2022

Learning about the future of money from analyzing blockchain data

A whirlwind tour of the exciting possibilities offered by analyzing blockchain data – including appraising the economic potential opened up by smart contract operations, assessing the risk involved with storing and trading crypto assets on exchanges and uncovering the huge “whales” that move the entire crypto-economy.

21 Sep 2022

Ethics and Impact, the Humanity in Data science

How do we ensure that we don’t forget about humanity when it comes to data science? How do we make sure that data science has a positive social and emotional impact? Well, we need to understand that data is not objective if there is no human in the loop. Data is a mere tool to […]

21 Sep 2022

The risk of unintended information disclosure in data publishing

Sensitive information about individuals can be recovered from different types of data releases. This presentation will explore the privacy risks in publishing data in different formats and introduce privacy techniques to defend against them. From low-dimensional microdata files and raw location traces to aggregate statistics and machine learning models, we will look at real-world examples […]

21 Sep 2022

Machine and Deep Learning with In-Memory Computing

Apache Ignite is an open source, memory-centric distributed database, caching, and processing platform used for transactional, analytical and streaming workloads, delivering in-memory speeds at petabyte scale. Using demos, this presentation will provide an overview of the Machine Learning and Deep Learning capabilities of Apache Ignite.

21 Sep 2022

Creating Data Pipelines: Build Framework not Pipelines

Data pipelines are necessary for the flow of information from its source to its consumers, typically data scientists, analysts and software developers. Managing data flow from many sources is a complex task, and the maintenance cost limits how far a large, reliable data warehouse can scale. This presentation proposes a number of applied […]

20 Sep 2022

Real-time risk analysis at scale: insuring the world’s largest drone fleets with Flock.

Flock has built the world’s first geospatial risk analysis tool for the drone industry, using real-time data (such as weather conditions and proximity to high-risk areas) to quantify and insure drone flights on an hourly basis. In this talk, Flock’s CEO and Data Scientist will reveal how this technique has been extended for the […]

20 Sep 2022

How data can save the planet

We all see the messages that the natural world and biodiversity are under threat: some native species are declining, and climate change is affecting the wildlife we see and when and where we see it. But how do we know? This talk will cover how we get the robust evidence needed to be able to […]

20 Sep 2022

It’s Bigger on the Inside – the story of BBC+

Gabriel will be talking about how to use a focus product in order to create capability in an organisation. The BBC has been a technology company since 1922. This means that there are a lot of different data sources. The BBC also has a long editorial tradition and as a media company is liable for […]