In the rapidly evolving field of Geographic Information Systems (GIS), Databricks offers new opportunities for processing and analysing spatial data at scale. This talk explores the application of Databricks to GIS, highlighting its capabilities for distributed geospatial analysis, data integration, and real-time processing.
We begin by introducing the fundamentals of GIS and the particular challenges of geospatial data, such as its complexity and volume. Traditional GIS tools like Esri ArcGIS have long been at the forefront of managing and analysing spatial data. However, the advent of big data and the need for more scalable solutions have prompted the exploration of new platforms such as Databricks. We shall also introduce the H3 index, a hierarchical hexagonal grid system that overcomes many of the issues encountered with traditional geospatial operations. The talk will then delve into how Databricks, a cloud-based big data analytics platform, addresses these challenges using a library known as Mosaic. We will discuss the key features of Mosaic and show how they support advanced analytics and machine learning within Databricks.
The talk will also cover the technical aspects of integrating GIS with Databricks, including data formats, GIS libraries, and the use of APIs for connecting GIS software to Databricks environments.
Technical level: Technical practitioner
Session length: 40 minutes