The oceans play a critical role in regulating the climate, e.g., by absorbing excess heat and CO2. Over the last century, numerous measurements have been carried out, providing researchers within oceanography with an increasingly large amount of oceanic data about physical, chemical, geological, and biological properties. This data can be used both to gain a deeper understanding of how the oceans influence the climate and to develop solutions and strategies to fight climate change.
Now, researchers from Aalborg University and the University of Haifa present a new set of tools that speeds up research in this domain. For the first time, it is possible to automate data integration for oceanographic data, which means that what previously took months to do manually now takes only a couple of hours.
Assistant Professor Tomer Sagi from the Department of Computer Science, Aalborg University, explains:
- We have created a system that allows you to search for oceanic data and integrate these data. The result will be one big file, thereby eliminating the need for integrating different datasets manually, which is both time-consuming and labour-intensive.
Lacking a common dictionary
One of the major problems with data integration is the need for proper ontologies. An ontology is a formal specification of the concepts and relationships between these concepts within a particular domain, e.g., oceanography.
- When you collect and integrate data from different places, systems and people, you need to agree on a standardised description. It is like a dictionary that everyone must use. Until now, there has not been an ontology covering the whole domain of oceanography, and if the ontology is incomplete, you don't have the necessary vocabulary. That is why we are trying to build one as part of the data integration system, Tomer Sagi explains.
The intelligent agent becomes a domain expert
The first step in building an ontology has been for the researchers to evaluate the ontologies already available. To do so, they developed an ontology evaluation system. They let an intelligent agent, a large language model, read 10,000 oceanographic papers, thereby creating an automated domain expert.
Based on this training, the agent was able to evaluate the current ontologies in the oceanographic domain and highlight which concepts needed to be included or corrected. Surprisingly, the researchers found that the current and most used ontologies together cover less than ten percent of the concepts related to the oceanographic domain.
- These ontologies have mainly been made by humans, and humans make mistakes. Also, humans are only experts in some areas within a domain. So, building ontologies is complex. You need experts, and they get tired, make mistakes, and are expensive. That is where artificial intelligence can really help us, says Tomer Sagi.
He believes that oceanographers will benefit significantly from having a tool with which they can connect data and perform analysis more efficiently. Moreover, he hopes the ontology evaluation system can be of great use outside the domain of oceanography as well:
- Ontologies are used everywhere – they form the basis of most web services. For instance, the more sophisticated chatbots are all based on knowledge graphs, which are based on ontologies. They need to be somewhat maintained and verified. For this purpose, our new system is excellent for fixing and improving current ontologies.
About the project:
The research has been carried out as part of the project ODINI – The Ocean Data Integration Initiative.
ODINI is a collaborative initiative between Aalborg University and the University of Haifa. The project is funded in part by the University of Haifa Data Science Research Center.
Publications: Artificial intelligence for ocean science data integration: current state, gaps, and way forward / Sagi, Tomer; Lehahn, Yoav; Bar, Koby. In: Elementa: Science of the Anthropocene, Vol. 8, No. 1, 418, 15 May 2020.
Assistant Professor Tomer Sagi
Data, Knowledge and Web Engineering,
Department of Computer Science, Aalborg University
Phone: +45 9164 4374
The concepts explained
DATA INTEGRATION:
Data integration is the process of combining data in different formats and from different sources into a single, unified view. Data integration allows users to bring data together and make sense of it in a way that is useful for analysis, reporting, or other purposes.
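To make the idea concrete, here is a minimal sketch in Python of what combining two sources into a single view can look like. It is an illustration only and not the ODINI system: the source names (`cruise_a`, `buoy_b`), the column names, and the values are all hypothetical, and the shared vocabulary is simply assumed to exist.

```python
# Hypothetical mapping from each source's own column names to a shared vocabulary.
COLUMN_MAP = {
    "cruise_a": {"temp_c": "temperature_c", "sal": "salinity_psu",
                 "lat": "latitude", "lon": "longitude"},
    "buoy_b": {"temperature_f": "temperature_c", "salinity": "salinity_psu",
               "latitude": "latitude", "longitude": "longitude"},
}


def fahrenheit_to_celsius(value: float) -> float:
    return (value - 32.0) * 5.0 / 9.0


# Unit conversions needed for specific (source, column) pairs (assumed for this example).
CONVERTERS = {("buoy_b", "temperature_f"): fahrenheit_to_celsius}


def integrate(sources: dict[str, list[dict]]) -> list[dict]:
    """Rename columns and convert units so all records share one schema."""
    unified = []
    for source, rows in sources.items():
        mapping = COLUMN_MAP[source]
        for row in rows:
            record = {"source": source}
            for column, value in row.items():
                target = mapping.get(column)
                if target is None:
                    continue  # no agreed-upon name for this column, so skip it
                convert = CONVERTERS.get((source, column))
                record[target] = convert(value) if convert else value
            unified.append(record)
    return unified


# Made-up measurements from two differently formatted sources.
sources = {
    "cruise_a": [{"temp_c": 12.5, "sal": 35.1, "lat": 57.0, "lon": 9.9}],
    "buoy_b": [{"temperature_f": 50.0, "salinity": 34.8,
                "latitude": 32.8, "longitude": 34.9}],
}

unified = integrate(sources)
for record in unified:
    print(record)
```

After integration, both records use the same field names and the same temperature unit, so they can be analysed together. The hand-written `COLUMN_MAP` is the part that a shared ontology is meant to replace.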
ONTOLOGIES AND INCOMPLETE ONTOLOGIES:
An ontology is a formal specification of the concepts and relationships between these concepts within a particular domain, e.g., oceanography.
Incomplete ontologies can make it challenging to integrate data from different sources.
An example: Some oceanographers divide the Atlantic Ocean into Northern, Southern, and Middle Atlantic subregions. If the ontology only contains the "Atlantic Ocean" concept but not the "Northern Atlantic Ocean" concept, collecting data about the latter may be challenging. Without a clear distinction between the different regions of the ocean, accurately categorising or analysing data related to the Northern Atlantic Ocean becomes impossible.
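The Atlantic example can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers' ontology or tooling: each ontology is reduced to a hypothetical "is part of" mapping, and the record labels are made up.

```python
# Toy ontologies as child -> parent ("is part of") mappings (hypothetical).
INCOMPLETE = {"Atlantic Ocean": "Ocean"}
COMPLETE = {
    "Atlantic Ocean": "Ocean",
    "Northern Atlantic Ocean": "Atlantic Ocean",
    "Middle Atlantic Ocean": "Atlantic Ocean",
    "Southern Atlantic Ocean": "Atlantic Ocean",
}


def ancestors(concept: str, ontology: dict[str, str]) -> list[str]:
    """Walk up the hierarchy and return every broader concept."""
    chain = []
    while concept in ontology:
        concept = ontology[concept]
        chain.append(concept)
    return chain


def select(records: list[dict], region: str, ontology: dict[str, str]) -> list[dict]:
    """Return records labelled with `region` or with one of its subregions."""
    if region not in ontology:
        raise KeyError(f"'{region}' is not a concept in this ontology")
    return [
        r for r in records
        if r["region"] == region or region in ancestors(r["region"], ontology)
    ]


# Made-up records, each labelled with the region it was collected in.
records = [
    {"id": 1, "region": "Northern Atlantic Ocean"},
    {"id": 2, "region": "Southern Atlantic Ocean"},
    {"id": 3, "region": "Atlantic Ocean"},
]

print(select(records, "Northern Atlantic Ocean", COMPLETE))
print(len(select(records, "Atlantic Ocean", COMPLETE)))

try:
    select(records, "Northern Atlantic Ocean", INCOMPLETE)
except KeyError as error:
    print("Incomplete ontology:", error)
```

With the complete hierarchy, a query for the Northern Atlantic returns only the matching record, and a query for the Atlantic as a whole also picks up its subregions. With the incomplete one, the Northern Atlantic cannot even be asked for, because the concept does not exist.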