Bacteria are present everywhere. They can make us ill. They can keep us healthy. They can clean our wastewater and be part of the production of enzymes for detergents. In other words, bacteria have an endless range of applications. But if we are to utilize this potential to the full, we need to know the genomes of far more bacteria. And therein lies the catch: today, we know the genomes of far less than one percent of the bacteria in nature. The rest are often described as “microbial dark matter”.
In the project “Data Science meets Microbial Dark Matter”, researchers from the Department of Chemistry and Bioscience and the Department of Computer Science at Aalborg University will pool their knowledge of DNA sequencing, graph analysis and machine learning to map more bacterial genomes faster. The project is one of three for which researchers from Aalborg University have received grants through the VILLUM FOUNDATION’s new Synergy Programme.
Professor with Specific Responsibilities Mads Albertsen from the Department of Chemistry and Bioscience is one of the project leads and has, together with colleagues, pioneered methods used for “taking the fingerprints of bacteria”. As he explains, however:
- We have all the machines necessary for studying DNA, and we are able to generate large amounts of data. But when we have data at such a massive scale as is currently the case, we need the competences of the computer scientists to utilize it and to help us map even more bacterial genomes even faster.
DNA analyses getting increasingly evidence-based
In practice, the researchers will investigate how machine learning can be applied to improve each step in the DNA sequencing process and employ graph-based methods for managing, analyzing and exploiting the information that is generated in each step.
Mads Albertsen elaborates:
- When we study DNA, we study small segments at a time. In order to compile a bacterial genome, we put these together into larger sections. But when you have a sample from nature, you have thousands of bacterial genomes all at once. After putting the DNA segments together, we need to take them apart again. But the tools that exist today separate these processes from one another. And that means a loss of information and evidence. So one of the major challenges is to be able to determine that specific DNA segments derive from the same genome.
Therefore, the researchers will attempt to link together the processes that take place throughout the entire sequencing workflow, from raw data to finished bacterial genomes, with the specific purpose of being better able to trace the “decision” of why a specific genome is attributed to one specific bacterium.
The larger goal: A complete Tree of Life
From the Department of Computer Science, Professor Katja Hose and Professor with Specific Responsibilities Thomas Dyhre Nielsen, researchers in graph analysis and machine learning, respectively, are participating in the project. Katja Hose adds:
- Data science and bioscience are two very different disciplines that evolve at different paces and mostly independently of each other. This makes it almost impossible to keep up with the most recent advances within the two fields. In this project, we hope to be able to achieve synergy between the fields, but also to raise the bar on how we can utilize each other’s knowledge.
Furthermore, it is a well-known challenge that, today, it takes years from the time researchers generate new genetic data in their labs until other researchers can make use of it. Therefore, the researchers also hope that combining computer science and microbiology will create a shortcut, so that the mapping of the world’s bacterial diversity may happen even faster:
- In addition to it becoming obvious that we can learn a lot from each other and from each other’s approach to conducting research, this is a unique opportunity for us as computer scientists to explore new applications of technical concepts from our field that were not originally designed to serve the exact purpose they will be applied to, says Katja Hose.
The project is a so-called pilot project, but the Aalborg researchers hope and believe that this may kick off a much more extensive collaboration.
- We will start with testing a handful of samples, but our goal is of course that with time, we can scale this to thousands of samples, so that we can actually generate a much higher number of genomes and contribute to building databases that provide us with a complete tree of life, says Mads Albertsen.
About the project
Title: Data Science Meets Microbial Dark Matter
Professor with Specific Responsibilities Thomas Dyhre Nielsen
Department of Computer Science
Phone: 9940 9854
Media: Nina Hermansen, Mail: firstname.lastname@example.org, phone: 2090 1829