Sarahi Garcia in 2011, collecting the first of the samples. She collected the last samples in 2013.

What did you contribute to this study?

I contributed 6 of the 10,534 metagenome datasets (~0.0570%) and 104 of the 67,464 metagenome-assembled genomes, or MAGs (~0.1542%). As you can see, my contribution is really small among the almost 200 co-authors. I think that is the magic of science: “we stand on the shoulders of giants”, and as scientists we work together in networks to uncover the secrets of nature.

If you consider sustainability, what potential lies in this public repository of 52,515 microbial draft genomes? What can it be used for?

A public repository with genomic data is like a gold mine that can be used to extract a lot of valuable information and tools for biotechnological applications. Data-driven science can take advantage of such databases.

Do you have concrete examples of valuable information that can be extracted from a database like this?

For example, a friend and colleague of mine, Maliheh Mehrshad, and her colleagues used a huge metagenomic database from the Caspian Sea to mine for L-asparaginases. L-asparaginase has been used for the treatment of acute lymphoblastic leukemia for more than 30 years. However, there have been continued efforts to find similar enzymes with more desirable properties, because the existing commercial enzymes suffer from immunogenicity, a short half-life, rapid clearance, and L-glutaminase side activity. Caspian Sea water has a salt composition similar to that of human serum, so its microorganisms were a perfect place to look, through metagenomics, for different L-asparaginases. They screened almost three million predicted enzymes from the assembled Caspian Sea metagenomes, and the screening yielded 87 putative L-asparaginase genes. Of those, they tested recombinant enzymes and found that two showed remarkable anti-proliferative activity against the leukemia cell line Jurkat, while no cytotoxic effect on human erythrocytes or human umbilical vein endothelial cells was detected.

Which microbes do the metagenomes you contributed come from?

I collected the samples back during my Ph.D. and first postdoc. The samples are microbial model communities from Lake Grosse Fuchskuhle in Germany and Trout Bog Lake in the USA. I independently published the results for those samples in 2018, during my second postdoc: “Model Communities Hint at Promiscuous Metabolic Linkages between Ubiquitous Free-Living Freshwater Bacteria”. These metagenomes taught me a lot about microbial cooperation in aquatic ecosystems, and last year I was approached by JGI because they wanted to include the metagenomes in their larger study.

How do you plan to use this freshly released database?

We (Alejandro R. Gijon, Julia Nuy, 2 collaborators from JGI, and 2 collaborators at SLU) are planning to do a survey of genome size on Earth that can serve as a baseline against which future research results can be compared.

Over the last century, microbiologists have isolated microorganisms from their natural communities and focused on their behavior in laboratory environments. This has given us a biased picture of microorganisms, because most environmental microorganisms do not grow well in the laboratory. Despite this, microbiological studies have produced a staggering depth and breadth of knowledge in cellular microbiology. With current technologies such as metagenomics, we can now learn about the ecology and evolution of the microorganisms that are most abundant in nature. This information can help us complete the picture of the roles of microorganisms on Earth and their life strategies. Genome sizes tell us the story of the strategies microorganisms use to be successful. That knowledge can help humans improve the biotechnological tools that we have developed based on microorganisms.

Publication: Nayfach S et al. A Genomic Catalog of Earth’s Microbiomes. Nature Biotechnology. 2020 Nov 9.