Communities of microorganisms participate in and drive a variety of biochemical processes that have a significant impact on their surroundings. This impact ranges widely: from causing disease, to providing new kinds of antibiotics, to helping crops grow by fixing nitrogen in the soil. Metagenomics is the study of such microbial communities through their extracted DNA. One of the first questions to ask when studying such a community is: "Which organisms are present, and at what abundance?" Most current approaches to this so-called community profiling problem are parsimonious: they infer the presence of the fewest organisms consistent with the observed data. However, treating each organism as completely distinct from every other can lead to misestimates of the variety and relative amounts of organisms present, a quantity referred to as biological diversity. This is further complicated by longstanding disagreement about how biological diversity should be measured analytically. In this project, the investigators leverage a recently defined, unifying notion of biological diversity to address the problem of correctly profiling a microbial community that contains organisms of varying similarity. To accomplish this, they put forward a new mathematical framework that draws on compressive sensing, a "big data" approach. After advancing the mathematical theory, the investigators will create a software implementation that allows biomedical researchers to study metagenomic communities while properly accounting for varying organism similarity. While advancing discovery, this project promotes teaching and learning among graduate and undergraduate students. In particular, students are guided to excel at interdisciplinary work spanning mathematics and biology.
Furthermore, beyond the traditional dissemination routes of conferences and papers, the project engages a wide audience through a collaboration with SciShow, a popular YouTube channel that will work with the PIs to create episodes on metagenomics suitable for the general public.
Microbial community profiling, that is, determining the identity and relative abundance of all microorganisms present in a given environmental sample from their sequenced DNA, is an important first step in the study of such communities. Many tools have been proposed for profiling microbial communities, and while they exploit particular properties of microbial genetics to perform the classification task, there is a general lack of rigorous mathematical approaches that allow definitive statements to be made about the resulting classifications. Furthermore, the estimated biological diversity of a community can vary widely depending on the computational approach used. This is problematic because biological diversity is a key metric when studying the impact of a bacterial community on its surrounding environment, and aberrations in this quantity have been implicated in a number of diseases. The difficulty is compounded by disagreement within the scientific community about how to measure biodiversity. Recently, however, it was shown that a single formula subsumes and unifies many of the most popular biodiversity measures. In this project, the PIs use this definition of biodiversity to develop a rigorous mathematical approach that simultaneously profiles a microbial community and characterizes its biological diversity. Mathematically, this reduces to developing (and proving results about) an optimization procedure whose objective function includes biological diversity. Intriguingly, such an approach parallels compressive sensing and other "big data" sparsity-promoting optimization routines.
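The abstract does not name the unifying formula. The following is a minimal sketch, assuming it is the similarity-sensitive diversity of order q from the family described by Leinster and Cobbold. For relative abundances p and a similarity matrix Z (entries in [0, 1], ones on the diagonal), the diversity is D_q(p) = (sum_i p_i (Zp)_i^(q-1))^(1/(1-q)), with the q = 1 limit exp(-sum_i p_i log (Zp)_i). Setting Z to the identity treats every organism as completely distinct and recovers the Hill numbers: species richness at q = 0, exponential Shannon entropy at q = 1, and inverse Simpson at q = 2. The function names `similarity_diversity` and `lq_quasinorm` and the example similarity matrix are hypothetical illustrations, not part of the project's software.

```python
import numpy as np


def similarity_diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (assumed Leinster-Cobbold form).

    p : relative abundances (nonnegative, summing to 1)
    Z : similarity matrix with entries in [0, 1] and ones on the diagonal
    q : order, q >= 0; q = 1 and q = inf are handled as limits
    """
    p = np.asarray(p, dtype=float)
    Z = np.asarray(Z, dtype=float)
    support = p > 0            # only organisms that are present contribute
    Zp = (Z @ p)[support]      # expected similarity of each present organism to the community
    ps = p[support]
    if np.isinf(q):
        return float(1.0 / Zp.max())
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(ps * np.log(Zp))))
    return float(np.sum(ps * Zp ** (q - 1)) ** (1.0 / (1.0 - q)))


def lq_quasinorm(p, q):
    """The l_q (quasi)norm (sum_i |p_i|^q)^(1/q); only a quasinorm when 0 < q < 1."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(np.abs(p) ** q) ** (1.0 / q))


if __name__ == "__main__":
    p = np.full(4, 0.25)  # four organisms at equal abundance
    identity = np.eye(4)  # every organism treated as completely distinct
    # Hypothetical: two pairs of closely related organisms (similarity 0.9)
    paired = np.array([[1.0, 0.9, 0.0, 0.0],
                       [0.9, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.9],
                       [0.0, 0.0, 0.9, 1.0]])
    for q in (0, 1, 2):
        # identity: about 4 for every q; paired: about 1/0.475 = 2.105 for every q
        print(q, similarity_diversity(p, identity, q), similarity_diversity(p, paired, q))
    # With Z = I and q = 1/2, the diversity equals the l_{1/2} quasinorm of p
    print(similarity_diversity(p, identity, 0.5), lq_quasinorm(p, 0.5))
```

With Z = I, the sum over p_i^q equals ||p||_q^q, so D_q(p) = ||p||_q^(q/(1-q)). For 0 < q < 1 this is an increasing function of the l_q quasinorm, and at q = 1/2 the two coincide exactly, which is one plausible route to the quasinorm reduction mentioned in the abstract. In the demo, grouping the four organisms into two closely related pairs lowers the effective number of organisms from 4 to about 2.1, illustrating how ignoring similarity can inflate a diversity estimate.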
The main approach is to reduce this measure of biodiversity to a quasinorm and to adapt existing proofs of convergence and guaranteed reconstruction for such optimization routines accordingly. The result will be an optimization framework that simultaneously profiles a microbial community and directly characterizes its biodiversity while accounting for organism similarity. This framework will be implemented in user-friendly software and used to analyze gut samples from healthy and sick preterm infants.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
QuBBD: Fast, Efficient Mathematical Approach to the Analysis of the Human Microbiome through Biodiversity Optimization
Ivan Ivanov; David Williams; Simon Foucart
Oregon State University