Most research grants are narrowly focused, with funding directed toward one specific goal and that goal only. The National Institutes of Health’s National Institute of General Medical Sciences (NIGMS) wanted to turn that idea on its head by funding worthwhile bodies of research instead. In other words, if a project is related to a researcher’s general research topic, this money can support it.
Mizzou Engineering faculty member Dong Xu’s research portfolio caught their eye.
Xu, interim co-chair of the Electrical Engineering and Computer Science Department and director of the IT Program, recently received a Maximizing Investigators’ Research Award (MIRA) from the NIGMS. The award will provide just shy of $2 million over five years in support of his proposal, “Interpretable and extendable deep-learning model for biological sequence analysis and prediction.”
“Biological sequences, including DNA, RNA and protein sequences, represent the largest sources of growing Big Data in current biology and medicine, which provide tremendous opportunities for precision medicine, synthetic biology and other areas,” Xu said. “Deep learning as an emerging machine-learning method has a great potential in utilizing this data in biomedical research. This project will develop and apply cutting-edge deep-learning methods to deliver various sequence-based computational tools for gaining new knowledge, accelerating drug development, and improving personalized diagnosis and treatment.”
The goal of a MIRA is to give researchers greater flexibility to explore different and interesting avenues that come up over the course of their work without being beholden to a strict target. Xu has worked in bioinformatics and computational biology throughout his career, developing novel algorithms, new software, cutting-edge information systems and creative uses for existing informatics resources. That work has led to breakthroughs in protein structure prediction, machine learning, high-throughput biological data analyses and much more.
With this new grant, Xu hopes to go even further. His project abstract stated three critical goals for the next five years. Quoted from the abstract, they are:
- Develop a series of novel deep-learning methods and models to specifically target biological sequence analyses and predictions in: (a) general unsupervised representations of DNA/RNA, protein and SNP/mutation sequences that capture both local and global features for various applications; (b) methods to make deep-learning models interpretable for understanding biological mechanisms and generating hypotheses; (c) “rule learning”, which abstracts the underlying “rules” by combining unsupervised learning of large unlabeled data and supervised learning of small labeled data so that it can classify new unlabeled data.
- Apply the proposed deep-learning model to DNA/RNA sequence annotation, genotype-phenotype analyses, cancer mutation analyses, protein function/structure prediction, protein localization prediction, and protein post-translational modification prediction.
- Make the data, models, and tools freely accessible to the research community by developing a web resource for biological sequence representations, analyses, and predictions, as well as tutorials to help biologists with no computational knowledge to apply deep learning to their specific research problems.
“We would like to develop better methods for analyses and predictions, which means our methods would be more and more accurate and more usable by others,” Xu explained. “We’re going to make all of these tools available to others in several forms: Web servers, downloadable tools, open-source code — we’ll make them all available to the community for free use. We’d also like to really explore the deep learning models at the theoretical level and how they can help study biology.”