We all carry thousands of genes in our genome and have millions of unique genetic pairs. On the surface, trying to narrow down which genes and/or genetic pairs cause a neurodevelopmental disorder such as autism can seem like a “needle in a haystack” endeavor.
By using big data analytics, a University of Missouri research team has discovered more than 100 new potential genetic candidates as the scientific community continues the search for better ways to identify and treat autism.
Chi-Ren Shyu, the Paul K. and Dianne Shumaker Endowed Professor of Electrical Engineering and Computer Science and MU Informatics Institute (MUII) director, and MUII graduate student Matt Spencer worked with researchers at MU’s Thompson Center for Autism and Neurodevelopmental Disorders, and the Departments of Child Health, Statistics in the School of Medicine on “Heritable genotype contrast mining reveals novel gene associations specific to autism subgroups,” recently published in the Journal of Biomedical Informatics.
The goal of this collaborative work was to identify genes and gene interactions that contribute to the development of specific subtypes of autism. The research uncovered 286 distinct genes associated with one or more subgroups of children with autism. Of these, 193 are potentially novel genes that had not previously been identified.
“These are genes that nobody’s identified before,” said Judith Miles, professor emerita of the MU Department of Child Health and longtime autism genetics researcher. “It certainly increases the number of autism genes that need to be looked at.”
Spencer and Shyu developed a new method called Heritable Genotype Contrast Mining (HGCM), which uses a technique called frequent pattern mining, similar to the way stores organize items on shelves according to how often they are purchased together. This method allowed the researchers to use data mining techniques to test combinations of genetic variants known as single nucleotide polymorphisms (SNPs) while searching for associations. SNP associations are commonly used to examine disorders such as autism.
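To make the shopping analogy concrete, here is a minimal sketch of frequent pattern mining on toy shopping baskets. It is not the researchers' HGCM code; the function name `frequent_itemsets`, the basket contents and the 50 percent support threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items
    that appear in at least a min_support fraction of transactions."""
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical toy data: each set is one shopper's basket.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk"},
]

result = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

In a genetic setting, each "basket" would be one person's set of variants, and a frequent "itemset" would be a combination of variants that appears together in many people.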
They crunched data provided by the Simons Foundation Autism Research Initiative from 2,591 simplex families, meaning each family had only one child diagnosed with autism. They examined the data based on 12 subgroups: groups with high and low severity in each of five areas (awareness, cognition, communication, mannerisms and motivation), plus children with some physical dysmorphology, which suggests an insult to normal embryologic development, and those with no physical dysmorphology. Associations of genetic variants were identified only when they were highly contrasted between subgroups and passed rigorous statistical testing.
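The idea of a variant combination being "highly contrasted between subgroups" can be sketched as follows: count how often the combination occurs in each subgroup, then test whether the difference could be due to chance. The article does not say which statistical tests were used, so Fisher's exact test here is an assumption, as are the variant labels, group sizes and function names. The sketch also ignores the heritability aspect of HGCM.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def contrast(pattern, group_a, group_b):
    """Return (difference in carrier frequency, p-value) for a variant pattern."""
    a = sum(pattern <= person for person in group_a)  # carriers in group A
    c = sum(pattern <= person for person in group_b)  # carriers in group B
    b, d = len(group_a) - a, len(group_b) - c
    diff = a / len(group_a) - c / len(group_b)
    return diff, fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy data: each person is a set of genotype labels.
pattern = frozenset({"snpA:AG", "snpB:TT"})
carrier = set(pattern) | {"snpC:CC"}
non_carrier = {"snpC:CC"}
high_severity = [carrier] * 7 + [non_carrier]
low_severity = [carrier] + [non_carrier] * 7

diff, p_value = contrast(pattern, high_severity, low_severity)
print(f"frequency difference: {diff:.2f}, p-value: {p_value:.4f}")
```

A real analysis would also need to correct for the very large number of patterns tested, which this toy example does not attempt.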
“We initially ran some tests to make sure the variants went from many million all the way down to 30,000. And those 30,000 have significance to autism research, a certain signal that says these are something we should study further,” Shyu said. “Even after narrowing it down to these promising variants, there is an enormous number of interactions to consider. Fortunately, HGCM is equipped to handle this complexity when mining for genetic interactions relevant to autism using a Big Data ecosystem that parallelizes the data mining process.”
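A quick back-of-the-envelope calculation shows why the number of interactions is so large even after filtering. The sketch below only counts unordered combinations of two or three variants out of 30,000. It is an illustration of scale, not a description of how HGCM enumerates interactions, and it ignores the different genotypes each variant can take.

```python
from math import comb

n_variants = 30_000  # figure quoted in the article

pairs = comb(n_variants, 2)    # possible two-variant combinations
triples = comb(n_variants, 3)  # possible three-variant combinations

print(f"{pairs:,} pairs and {triples:,} triples")
```

Pairs alone number roughly 450 million, which helps explain why the work is spread across many machines in parallel.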
The greatest number of relevant genes was discovered in the dysmorphic subgroup, both in terms of novel genes and total genes. The dysmorphic subgroup in this case was defined as the group in which participants exhibited “explicit physical abnormalities.”
This discovery means HGCM could potentially be used to study additional subgroups, and the newly identified genes give geneticists a wide array of new avenues for research, including confirming or ruling out their role in other groups of patients with autism. Future research, Shyu said, should also incorporate environmental factors once those data are more readily available.
“This computational framework can include environmental factors if the data are available. We just have to customize (the algorithm) a little bit,” Shyu said. “It is also very capable of mining much larger data sets, like the 50,000-family ‘Simons Foundation Powering Autism Research for Knowledge’ (SPARK) project, where MU Thompson Autism Center is one of the nation’s leading participants.”