TY - JOUR
T1 - A Machine Learning Approach to Predicting Autism Risk Genes: Validation of Known Genes and Discovery of New Candidates
AU - Lin, Ying
AU - Afshar, Shiva
AU - Rajadhyaksha, Anjali M.
AU - Potash, James B.
AU - Han, Shizhong
N1 - Funding Information:
The authors would like to thank Dr. Mingyao Ying for fruitful discussions. Funding: This study was partially supported by National Institutes of Health grants R01 AA022994 and AA024486 (to SH). This manuscript has been released as a preprint at bioRxiv (Ying et al., 2018).
Publisher Copyright:
Copyright © 2020 Lin, Afshar, Rajadhyaksha, Potash and Han.
PY - 2020/9/10
Y1 - 2020/9/10
N2 - Autism spectrum disorder (ASD) is a complex neurodevelopmental condition with a strong genetic basis. The role of de novo mutations in ASD has been well established, but the set of genes implicated to date is still far from complete. The current study employs a machine learning-based approach to predict ASD risk genes using features from spatiotemporal gene expression patterns in human brain, gene-level constraint metrics, and other gene variation features. The genes identified through our prediction model were enriched for independent sets of ASD risk genes and tended to show reduced expression in ASD brains, especially in frontal and parietal cortex. The highest-ranked genes not only included those with strong prior evidence for involvement in ASD (for example, NBEA, HERC1, and TCF20), but also indicated potentially novel candidates, such as MYCBP2 and CAND1, which are involved in protein ubiquitination. We also showed that our method outperformed state-of-the-art scoring systems for ranking curated ASD candidate genes. Gene ontology enrichment analysis of our predicted risk genes revealed biological processes clearly relevant to ASD, including neuronal signaling, neurogenesis, and chromatin remodeling, but also highlighted other potential mechanisms that might underlie ASD, such as regulation of RNA alternative splicing and the ubiquitination pathway related to protein degradation. Our study demonstrates that human brain spatiotemporal gene expression patterns and gene-level constraint metrics can help predict ASD risk genes. Our gene ranking system provides a useful resource for prioritizing ASD candidate genes.
KW - autism
KW - constraint
KW - de novo mutation
KW - gene expression
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85091511410&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091511410&partnerID=8YFLogxK
U2 - 10.3389/fgene.2020.500064
DO - 10.3389/fgene.2020.500064
M3 - Article
C2 - 33133139
AN - SCOPUS:85091511410
SN - 1664-8021
VL - 11
JO - Frontiers in Genetics
JF - Frontiers in Genetics
M1 - 500064
ER -