TY - JOUR
T1 - The human gene damage index as a gene-level approach to prioritizing exome variants
AU - Itan, Yuval
AU - Shang, Lei
AU - Boisson, Bertrand
AU - Patin, Etienne
AU - Bolze, Alexandre
AU - Moncada-Vélez, Marcela
AU - Scott, Eric
AU - Ciancanelli, Michael J.
AU - Lafaille, Fabien G.
AU - Markle, Janet G.
AU - Martinez-Barricarte, Ruben
AU - De Jong, Sarah Jill
AU - Kong, Xiao-Fei
AU - Nitschke, Patrick
AU - Belkadi, Aziz
AU - Bustamante, Jacinta
AU - Puel, Anne
AU - Boisson-Dupuis, Stéphanie
AU - Stenson, Peter D.
AU - Gleeson, Joseph G.
AU - Cooper, David N.
AU - Quintana-Murci, Lluis
AU - Claverie, Jean-Michel
AU - Zhang, Shen-Ying
AU - Abel, Laurent
AU - Casanova, Jean-Laurent
PY - 2015/11/3
Y1 - 2015/11/3
N2 - The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.
AB - The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.
KW - Gene prioritization
KW - Gene-level
KW - Mutational damage
KW - Next generation sequencing
KW - Variant prioritization
UR - http://www.scopus.com/inward/record.url?scp=84946595515&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946595515&partnerID=8YFLogxK
U2 - 10.1073/pnas.1518646112
DO - 10.1073/pnas.1518646112
M3 - Article
C2 - 26483451
AN - SCOPUS:84946595515
SN - 0027-8424
VL - 112
SP - 13615
EP - 13620
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 44
ER -