TY - JOUR
T1 - Evaluating the evaluation of cancer driver genes
AU - Tokheim, Collin J.
AU - Papadopoulos, Nickolas
AU - Kinzler, Kenneth W.
AU - Vogelstein, Bert
AU - Karchin, Rachel
N1 - Funding Information:
Thanks to Dr. Daniel Naiman for review of the statistical analysis. This research was funded by National Cancer Institute (NCI) Grant F31CA200266 (to C.J.T.); NCI Grants 5U01CA180956-03 and 1U24CA204817-01 (to R.K.); and The Virginia and D. K. Ludwig Fund for Cancer Research, Lustgarten Foundation for Pancreatic Cancer Research, The Sol Goldman Center for Pancreatic Cancer Research, and NCI Grant P50-CA62924 (to B.V.).
PY - 2016/12/13
Y1 - 2016/12/13
N2 - Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.
AB - Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.
KW - Cancer genomics
KW - Cancer mutations
KW - Computational method evaluation
KW - DNA sequencing
KW - Driver genes
UR - http://www.scopus.com/inward/record.url?scp=85005966836&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85005966836&partnerID=8YFLogxK
U2 - 10.1073/pnas.1616440113
DO - 10.1073/pnas.1616440113
M3 - Article
C2 - 27911828
AN - SCOPUS:85005966836
SN - 0027-8424
VL - 113
SP - 14330
EP - 14335
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 50
ER -