TY - JOUR
T1 - gsSKAT
T2 - Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels
AU - Larson, Nicholas B.
AU - McDonnell, Shannon
AU - Cannon Albright, Lisa
AU - Teerlink, Craig
AU - Stanford, Janet
AU - Ostrander, Elaine A.
AU - Isaacs, William B.
AU - Xu, Jianfeng
AU - Cooney, Kathleen A.
AU - Lange, Ethan
AU - Schleutker, Johanna
AU - Carpten, John D.
AU - Powell, Isaac
AU - Bailey-Wilson, Joan E.
AU - Cussenot, Olivier
AU - Cancel-Tassin, Geraldine
AU - Giles, Graham G.
AU - MacInnis, Robert J.
AU - Maier, Christiane
AU - Whittemore, Alice S.
AU - Hsieh, Chih Lin
AU - Wiklund, Fredrik
AU - Catalona, William J.
AU - Foulkes, William
AU - Mandal, Diptasri
AU - Eeles, Rosalind
AU - Kote-Jarai, Zsofia
AU - Ackerman, Michael J.
AU - Olson, Timothy M.
AU - Klein, Christopher J.
AU - Thibodeau, Stephen N.
AU - Schaid, Daniel J.
N1 - Funding Information:
This research was supported by the US Public Health Service, National Institutes of Health (NIH), contract grant number GM065450 (D.J.S.) and National Cancer Institute, grant number U01 CA 89600 (S.N.T.), and the Mayo Clinic Center for Individualized Medicine. J.E.B.W. is supported by the Intramural Program of the National Human Genome Research Institute, NIH. The authors declare no conflict of interest.
Publisher Copyright:
© 2017 WILEY PERIODICALS, INC.
PY - 2017/5
Y1 - 2017/5
AB - Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results.
KW - gene set
KW - next-generation sequencing
KW - pathway
KW - rare variation
UR - http://www.scopus.com/inward/record.url?scp=85013287752&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013287752&partnerID=8YFLogxK
U2 - 10.1002/gepi.22036
DO - 10.1002/gepi.22036
M3 - Article
C2 - 28211093
AN - SCOPUS:85013287752
SN - 0741-0395
VL - 41
SP - 297
EP - 308
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 4
ER -