TY - JOUR
T1 - The sequence kernel association test for multicategorical outcomes
AU - Jiang, Zhiwen
AU - Zhang, Haoyu
AU - Ahearn, Thomas U.
AU - Garcia-Closas, Montserrat
AU - Chatterjee, Nilanjan
AU - Zhu, Hongtu
AU - Zhan, Xiang
AU - Zhao, Ni
N1 - Publisher Copyright:
© 2023 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC.
PY - 2023/9
Y1 - 2023/9
N2 - Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinnings of disease subtypes. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient for handling such multicategorical outcomes. In this paper, we propose a novel set-based association analysis method, SKAT-MC, the sequence kernel association test (SKAT) for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we show that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS) and found that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data with SKAT-MC and identified 21 significant genes across the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package, SKAT-MC, can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
AB - Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinnings of disease subtypes. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient for handling such multicategorical outcomes. In this paper, we propose a novel set-based association analysis method, SKAT-MC, the sequence kernel association test (SKAT) for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we show that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS) and found that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER− breast cancer subtypes. We also investigated educational attainment using UK Biobank data with SKAT-MC and identified 21 significant genes across the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package, SKAT-MC, can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
KW - SKAT
KW - multicategorical data
KW - the generalized logit model
KW - the proportional odds model
UR - http://www.scopus.com/inward/record.url?scp=85153366148&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85153366148&partnerID=8YFLogxK
U2 - 10.1002/gepi.22527
DO - 10.1002/gepi.22527
M3 - Article
C2 - 37078108
AN - SCOPUS:85153366148
SN - 0741-0395
VL - 47
SP - 432
EP - 449
JO - Genetic epidemiology
JF - Genetic epidemiology
IS - 6
ER -