TY - JOUR
T1 - Pathogen exposure misclassification can bias association signals in GWAS of infectious diseases when using population-based common control subjects
AU - Duchen, Dylan
AU - Vergara, Candelaria
AU - Thio, Chloe L.
AU - Kundu, Prosenjit
AU - Chatterjee, Nilanjan
AU - Thomas, David L.
AU - Wojcik, Genevieve L.
AU - Duggal, Priya
N1 - Publisher Copyright: © 2022 American Society of Human Genetics
PY - 2023/2/2
Y1 - 2023/2/2
N2 - Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including for infectious diseases. The use of population-based common control subjects from biobanks and extensive consortia is a valuable resource to increase sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon the strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.
KW - GWAS
KW - common controls
KW - genetic epidemiology
KW - infectious disease
KW - misclassification bias
KW - population-based controls
UR - http://www.scopus.com/inward/record.url?scp=85147457546&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147457546&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2022.12.013
DO - 10.1016/j.ajhg.2022.12.013
M3 - Article
C2 - 36649706
AN - SCOPUS:85147457546
SN - 0002-9297
VL - 110
SP - 336
EP - 348
JO - American journal of human genetics
JF - American journal of human genetics
IS - 2
ER -