TY - JOUR
T1 - Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies
AU - Wojcik, Genevieve L.
AU - Fuchsberger, Christian
AU - Taliun, Daniel
AU - Welch, Ryan
AU - Martin, Alicia R.
AU - Shringarpure, Suyash
AU - Carlson, Christopher S.
AU - Abecasis, Goncalo
AU - Kang, Hyun Min
AU - Boehnke, Michael
AU - Bustamante, Carlos D.
AU - Gignoux, Christopher R.
AU - Kenny, Eimear E.
N1 - Funding Information:
Research reported in this paper was supported by the Office of Research Infrastructure under award number S10OD018522 and the National Human Genome Research Institute under award numbers U01HG007376, U01HG007417, U01HG007419, U01HG009080 and R01HG000376 of the National Institutes of Health. CRG was supported partially by T32HG00044. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright:
Copyright © 2018 Wojcik et al.
PY - 2018/10/1
Y1 - 2018/10/1
N2 - The emergence of very large cohorts in genomic research has facilitated a focus on genotype imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
AB - The emergence of very large cohorts in genomic research has facilitated a focus on genotype imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
KW - Genetics
KW - Genomics
KW - Imputation
KW - Statistical
KW - Tag SNPs array design
UR - http://www.scopus.com/inward/record.url?scp=85054448991&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054448991&partnerID=8YFLogxK
U2 - 10.1534/g3.118.200502
DO - 10.1534/g3.118.200502
M3 - Article
C2 - 30131328
AN - SCOPUS:85054448991
SN - 2160-1836
VL - 8
SP - 3255
EP - 3267
JO - G3: Genes, Genomes, Genetics
JF - G3: Genes, Genomes, Genetics
IS - 10
ER -