TY - JOUR
T1 - Genotype imputation performance of three reference panels using African ancestry individuals
AU - Vergara, Candelaria
AU - Parker, Margaret M.
AU - Franco, Liliana
AU - Cho, Michael H.
AU - Valencia-Duarte, Ana V.
AU - Beaty, Terri H.
AU - Duggal, Priya
N1 - Funding Information:
Funding: This project was funded in part with federal funds from the Office of AIDS Research through the Center for Inherited Diseases at Johns Hopkins University, and by the National Institute on Drug Abuse (R01013324). The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. Liliana Franco was supported by the scholarship program for PhD students of COLCIENCIAS (Administrative Department of Science, Technology and Innovation; Departamento Administrativo de Ciencia, Tecnología e Innovación) and by the Epidemiology Group of the National School of Public Health of the University of Antioquia. The COPDGene project (NCT00608764) was supported by Award Number R01HL089897 and Award Number R01HL089856 from the National Heart, Lung, and Blood Institute. Margaret M. Parker was supported by T32HL007427. The COPDGene project is also supported by the COPD Foundation through contributions made to the Industry Advisory Board comprising AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, Sunovion, and GlaxoSmithKline.
Funding Information:
Conflict of interest: M.H.C. has received grant support from GSK. The remaining authors declare that they have no conflict of interest.
Funding Information:
Earlier available reference panels include the Human Genome Diversity Project (Cavalli-Sforza 2005), the HapMap Consortium (The International HapMap 3 Consortium et al. 2010) and the 1000 Genomes Project (1000G) (Sudmant et al. 2015). More recently, the Haplotype Reference Consortium (HRC) (McCarthy et al. 2016) was constructed via a predominantly European ancestry consortium currently comprising 32,611 individuals with whole-genome or exome sequences available. The HRC includes the Genome of The Netherlands (GoNL), 250 Dutch parent–offspring families sequenced at 12× depth (Genome of the Netherlands Consortium et al. 2014), the UK10K project with nearly 10,000 individuals whose whole genome was sequenced at 7×, or exome sequenced at 80× (Walter et al. 2015) and 1000G subjects among other cohorts (http://www.haplotype-reference-consortium.org/participating-cohorts). Another project, funded by the UK government, plans to sequence 100,000 whole genomes from patients registered and treated by the National Health Service (http://www.genomicsengland.co.uk/the-100000-genomes-project/). These dense reference panels will allow better imputation of low-frequency and rare variants (Deelen et al. 2014) and the discovery of new variants (Walter et al. 2015; Warren et al. 2017), but are generally focused on populations of European descent.
Publisher Copyright:
© 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2018/4/1
Y1 - 2018/4/1
N2 - Genotype imputation estimates unobserved genotypes from genome-wide markers to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations in which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 individuals (HRC) were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5–1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R² > 0.5 and for rare, low-frequency, and common variants. When merging imputed variants of the three panels, the total number was 62–63 M with 20 M overlapping variants imputed by all three panels, and a range of 5–15 M variants imputed exclusively with one of them. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
AB - Genotype imputation estimates unobserved genotypes from genome-wide markers to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations in which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 individuals (HRC) were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5–1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R² > 0.5 and for rare, low-frequency, and common variants. When merging imputed variants of the three panels, the total number was 62–63 M with 20 M overlapping variants imputed by all three panels, and a range of 5–15 M variants imputed exclusively with one of them. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
UR - http://www.scopus.com/inward/record.url?scp=85045133145&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045133145&partnerID=8YFLogxK
U2 - 10.1007/s00439-018-1881-4
DO - 10.1007/s00439-018-1881-4
M3 - Article
C2 - 29637265
AN - SCOPUS:85045133145
SN - 0340-6717
VL - 137
SP - 281
EP - 292
JO - Human Genetics
JF - Human Genetics
IS - 4
ER -