TY - JOUR
T1 - Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes
AU - Genome Aggregation Database Production Team
AU - Genome Aggregation Database Consortium
AU - Wang, Qingbo
AU - Pierce-Hoffman, Emma
AU - Cummings, Beryl B.
AU - Alföldi, Jessica
AU - Francioli, Laurent C.
AU - Gauthier, Laura D.
AU - Hill, Andrew J.
AU - O’Donnell-Luria, Anne H.
AU - Armean, Irina M.
AU - Banks, Eric
AU - Bergelson, Louis
AU - Cibulskis, Kristian
AU - Collins, Ryan L.
AU - Connolly, Kristen M.
AU - Covarrubias, Miguel
AU - Daly, Mark J.
AU - Donnelly, Stacey
AU - Farjoun, Yossi
AU - Ferriera, Steven
AU - Gabriel, Stacey
AU - Gentry, Jeff
AU - Gupta, Namrata
AU - Jeandet, Thibault
AU - Kaplan, Diane
AU - Laricchia, Kristen M.
AU - Llanwarne, Christopher
AU - Minikel, Eric V.
AU - Munshi, Ruchi
AU - Neale, Benjamin M.
AU - Novod, Sam
AU - Petrillo, Nikelle
AU - Poterba, Timothy
AU - Roazen, David
AU - Ruano-Rubio, Valentin
AU - Saltzman, Andrea
AU - Samocha, Kaitlin E.
AU - Schleicher, Molly
AU - Seed, Cotton
AU - Solomonson, Matthew
AU - Soto, Jose
AU - Tiao, Grace
AU - Tibbetts, Kathleen
AU - Tolonen, Charlotte
AU - Vittal, Christopher
AU - Wade, Gordon
AU - Wang, Arcturus
AU - Ware, James S.
AU - Watts, Nicholas A.
AU - Weisburd, Ben
AU - Pulver, Ann E.
N1 - Funding Information:
We would like to thank the many individuals whose sequence data are aggregated in gnomAD for their contributions to research, and for making this work possible. The results published here are in part based upon data (1) generated by The Cancer Genome Atlas (TCGA), managed by the NCI and NHGRI (accession: phs000178.v10.p8; information about TCGA can be found at http://cancergenome.nih.gov); (2) generated by the Genotype-Tissue Expression Project (GTEx), managed by the NIH Common Fund and NHGRI (accession: phs000424.v7.p2); (3) generated by the Exome Sequencing Project, managed by NHLBI; and (4) generated by the Alzheimer’s Disease Sequencing Project (ADSP), managed by the NIA and NHGRI (accession: phs000572.v7.p4). We would like to thank the Hail team for developing tools essential for the large-scale computation in this work. We would like to thank the analysis team of the Broad’s Rare Disease Group for their manual inspection of MNVs in rare disease cohorts. This work was funded by NIDDK U54 DK105566, NIGMS R01 GM104371, and NHGRI UM1 HG008900-01. Q.W. was supported by the Nakajima Foundation Scholarship. K.J.K. was supported by NIGMS F32 GM115208. A.O.D.L. was supported by NICHD K12 HD052896.
Funding Information:
D.G.M. is a founder with equity in Goldfinch Bio, and has received research support from AbbVie, Astellas, Biogen, BioMarin, Eisai, Merck, Pfizer, and Sanofi-Genzyme. K.J.K. owns stock in Personalis. E.V.M. has received research support in the form of charitable contributions from Charles River Laboratories and Ionis Pharmaceuticals, and has consulted for Deerfield Management. M.I.M.: The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. He has served on advisory panels for Pfizer, Novo Nordisk, and Zoe Global; has received honoraria from Merck, Pfizer, Novo Nordisk, and Eli Lilly; has stock options in Zoe Global; and has received research funding from AbbVie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. As of June 2019, M.I.M. is an employee of Genentech, and holds stock in Roche. R.K.W. has received unrestricted research grants from Takeda Pharmaceutical Company. M.J.D. is a founder of Maze Therapeutics. B.M.N. is a member of the scientific advisory board at Deep Genomics and a consultant for Camp4 Therapeutics, Takeda Pharmaceutical, and Biogen. A.O.D.L. has received honoraria from ARUP and Chan Zuckerberg Initiative.
Publisher Copyright:
© 2020, The Author(s).
PY - 2020/12/1
Y1 - 2020/12/1
AB - Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs.
UR - http://www.scopus.com/inward/record.url?scp=85085576031&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85085576031&partnerID=8YFLogxK
U2 - 10.1038/s41467-019-12438-5
DO - 10.1038/s41467-019-12438-5
M3 - Article
C2 - 32461613
AN - SCOPUS:85085576031
SN - 2041-1723
VL - 11
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 2539
ER -