TY - JOUR
T1 - Sources of PCR-induced distortions in high-throughput sequencing data sets
AU - Kebschull, Justus M.
AU - Zador, Anthony M.
N1 - Funding Information:
National Institutes of Health [5R01DA036913-03 to A.Z., 5R01NS073129-05 to A.Z., 5R21DA035538-02 to A.Z.]; Paul G. Allen Family Foundation [11233/ALLEN to A.Z.]; Brain Research Foundation [BRF-SIA-2014-03 to A.Z.]; PhD fellowship from the Boehringer Ingelheim Fonds to J.K. Funding for open access charge: National Institutes of Health [5R01NS073129-05 to A.Z.]. Conflict of interest statement. None declared.
Publisher Copyright:
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2015
Y1 - 2015
N2 - PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error (bias, stochasticity, template switches and polymerase errors) on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single-cell sequencing, in which sequences are represented by only one or a few molecules.
AB - PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error (bias, stochasticity, template switches and polymerase errors) on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single-cell sequencing, in which sequences are represented by only one or a few molecules.
UR - http://www.scopus.com/inward/record.url?scp=84983748545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84983748545&partnerID=8YFLogxK
U2 - 10.1093/nar/gkv717
DO - 10.1093/nar/gkv717
M3 - Article
C2 - 26187991
AN - SCOPUS:84983748545
SN - 1362-4962
VL - 43
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 21
M1 - e143
ER -