TY - JOUR
T1 - Establishment of an eHAP1 human haploid cell line hybrid reference genome assembled from short and long reads
AU - Law, William D.
AU - Warren, René L.
AU - McCallion, Andrew S.
N1 - Funding Information:
This work was supported by the NIH (MH106522 [ASM]) and the NIH (HG007182). This work was also supported through internal funding from the Johns Hopkins University School of Medicine as part of the Core Coins Program. We acknowledge assistance with Nanopore sequencing from the Genetic Resources Core Facility High-Throughput Sequencing Core. The content of this work is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or other funding organizations. We thank David Mohr for providing guidance and computational resources, Jeffrey Burke from Circulomics for assistance in genomic DNA extraction, and Paul W. Hook and Sarah A. McClymont for critical reading of the manuscript. Additional computational resources were provided by the Maryland Advanced Research Computing Center (MARCC).
Publisher Copyright:
© 2020 Elsevier Inc.
PY - 2020/5
Y1 - 2020/5
AB - Haploid cell lines are a valuable research tool with broad applicability for genetic assays. As such, the fully haploid human cell line eHAP1 has been used in a wide array of studies. However, the absence of a corresponding reference genome sequence for this cell line has limited its wider application to experiments that depend on available sequence, such as capture-clone methodologies. We generated ~15× coverage Nanopore long reads from ten GridION flowcells and used these data to assemble a de novo draft genome with minimap and miniasm, which we subsequently polished with Racon. This assembly was further polished with Pilon and ntEdit using previously generated, low-coverage Illumina short reads. This resulted in a hybrid eHAP1 assembly with >90% complete BUSCO scores. We further assessed the eHAP1 long-read data for structural variants using Sniffles and identified a variety of rearrangements, including a previously established Philadelphia translocation. Finally, we demonstrate that some of these variants overlap open chromatin regions, potentially impacting regulatory regions. By integrating long and short reads, we generated a high-quality reference assembly for eHAP1 cells, which demonstrates the utility of combining sequencing platforms to generate a high-quality reference genome de novo solely from low-coverage data. We expect the resulting eHAP1 genome assembly to provide a useful resource that enables novel experimental applications in this important model cell line.
UR - http://www.scopus.com/inward/record.url?scp=85078312406&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078312406&partnerID=8YFLogxK
U2 - 10.1016/j.ygeno.2020.01.009
DO - 10.1016/j.ygeno.2020.01.009
M3 - Article
C2 - 31962144
AN - SCOPUS:85078312406
SN - 0888-7543
VL - 112
SP - 2379
EP - 2384
JO - Genomics
JF - Genomics
IS - 3
ER -