TY - JOUR
T1 - De novo genome assembly of Candida glabrata reveals cell wall protein complement and structure of dispersed tandem repeat arrays
AU - Xu, Zhuwei
AU - Green, Brian
AU - Benoit, Nicole
AU - Schatz, Michael
AU - Wheelan, Sarah
AU - Cormack, Brendan
N1 - Funding Information:
This work was supported by grant R01AI046223 to BPC and by a National Science Foundation award (DBI-1627442) to MC. We thank Haiping Hao of the Johns Hopkins Transcriptomics and Deep Sequencing Core for sequencing on the PacBio SMRT sequencing platform. We thank the Experimental and Computational Genomics Core (ECGC) at the Sidney Kimmel Comprehensive Cancer Center for sequencing on the Illumina sequencing platform. We are particularly indebted to Pascal Durrens (CNRS) for help and advice on historical systematic nomenclature for the existing Candida glabrata genome assembly.
Publisher Copyright:
© 2020 John Wiley & Sons Ltd
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Candida glabrata is an opportunistic pathogen in humans, responsible for approximately 20% of disseminated candidiasis. Candida glabrata's ability to adhere to host tissue is mediated by GPI-anchored cell wall proteins (GPI-CWPs); the corresponding genes contain long tandem repeat regions. These repeat regions resulted in assembly errors in the reference genome. Here, we performed a de novo assembly of the C. glabrata type strain CBS138 using long single-molecule real-time reads, with short read sequences (Illumina) for refinement, and constructed telomere-to-telomere assemblies of all 13 chromosomes. Our assembly has excellent agreement overall with the current reference genome, but we made substantial corrections within tandem repeat regions. Specifically, we removed 62 genes of which 45 were scrambled due to misassembly in the reference. We annotated 31 novel ORFs of which 24 ORFs are GPI-CWPs. In addition, we corrected the tandem repeat structure of an additional 21 genes. Our corrections to the genome were substantial, with the length of new genes and tandem repeat corrections amounting to approximately 3.8% of the ORFeome length. As most corrections were within the coding regions of GPI-CWP genes, our genome assembly establishes a high-quality reference set of genes and repeat structures for the functional analysis of these cell surface proteins.
AB - Candida glabrata is an opportunistic pathogen in humans, responsible for approximately 20% of disseminated candidiasis. Candida glabrata's ability to adhere to host tissue is mediated by GPI-anchored cell wall proteins (GPI-CWPs); the corresponding genes contain long tandem repeat regions. These repeat regions resulted in assembly errors in the reference genome. Here, we performed a de novo assembly of the C. glabrata type strain CBS138 using long single-molecule real-time reads, with short read sequences (Illumina) for refinement, and constructed telomere-to-telomere assemblies of all 13 chromosomes. Our assembly has excellent agreement overall with the current reference genome, but we made substantial corrections within tandem repeat regions. Specifically, we removed 62 genes of which 45 were scrambled due to misassembly in the reference. We annotated 31 novel ORFs of which 24 ORFs are GPI-CWPs. In addition, we corrected the tandem repeat structure of an additional 21 genes. Our corrections to the genome were substantial, with the length of new genes and tandem repeat corrections amounting to approximately 3.8% of the ORFeome length. As most corrections were within the coding regions of GPI-CWP genes, our genome assembly establishes a high-quality reference set of genes and repeat structures for the functional analysis of these cell surface proteins.
UR - http://www.scopus.com/inward/record.url?scp=85081735448&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081735448&partnerID=8YFLogxK
U2 - 10.1111/mmi.14488
DO - 10.1111/mmi.14488
M3 - Article
C2 - 32068314
AN - SCOPUS:85081735448
SN - 0950-382X
VL - 113
SP - 1209
EP - 1224
JO - Molecular Microbiology
JF - Molecular Microbiology
IS - 6
ER -