TY - JOUR
T1 - The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual
AU - Chao, Kuan Hao
AU - Zimin, Aleksey V.
AU - Pertea, Mihaela
AU - Salzberg, Steven L.
N1 - Funding Information:
This research was supported in part by the U.S. National Institutes of Health [grant numbers R01-HG006677 and R35-GM130151] and by the U.S. National Science Foundation [grant numbers IOS-1744309 and DBI-1759518].
Publisher Copyright:
© The Author(s) 2023. Published by Oxford University Press on behalf of the Genetics Society of America.
PY - 2023/3
Y1 - 2023/3
N2 - We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.
AB - We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.
KW - DNA sequencing
KW - annotation
KW - genome assembly
KW - reference genome
KW - variant calling
UR - http://www.scopus.com/inward/record.url?scp=85150001161&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150001161&partnerID=8YFLogxK
U2 - 10.1093/g3journal/jkac321
DO - 10.1093/g3journal/jkac321
M3 - Article
C2 - 36630290
AN - SCOPUS:85150001161
SN - 2160-1836
VL - 13
JO - G3: Genes, Genomes, Genetics
JF - G3: Genes, Genomes, Genetics
IS - 3
M1 - jkac321
ER -