TY - JOUR
T1 - Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
AU - Kim, Daehwan
AU - Paggi, Joseph M.
AU - Park, Chanhee
AU - Bennett, Christopher
AU - Salzberg, Steven L.
N1 - Funding Information:
We would like to express our thanks to K. Barnes and M. Daya for sharing Omixon’s HLA results with us. We would like to thank B. Langmead and J. Pritt for their invaluable contributions to our discussions on HISAT2. We also greatly appreciate the generosity of G. Danuser and D. Reed in providing wet-lab bench space and equipment for us. This work was supported in part by the National Human Genome Research Institute under grants R01-HG006102 and R01-HG006677 to S.L.S. and by the Cancer Prevention Research Institute of Texas under grant RR170068 to D.K. All authors read and approved the final manuscript.
Publisher Copyright:
© 2019, The Author(s), under exclusive licence to Springer Nature America, Inc.
PY - 2019/8/1
Y1 - 2019/8/1
N2 - The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays.
UR - http://www.scopus.com/inward/record.url?scp=85071193100&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071193100&partnerID=8YFLogxK
U2 - 10.1038/s41587-019-0201-4
DO - 10.1038/s41587-019-0201-4
M3 - Article
C2 - 31375807
AN - SCOPUS:85071193100
SN - 1087-0156
VL - 37
SP - 907
EP - 915
JO - Nature Biotechnology
JF - Nature Biotechnology
IS - 8
ER -