TY - JOUR
T1 - A hepatitis B virus (HBV) sequence variation graph improves alignment and sample-specific consensus sequence construction
AU - Duchen, Dylan
AU - Clipman, Steven J.
AU - Vergara, Candelaria
AU - Thio, Chloe L.
AU - Thomas, David L.
AU - Duggal, Priya
AU - Wojcik, Genevieve L.
N1 - Publisher Copyright: © 2024 Duchen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2024/4
Y1 - 2024/4
N2 - Nearly 300 million individuals live with chronic hepatitis B virus (HBV) infection (CHB), for which no curative therapy is available. As viral diversity is associated with pathogenesis and immunological control of infection, improved methods to characterize this diversity could aid drug development efforts. Conventionally, viral sequencing data are mapped/aligned to a reference genome, and only the aligned sequences are retained for analysis. Thus, reference selection is critical, yet selecting the most representative reference a priori remains difficult. We investigate an alternative pangenome approach which can combine multiple reference sequences into a graph which can be used during alignment. Using simulated short-read sequencing data generated from publicly available HBV genomes and real sequencing data from an individual living with CHB, we demonstrate alignment to a phylogenetically representative ‘genome graph’ can improve alignment, avoid issues of reference ambiguity, and facilitate the construction of sample-specific consensus sequences more genetically similar to the individual’s infection. Graph-based methods can, therefore, improve efforts to characterize the genetics of viral pathogens, including HBV, and have broader implications in host-pathogen research.
UR - http://www.scopus.com/inward/record.url?scp=85191639750&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85191639750&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0301069
DO - 10.1371/journal.pone.0301069
M3 - Article
C2 - 38669259
AN - SCOPUS:85191639750
SN - 1932-6203
VL - 19
JO - PLOS ONE
JF - PLOS ONE
IS - 4
M1 - e0301069
ER -