VirGenA: A reference-based assembler for variable viral genomes

Gennady G. Fedonin, Yury S. Fantin, Alexander V. Favorov, German A. Shipulin, Alexey D. Neverov

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


Characterization of the within-host genetic diversity of viral pathogens is required to select effective treatment for some important viral infections, e.g. HIV, HBV and HCV. Although low-frequency variants can technically be detected, data on their clinical significance are conflicting, partly because they are difficult to distinguish from experimental artifacts. Cross-contamination is a concern for all highly sensitive techniques, including deep sequencing: even trace contamination significantly increases the number of false-positive SNVs. Identification of infections by multiple genotypes of some viruses, the incidence of which can be considerable, especially in risk groups, is also clinically significant in some cases. We developed a new reference-guided viral assembler, VirGenA, that can separate mixtures of strains belonging to different intraspecies genetic groups (genotypes, subtypes, clades, etc.) and assemble a separate consensus sequence for each group in a mixture. It produced long assemblies for mixture components of extremely low frequency (<1%), allowing detection of cross-contamination of samples by divergent genotypes. We tested VirGenA on both clinical and simulated data. On both types of data, VirGenA performs as well as or better than existing de novo assemblers. A cross-platform implementation (including source code) is freely available at

Original language: English (US)
Pages (from-to): 15-25
Number of pages: 11
Journal: Briefings in Bioinformatics
Issue number: 1
State: Published - Jan 18 2019


  • NGS
  • genotyping
  • mixture separation
  • viral genome assembly

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

