Computational Methods for Transcript Assembly from RNA-SEQ Reads

Stefan Canzar, Liliana Florea

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Scopus citation

Abstract

A major goal in bioinformatics is to identify the genes and their transcript variations, which collectively define the transcriptome of a cell or species. There are two main classes of transcript assembly methods: de novo methods, which assemble reads based solely on sequence overlap, and genome-based methods, which first align the reads to a reference genome and then assemble the overlapping alignments. The main classes of artifacts are redundancies resulting from incomplete merging of reads and contigs, fragmented transcripts, chimeric constructs, and collapsing of paralogs. The chapter describes general principles underlying current methods for genome-based transcriptome assembly. Genome-based methods allow for better resolution of repeat and paralogous sequences, as well as overlapping gene models, and offer higher sensitivity, particularly in capturing low-coverage transcripts. Transcript reconstruction methods and their mathematical foundations must be continually refined to provide more accurate solutions and to keep pace with the characteristics and biases of evolving sequencing technologies.
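As a rough illustration of the genome-based idea mentioned in the abstract (align reads first, then work with the overlapping alignments), the sketch below groups overlapping alignments into candidate loci. It is not taken from the chapter: the `(chromosome, start, end)` tuple format, the half-open coordinates, and the function name `cluster_alignments` are all assumptions made for this example. Real assemblers also use splice junctions, strand, and coverage, which are omitted here.

```python
from typing import List, Tuple

# Hypothetical alignment record: (chromosome, start, end), half-open coordinates.
Alignment = Tuple[str, int, int]


def cluster_alignments(alignments: List[Alignment]) -> List[Alignment]:
    """Merge overlapping alignments on the same chromosome into candidate loci."""
    loci: List[Alignment] = []
    # Sorting by (chromosome, start, end) lets a single pass find all overlaps.
    for chrom, start, end in sorted(alignments):
        if loci and loci[-1][0] == chrom and start < loci[-1][2]:
            # Overlaps the current locus: extend its right boundary if needed.
            prev_chrom, prev_start, prev_end = loci[-1]
            loci[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # No overlap (or a new chromosome): start a new locus.
            loci.append((chrom, start, end))
    return loci


# Toy input: two overlapping reads, one isolated read, one read on another chromosome.
reads = [("chr1", 100, 150), ("chr1", 140, 190), ("chr1", 500, 550), ("chr2", 120, 170)]
loci = cluster_alignments(reads)
```

In this toy input the first two reads overlap and collapse into a single locus, while the other two remain separate. A de novo method would have no coordinates to sort on and would have to infer such groupings from sequence overlap alone.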

Original language: English (US)
Title of host publication: Computational Methods for Next Generation Sequencing Data Analysis
Publisher: Wiley
Pages: 245-268
Number of pages: 24
ISBN (Electronic): 9781119272182
ISBN (Print): 9781118169483
State: Published - Sep 6 2016

Keywords

  • Computational methods
  • De novo assembly
  • Genome-based transcriptome assembly
  • Overlapping gene models
  • RNA-seq reads
  • Sequencing technologies
  • Transcript reconstruction methods

ASJC Scopus subject areas

  • General Computer Science
