Abstract
A major goal in bioinformatics is to identify the genes and their transcript variations, which collectively define the transcriptome of a cell or species. There are two main classes of transcript assembly methods: de novo methods, which assemble reads based solely on sequence overlap, and genome-based methods, which first align the reads to a reference genome and then assemble the overlapping alignments. The main classes of artifacts are redundancies resulting from incomplete merging of reads and contigs, fragmented transcripts, chimeric constructs, and collapsing of paralogs. This chapter describes the general principles underlying current methods for genome-based transcriptome assembly. Genome-based methods allow for better resolution of repeat and paralogous sequences, as well as overlapping gene models, and offer higher sensitivity, particularly in capturing low-coverage transcripts. Transcript reconstruction methods and their mathematical foundations must continually adapt to provide more accurate solutions and to accommodate the characteristics and biases of evolving sequencing technologies.
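As a rough illustration of the genome-based idea (align first, then assemble overlapping alignments), the sketch below groups read alignments that overlap on the same chromosome into candidate loci. It is not the chapter's method: the `Alignment` tuple, the `cluster_alignments` function, and the toy coordinates are all hypothetical, and it assumes unspliced alignments given as half-open intervals. Real assemblers go further, typically building a graph from spliced alignments within each locus before reconstructing transcripts.

```python
from typing import List, Tuple

# Hypothetical, simplified alignment record: (chromosome, start, end),
# with half-open coordinates [start, end). Real alignments also carry
# strand, splice junctions, and mapping quality.
Alignment = Tuple[str, int, int]


def cluster_alignments(alignments: List[Alignment]) -> List[Alignment]:
    """Merge overlapping alignments on the same chromosome into candidate loci."""
    loci: List[Alignment] = []
    # Sorting by (chromosome, start, end) lets one left-to-right pass suffice.
    for chrom, start, end in sorted(alignments):
        if loci and loci[-1][0] == chrom and start < loci[-1][2]:
            # Overlaps the current locus: extend its right boundary if needed.
            prev_chrom, prev_start, prev_end = loci[-1]
            loci[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # No overlap (or a new chromosome): start a new locus.
            loci.append((chrom, start, end))
    return loci


# Toy input: two overlapping reads, one isolated read, one read on chr2.
reads = [("chr1", 100, 150), ("chr1", 140, 200), ("chr1", 500, 560), ("chr2", 120, 180)]
loci = cluster_alignments(reads)
```

Each resulting locus is only a starting point: a region with little read support would still be reported here, whereas a de novo approach would need enough overlapping sequence to build a contig at all.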
Original language | English (US) |
---|---|
Title of host publication | Computational Methods for Next Generation Sequencing Data Analysis |
Publisher | Wiley |
Pages | 245-268 |
Number of pages | 24 |
ISBN (Electronic) | 9781119272182 |
ISBN (Print) | 9781118169483 |
State | Published - Sep 6 2016 |
Keywords
- Computational methods
- De novo assembly
- Genome-based transcriptome assembly
- Overlapping gene models
- RNA-seq reads
- Sequencing technologies
- Transcript reconstruction methods
ASJC Scopus subject areas
- General Computer Science