TY - JOUR
T1 - CLASS
T2 - Constrained transcript assembly of RNA-seq reads
AU - Song, Li
AU - Florea, Liliana
N1 - Funding Information:
We thank Illumina for making publicly available the Human Body Map RNA-seq data. This work was supported in part by NSF award 1159078 to LF.
PY - 2013/4/10
Y1 - 2013/4/10
N2 - Background: RNA-seq has revolutionized our ability to survey the cellular transcriptome in great detail. However, while several approaches have been developed, the problem of assembling the short reads into full-length transcripts remains challenging. Results: We developed a novel algorithm and software tool, CLASS (Constraint-based Local Assembly and Selection of Splice variants), for accurately assembling splice variants using local read coverage patterns of RNA-seq reads, contiguity constraints from read pairs and spliced reads, and optionally information about gene structure extracted from cDNA sequence databases. The algorithmic underpinnings of CLASS are: i) a linear program to infer exons, ii) a compact splice graph representation of a gene and its splice variants, and iii) a transcript selection scheme that takes into account contiguity constraints and, where available, knowledge about gene structure. Conclusion: In comparisons against leading transcript assembly programs, CLASS is more accurate on both simulated and real reads and produces results that are easier to interpret when applied to large scale real data, and therefore is a promising analysis tool for next generation sequencing data. Availability: CLASS is available from http://sourceforge.net/projects/splicebox.
AB - Background: RNA-seq has revolutionized our ability to survey the cellular transcriptome in great detail. However, while several approaches have been developed, the problem of assembling the short reads into full-length transcripts remains challenging. Results: We developed a novel algorithm and software tool, CLASS (Constraint-based Local Assembly and Selection of Splice variants), for accurately assembling splice variants using local read coverage patterns of RNA-seq reads, contiguity constraints from read pairs and spliced reads, and optionally information about gene structure extracted from cDNA sequence databases. The algorithmic underpinnings of CLASS are: i) a linear program to infer exons, ii) a compact splice graph representation of a gene and its splice variants, and iii) a transcript selection scheme that takes into account contiguity constraints and, where available, knowledge about gene structure. Conclusion: In comparisons against leading transcript assembly programs, CLASS is more accurate on both simulated and real reads and produces results that are easier to interpret when applied to large scale real data, and therefore is a promising analysis tool for next generation sequencing data. Availability: CLASS is available from http://sourceforge.net/projects/splicebox.
UR - http://www.scopus.com/inward/record.url?scp=84876122166&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876122166&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-14-S5-S14
DO - 10.1186/1471-2105-14-S5-S14
M3 - Article
C2 - 23734605
AN - SCOPUS:84876122166
SN - 1471-2105
VL - 14
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL.5
M1 - S14
ER -