TY - JOUR
T1 - Rcorrector
T2 - Efficient and accurate error correction for Illumina RNA-seq reads
AU - Song, Li
AU - Florea, Liliana
N1 - Funding Information: This work was supported in part by NSF awards ABI-1159078 and ABI-1356078 to LF.
PY - 2015
Y1 - 2015
AB - Background: Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. Findings: We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Conclusions: Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
KW - Error correction
KW - K-mers
KW - Next-generation sequencing
KW - RNA-seq
UR - http://www.scopus.com/inward/record.url?scp=84979519357&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84979519357&partnerID=8YFLogxK
DO - 10.1186/s13742-015-0089-y
M3 - Article
C2 - 26500767
AN - SCOPUS:84979519357
SN - 2047-217X
VL - 4
JO - GigaScience
JF - GigaScience
IS - 1
M1 - 48
ER -