TY - JOUR
T1 - Sim4cc
T2 - A cross-species spliced alignment program
AU - Zhou, Leming
AU - Pertea, Mihaela
AU - Delcher, Arthur L.
AU - Florea, Liliana
N1 - Funding Information:
Seed optimizations were performed on the ‘Herd’ Scientific Computing Cluster at the George Washington University (NSF grant CLS20163A). Sloan Research Fellowship (to L.F.); National Institutes of Health grant R01-LM006845 (to Steven L. Salzberg). Funding for open access charge: R01-LM006845 (to Steven L. Salzberg).
PY - 2009
Y1 - 2009
N2 - Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, to increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily-aware alignment techniques, to improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity compared to existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64 000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.
AB - Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, to increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily-aware alignment techniques, to improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity compared to existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64 000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.
UR - http://www.scopus.com/inward/record.url?scp=67649892606&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649892606&partnerID=8YFLogxK
U2 - 10.1093/nar/gkp319
DO - 10.1093/nar/gkp319
M3 - Article
C2 - 19429899
AN - SCOPUS:67649892606
SN - 0305-1048
VL - 37
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 11
M1 - e80
ER -