TY - GEN
T1 - Finding Genes in DNA using Decision Trees and Dynamic Programming
AU - Salzberg, Steven
AU - Chen, Xin
AU - Henderson, John
AU - Fasman, Kenneth
N1 - Funding Information:
Thanks to Simon Kasif for many helpful suggestions. This material is based upon work supported by the National Science Foundation under Grant Nos. IRI-9116843 and IRI-9223591, and by a Young Faculty Research Initiative grant from the G.W.C. Whiting School of Engineering at Johns Hopkins University.
Publisher Copyright:
Copyright © 1996, AAAI (www.aaai.org). All rights reserved.
PY - 1996
Y1 - 1996
AB - This study demonstrates the use of decision tree classifiers as the basis for a general gene-finding system. The system uses a dynamic programming algorithm that finds the optimal segmentation of a DNA sequence into coding and non-coding regions (exons and introns). The optimality property is dependent on a separate scoring function that takes a subsequence and assigns to it a score reflecting the probability that the sequence is an exon. In this study, the scoring functions were sets of decision trees and rules that were combined to give the probability estimate. Experimental results on a newly collected database of human DNA sequences are encouraging, and some new observations about the structure of classifiers for the gene-finding problem have emerged from this study. We also provide descriptions of a new probability chain model that produces very accurate filters to find donor and acceptor sites.
UR - http://www.scopus.com/inward/record.url?scp=0030339733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030339733&partnerID=8YFLogxK
M3 - Conference contribution
C2 - 8877520
AN - SCOPUS:0030339733
T3 - Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, ISMB 1996
SP - 201
EP - 210
BT - Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, ISMB 1996
PB - AAAI Press
T2 - 4th International Conference on Intelligent Systems for Molecular Biology, ISMB 1996
Y2 - 12 June 1996 through 15 June 1996
ER -