Finding genes in DNA using decision trees and dynamic programming.

S. Salzberg, X. Chen, J. Henderson, K. Fasman

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

This study demonstrates the use of decision tree classifiers as the basis for a general gene-finding system. The system uses a dynamic programming algorithm that finds the optimal segmentation of a DNA sequence into coding and non-coding regions (exons and introns). The optimality of the segmentation depends on a separate scoring function that takes a subsequence and assigns it a score reflecting the probability that the subsequence is an exon. In this study, the scoring functions were sets of decision trees and rules that were combined to give the probability estimate. Experimental results on a newly collected database of human DNA sequences are encouraging, and some new observations about the structure of classifiers for the gene-finding problem have emerged from this study. We also describe a new probability chain model that produces very accurate filters for finding donor and acceptor sites.
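The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic two-label dynamic program in which segments alternate between "exon" and "noncoding", and it replaces the decision-tree scoring functions with a toy GC-content score. The names `segment`, `toy_exon_score`, and `toy_noncoding_score` are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Tuple

# A scorer takes (sequence, start, end) and returns a score for seq[start:end].
Scorer = Callable[[str, int, int], float]
Segment = Tuple[int, int, str]


def toy_exon_score(seq: str, i: int, j: int) -> float:
    """Toy stand-in for a learned exon scorer (assumption): +1 per G/C, -1 otherwise."""
    return sum(1.0 if base in "GC" else -1.0 for base in seq[i:j])


def toy_noncoding_score(seq: str, i: int, j: int) -> float:
    """Toy stand-in for a non-coding scorer (assumption): negated exon score."""
    return -toy_exon_score(seq, i, j)


def segment(seq: str, exon_score: Scorer, noncoding_score: Scorer) -> Tuple[List[Segment], float]:
    """Return the highest-scoring alternating segmentation of seq and its total score."""
    n = len(seq)
    if n == 0:
        return [], 0.0
    labels = ("exon", "noncoding")
    scorers = (exon_score, noncoding_score)
    neg_inf = float("-inf")
    # best[k][j]: best score of a segmentation of seq[:j] whose last segment has label k.
    best = [[neg_inf] * (n + 1) for _ in labels]
    # back[k][j]: start index of that last segment.
    back = [[0] * (n + 1) for _ in labels]
    for j in range(1, n + 1):
        for k in (0, 1):
            other = 1 - k
            for i in range(j):
                # A segment starting at 0 has no predecessor; otherwise the
                # previous segment must carry the other label.
                prev = 0.0 if i == 0 else best[other][i]
                cand = prev + scorers[k](seq, i, j)
                if cand > best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i
    # Trace back from the better of the two final labels.
    k = 0 if best[0][n] >= best[1][n] else 1
    total = best[k][n]
    segments: List[Segment] = []
    j = n
    while j > 0:
        i = back[k][j]
        segments.append((i, j, labels[k]))
        j, k = i, 1 - k
    segments.reverse()
    return segments, total
```

Calling `segment("GGCCAATTGCGC", toy_exon_score, toy_noncoding_score)` splits the toy sequence into a GC-rich segment, an AT-rich segment, and a second GC-rich segment. The sketch evaluates every candidate segment, so it makes a quadratic number of scorer calls; a practical gene finder would restrict candidate boundaries to plausible donor and acceptor sites, which is the role the abstract assigns to the site filters.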

ASJC Scopus subject areas

  • Medicine(all)
