TY - GEN
T1 - Distance metrics for instance-based learning
AU - Salzberg, Steven
N1 - Funding Information:
Supported in part by the Air Force Office of Scientific Research under Grant AFOSR-89-0151.
Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1991.
PY - 1991
Y1 - 1991
AB - Instance-based learning techniques use a set of stored training instances to classify new examples. The most common such learning technique is the nearest-neighbor method, in which new instances are classified according to the closest training instance. A critical element of any such method is the metric used to determine the distance between instances. Euclidean distance is by far the most commonly used metric; no one, however, has systematically considered whether a different metric, such as Manhattan distance, might perform equally well on naturally occurring data sets. Some evidence from psychological research indicates that Manhattan distance might be preferable in some circumstances. This paper examines three different distance metrics and presents experimental comparisons using data from three domains: malignant cancer classification, heart disease diagnosis, and diabetes prediction. The results of these studies indicate that the Manhattan distance metric works quite well, although not better than the Euclidean metric that has become a standard for machine learning experiments. Because the nearest-neighbor technique provides a good benchmark for comparisons with other learning algorithms, the results below include a number of such comparisons, which show that nearest neighbor, using any distance metric, compares quite well with other machine learning techniques.
UR - http://www.scopus.com/inward/record.url?scp=84994684889&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994684889&partnerID=8YFLogxK
DO - 10.1007/3-540-54563-8_103
M3 - Conference contribution
AN - SCOPUS:84994684889
SN - 9783540545637
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 399
EP - 408
BT - Methodologies for Intelligent Systems - 6th International Symposium, ISMIS 1991, Proceedings
A2 - Ras, Zbigniew W.
A2 - Zemankova, Maria
PB - Springer Verlag
T2 - 6th International Symposium on Methodologies for Intelligent Systems, ISMIS 1991
Y2 - 16 October 1991 through 19 October 1991
ER -
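
The abstract above describes 1-nearest-neighbor classification under different distance metrics. A minimal sketch of that comparison, assuming hypothetical toy data and helper names (euclidean, manhattan, nearest_neighbor) that are not taken from the paper:

import math

def euclidean(a, b):
    # L2 (Euclidean) distance: square root of the summed squared feature differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 (Manhattan / city-block) distance: summed absolute feature differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbor(query, instances, labels, metric=euclidean):
    # Classify `query` with the label of the closest stored training instance.
    best = min(range(len(instances)), key=lambda i: metric(query, instances[i]))
    return labels[best]

# Hypothetical toy data; the labels echo the paper's cancer-diagnosis domain.
X = [(0.0, 0.0), (3.0, 4.0)]
y = ["benign", "malignant"]
print(nearest_neighbor((1.0, 1.0), X, y, metric=euclidean))   # benign
print(nearest_neighbor((1.0, 1.0), X, y, metric=manhattan))   # benign

Swapping the metric argument changes only how closeness is measured, not the classification procedure itself, which is the comparison the paper's experiments carry out across its three data sets.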