TY - GEN
T1 - Distance metrics for instance-based learning
AU - Salzberg, Steven
N1 - Funding Information:
Supported in part by the Air Force Office of Scientific Research under Grant AFOSR-89-0151.
Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1991.
PY - 1991
Y1 - 1991
AB - Instance-based learning techniques use a set of stored training instances to classify new examples. The most common such learning technique is the nearest-neighbor method, in which new instances are classified according to the closest training instance. A critical element of any such method is the metric used to determine the distance between instances. Euclidean distance is by far the most commonly used metric; no one, however, has systematically considered whether a different metric, such as Manhattan distance, might perform equally well on naturally occurring data sets. Some evidence from psychological research indicates that Manhattan distance might be preferable in some circumstances. This paper examines three different distance metrics and presents experimental comparisons using data from three domains: malignant cancer classification, heart disease diagnosis, and diabetes prediction. The results of these studies indicate that the Manhattan distance metric works quite well, although not better than the Euclidean metric that has become a standard for machine learning experiments. Because the nearest-neighbor technique provides a good benchmark for comparisons with other learning algorithms, the results below include a number of such comparisons, which show that nearest neighbor, using any distance metric, compares quite well with other machine learning techniques.
UR - http://www.scopus.com/inward/record.url?scp=84994684889&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994684889&partnerID=8YFLogxK
DO - 10.1007/3-540-54563-8_103
M3 - Conference contribution
AN - SCOPUS:84994684889
SN - 9783540545637
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 399
EP - 408
BT - Methodologies for Intelligent Systems - 6th International Symposium, ISMIS 1991, Proceedings
A2 - Ras, Zbigniew W.
A2 - Zemankova, Maria
PB - Springer Verlag
T2 - 6th International Symposium on Methodologies for Intelligent Systems, ISMIS 1991
Y2 - 16 October 1991 through 19 October 1991
ER -
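
The abstract above describes 1-nearest-neighbor classification under different distance metrics. A minimal sketch of that comparison, assuming hypothetical toy data and helper names (euclidean, manhattan, nearest_neighbor) that are not taken from the paper:

import math

def euclidean(a, b):
    # L2 (Euclidean) distance: square root of the summed squared feature differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 (Manhattan / city-block) distance: summed absolute feature differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbor(query, instances, labels, metric=euclidean):
    # Classify `query` with the label of the closest stored training instance.
    best = min(range(len(instances)), key=lambda i: metric(query, instances[i]))
    return labels[best]

# Hypothetical toy data; the labels echo the paper's cancer-diagnosis domain.
X = [(0.0, 0.0), (3.0, 4.0)]
y = ["benign", "malignant"]
print(nearest_neighbor((1.0, 1.0), X, y, metric=euclidean))   # benign
print(nearest_neighbor((1.0, 1.0), X, y, metric=manhattan))   # benign

Swapping the metric argument changes only how closeness is measured, not the classification procedure itself, which is the comparison the paper's experiments carry out across its three data sets.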