A high throughput semantic concept frequency based approach for patient identification: a case study using type 2 diabetes mellitus clinical notes

Wei Qi Wei; Cui Tao; Guoqian Jiang; Christopher G. Chute

Research output: Contribution to journal › Article › peer-review

Abstract

UNLABELLED: Current research on high-throughput identification of patients with a specific phenotype is in its infancy. There is an urgent need to develop a general, automatic approach for patient identification.

OBJECTIVE: We took advantage of Mayo Clinic electronic clinical notes and proposed a novel method of combining NLP, machine learning, and ontology for automatic patient identification. We also investigated the benefits of involving existing SNOMED semantic knowledge in a patient identification task.

METHODS: The SVM algorithm was applied to SNOMED concept units extracted from T2DM case/control clinical notes. Precision, recall, and F-score were calculated to evaluate performance.

RESULTS: This approach achieved an F-score above 0.950 for both groups when using all identified concept units as features. Concept units from the semantic type "Disease or Syndrome" contain the most important information for patient identification. Our results also implied that coarse-level concepts contain enough information to classify T2DM cases/controls.
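
The abstract names concept-unit frequencies as features and precision, recall, and F-score as evaluation measures. The following is a minimal sketch of those two ideas only, not the authors' implementation: the function names, the placeholder concept strings, and the toy labels are all assumptions made for illustration, and the SVM training step itself is omitted.

```python
from collections import Counter


def concept_frequencies(concept_units):
    """Count how often each concept unit occurs in one patient's notes."""
    return Counter(concept_units)


def precision_recall_f(true_labels, predicted_labels, positive):
    """Compute precision, recall, and F-score for one class label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical placeholder concept units (not real SNOMED identifiers).
features = concept_frequencies(["concept_A", "concept_B", "concept_A"])

# Hypothetical toy labels, scored separately for each group.
truth = ["case", "case", "control", "control"]
predicted = ["case", "control", "control", "control"]
case_scores = precision_recall_f(truth, predicted, "case")
control_scores = precision_recall_f(truth, predicted, "control")
```

Scoring each label in turn mirrors the abstract's report of an F-score "for both groups" (cases and controls).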

Original language	English (US)
Pages (from-to)	857-861
Number of pages	5
Journal	AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
Volume	2010
State	Published - 2010
Externally published	Yes

ASJC Scopus subject areas

General Medicine

Cite this

@article{b7cfbef92afe499a9b525f1ccd22b34e,

title = "A high throughput semantic concept frequency based approach for patient identification: a case study using type 2 diabetes mellitus clinical notes",

abstract = "UNLABELLED: Current research on high throughput identification of patients with a specific phenotype is in its infancy. There is an urgent need to develop a general automatic approach for patient identification. OBJECTIVE: We took advantage of Mayo Clinic electronic clinical notes and proposed a novel method of combining NLP, machine learning, and ontology for automatic patient identification. We also investigated the benefits of involving existing SNOMED semantic knowledge in a patient identification task. METHODS: the SVM algorithm was applied on SNOMED concept units extracted from T2DM case/control clinical notes. Precision, recall, and F-score were calculated to evaluate the performance. RESULTS: This approach achieved an F-score of above 0.950 for both groups when using all identified concept units as features. Concept units from semantic type-Disease or Syndrome contain the most important information for patient identification. Our results also implied that the coarse level concepts contain enough information to classify T2DM cases/controls.",

author = "Wei, {Wei Qi} and Cui Tao and Guoqian Jiang and Chute, {Christopher G.}",

year = "2010",

language = "English (US)",

volume = "2010",

pages = "857--861",

journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",

issn = "1559-4076",

publisher = "American Medical Informatics Association",

}

TY - JOUR

T1 - A high throughput semantic concept frequency based approach for patient identification

T2 - a case study using type 2 diabetes mellitus clinical notes

AU - Wei, Wei Qi

AU - Tao, Cui

AU - Jiang, Guoqian

AU - Chute, Christopher G.

PY - 2010

Y1 - 2010

N2 - UNLABELLED: Current research on high throughput identification of patients with a specific phenotype is in its infancy. There is an urgent need to develop a general automatic approach for patient identification. OBJECTIVE: We took advantage of Mayo Clinic electronic clinical notes and proposed a novel method of combining NLP, machine learning, and ontology for automatic patient identification. We also investigated the benefits of involving existing SNOMED semantic knowledge in a patient identification task. METHODS: the SVM algorithm was applied on SNOMED concept units extracted from T2DM case/control clinical notes. Precision, recall, and F-score were calculated to evaluate the performance. RESULTS: This approach achieved an F-score of above 0.950 for both groups when using all identified concept units as features. Concept units from semantic type-Disease or Syndrome contain the most important information for patient identification. Our results also implied that the coarse level concepts contain enough information to classify T2DM cases/controls.

AB - UNLABELLED: Current research on high throughput identification of patients with a specific phenotype is in its infancy. There is an urgent need to develop a general automatic approach for patient identification. OBJECTIVE: We took advantage of Mayo Clinic electronic clinical notes and proposed a novel method of combining NLP, machine learning, and ontology for automatic patient identification. We also investigated the benefits of involving existing SNOMED semantic knowledge in a patient identification task. METHODS: the SVM algorithm was applied on SNOMED concept units extracted from T2DM case/control clinical notes. Precision, recall, and F-score were calculated to evaluate the performance. RESULTS: This approach achieved an F-score of above 0.950 for both groups when using all identified concept units as features. Concept units from semantic type-Disease or Syndrome contain the most important information for patient identification. Our results also implied that the coarse level concepts contain enough information to classify T2DM cases/controls.

UR - http://www.scopus.com/inward/record.url?scp=84983036317&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84983036317&partnerID=8YFLogxK

M3 - Article

C2 - 21347100

AN - SCOPUS:84983036317

SN - 1559-4076

VL - 2010

SP - 857

EP - 861

JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

ER -
