TY - JOUR
T1 - Enriching Unsupervised User Embedding via Medical Concepts
AU - Huang, Xiaolei
AU - Dernoncourt, Franck
AU - Dredze, Mark
N1 - Funding Information:
The authors thank the anonymous reviewers for their insightful comments and suggestions. This work was supported in part by a research gift from Adobe Research. The authors would also like to thank the University of Memphis for providing access to its HPC cluster.
Publisher Copyright:
© 2022 X. Huang, F. Dernoncourt & M. Dredze.
PY - 2022
Y1 - 2022
AB - Clinical notes in Electronic Health Records (EHR) contain rich documentation of patients that can be used to infer phenotypes for disease diagnosis and to study patient characteristics for cohort selection. Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervision. Medical concepts extracted from clinical notes capture rich connections between patients and their clinical categories. However, existing unsupervised approaches to user embedding from clinical notes do not explicitly incorporate medical concepts. In this study, we propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora, MIMIC-III and Diabetes. We evaluate the user embeddings on both extrinsic and intrinsic tasks, including phenotype classification, in-hospital mortality prediction, patient retrieval, and patient relatedness. Experiments on the two clinical corpora show that our approach outperforms unsupervised baselines and that incorporating medical concepts can significantly improve baseline performance.
UR - http://www.scopus.com/inward/record.url?scp=85163855752&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163855752&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85163855752
SN - 2640-3498
VL - 174
SP - 63
EP - 78
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 3rd Conference on Health, Inference, and Learning, CHIL 2022
Y2 - 7 April 2022 through 8 April 2022
ER -