Development and assessment of a natural language processing model to identify residential instability in electronic health records' unstructured data: A comparison of 3 integrated healthcare delivery systems

Elham Hatef; Masoud Rouhizadeh; Claudia Nau; Fagen Xie; Christopher Rouillard; Mahmoud Abu-Nasser; Ariadna Padilla; Lindsay Joe Lyons; Hadi Kharrazi; Jonathan P. Weiner; Douglas Roblin

doi:10.1093/jamiaopen/ooac006

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from the electronic health records (EHRs) of 3 healthcare systems.

Materials and methods: We included patients 18 years and older who received care at 1 of 3 healthcare systems from 2016 through 2020 and had at least 1 free-text note in the EHR during this period. We conducted the study independently at each site; the NLP algorithm logic and the method of validity assessment were identical across sites, whereas the approach to developing the gold standard for validity assessment differed across sites. Using the EntityRuler module of the spaCy 2.3 Python toolkit, we created a rule-based NLP system made up of expert-developed patterns indicating residential instability at the lead site and enriched it using insight gained from its application at the other 2 sites. We adapted the algorithm at each site and then validated it using a split-sample approach. We assessed the performance of the algorithm by positive predictive value (precision), sensitivity (recall), and specificity.

Results: The NLP algorithm performed with moderate precision (0.45, 0.73, and 1.0) at the 3 sites. The sensitivity and specificity of the NLP algorithm varied across the 3 sites (sensitivity: 0.68, 0.85, and 0.96; specificity: 0.69, 0.89, and 1.0).

Discussion: The performance of this NLP algorithm in identifying residential instability in 3 different healthcare systems suggests that the algorithm is generally valid and applicable to other healthcare systems with similar EHRs.

Conclusion: The NLP approach developed in this project is adaptable and can be modified to extract types of social needs other than residential instability from EHRs across different healthcare systems.
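
The following is a minimal, standard-library sketch of the kind of pipeline the Materials and methods describe: pattern rules flag notes, a split sample separates development notes from validation notes, and flagged notes are scored against a gold standard for precision (PPV), sensitivity (recall), and specificity. It is not the study's code and does not call spaCy. In spaCy, EntityRuler patterns are dicts with "label" and "pattern" keys, where the pattern is either a phrase string or a list of token-attribute dicts; the toy matcher below imitates only those two styles (token dicts keyed on "LOWER"), and it matches case-insensitively. The patterns, the label name RESIDENTIAL_INSTABILITY, the example notes, the helper names, and the 50/50 split are all hypothetical.

```python
"""Toy rule-based flagging of residential instability in notes, plus validity metrics.

Assumption: an illustrative stand-in, not the study's spaCy EntityRuler pipeline.
It imitates only two EntityRuler pattern styles (a phrase string, or a list of
token dicts keyed on "LOWER") and matches case-insensitively.
"""
import random
import re
from typing import Dict, List, Sequence, Tuple, Union

Pattern = Dict[str, Union[str, List[Dict[str, str]]]]

# Hypothetical patterns; the study's expert-developed patterns are not reproduced here.
PATTERNS: List[Pattern] = [
    {"label": "RESIDENTIAL_INSTABILITY", "pattern": "homeless"},
    {"label": "RESIDENTIAL_INSTABILITY",
     "pattern": [{"LOWER": "lives"}, {"LOWER": "in"}, {"LOWER": "car"}]},
    {"label": "RESIDENTIAL_INSTABILITY",
     "pattern": [{"LOWER": "eviction"}, {"LOWER": "notice"}]},
]

# Hypothetical annotated notes: (note text, gold-standard label).
NOTES: List[Tuple[str, bool]] = [
    ("Patient is homeless and staying at a shelter.", True),
    ("Reports he lives in car since March.", True),
    ("Received an eviction notice last week.", True),
    ("Couch surfing with friends.", True),   # no pattern covers this -> false negative
    ("Denies being homeless.", False),       # no negation handling -> false positive
    ("Lives with spouse in a house.", False),
    ("Housing stable; no concerns.", False),
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def pattern_tokens(pattern: Pattern) -> List[str]:
    """Normalize a phrase-string or token-list pattern to lowercase tokens."""
    p = pattern["pattern"]
    if isinstance(p, str):
        return tokenize(p)
    return [tok["LOWER"] for tok in p]


def matched_labels(text: str, patterns: Sequence[Pattern] = PATTERNS) -> List[str]:
    """Return the label of every pattern whose token sequence occurs in the text."""
    tokens = tokenize(text)
    labels = []
    for pat in patterns:
        seq = pattern_tokens(pat)
        n = len(seq)
        if any(tokens[i:i + n] == seq for i in range(len(tokens) - n + 1)):
            labels.append(pat["label"])
    return labels


def flag_note(text: str) -> bool:
    """A note is flagged if at least one pattern matches."""
    return bool(matched_labels(text))


def split_sample(items: Sequence, dev_fraction: float = 0.5, seed: int = 0) -> Tuple[list, list]:
    """Shuffle a copy of the items and split it into development and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def validity(predicted: Sequence[bool], gold: Sequence[bool]) -> Dict[str, float]:
    """Precision (PPV), sensitivity (recall), and specificity from paired labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    tn = sum(not p and not g for p, g in zip(predicted, gold))
    nan = float("nan")  # returned when a denominator is zero
    return {
        "precision": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


if __name__ == "__main__":
    dev, val = split_sample(NOTES)
    print(f"{len(dev)} development notes, {len(val)} validation notes")
    # Score all toy notes here so the counts are easy to check by hand.
    print(validity([flag_note(t) for t, _ in NOTES], [g for _, g in NOTES]))
```

Scoring the 7 toy notes gives 3 true positives, 1 false positive, 1 false negative, and 2 true negatives, so precision and sensitivity are both 3/4 = 0.75 and specificity is 2/3 (about 0.67). The false positive ("Denies being homeless.") shows why a purely lexical matcher benefits from context handling such as negation, and the missed "couch surfing" note shows how the coverage of the pattern list limits sensitivity.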

Original language	English (US)
Article number	ooac006
Journal	JAMIA Open
Volume	5
Issue number	1
DOIs	https://doi.org/10.1093/jamiaopen/ooac006
State	Published - Apr 1 2022

Keywords

electronic health record
homelessness
housing insecurity
natural language processing
social determinants of health

ASJC Scopus subject areas

Health Informatics

Cite this

Hatef, E., Rouhizadeh, M., Nau, C., Xie, F., Rouillard, C., Abu-Nasser, M., Padilla, A., Lyons, L. J., Kharrazi, H., Weiner, J. P., & Roblin, D. (2022). Development and assessment of a natural language processing model to identify residential instability in electronic health records' unstructured data: A comparison of 3 integrated healthcare delivery systems. JAMIA Open, 5(1), Article ooac006. https://doi.org/10.1093/jamiaopen/ooac006

@article{b6ad98f0eca94e00892eb2ddd2467478,

title = "Development and assessment of a natural language processing model to identify residential instability in electronic health records' unstructured data: A comparison of 3 integrated healthcare delivery systems",

keywords = "electronic health record, homelessness, housing insecurity, natural language processing, social determinants of health",

author = "Elham Hatef and Masoud Rouhizadeh and Claudia Nau and Fagen Xie and Christopher Rouillard and Mahmoud Abu-Nasser and Ariadna Padilla and Lyons, {Lindsay Joe} and Hadi Kharrazi and Weiner, {Jonathan P.} and Douglas Roblin",

note = "Funding Information: This work was supported by the Johns Hopkins Institute for Clinical and Translational Research (ICTR) which is funded in part by Grant Number UL1 TR003098 from the National Center for Advancing Translational Sciences (NCATS) a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of the Johns Hopkins ICTR, NCATS, or NIH. Publisher Copyright: {\textcopyright} 2022 The Author(s).",

year = "2022",

month = apr,

day = "1",

doi = "10.1093/jamiaopen/ooac006",

language = "English (US)",

volume = "5",

journal = "JAMIA Open",

issn = "2574-2531",

publisher = "Oxford University Press",

number = "1",

}

TY - JOUR

T1 - Development and assessment of a natural language processing model to identify residential instability in electronic health records' unstructured data

T2 - A comparison of 3 integrated healthcare delivery systems

AU - Hatef, Elham

AU - Rouhizadeh, Masoud

AU - Nau, Claudia

AU - Xie, Fagen

AU - Rouillard, Christopher

AU - Abu-Nasser, Mahmoud

AU - Padilla, Ariadna

AU - Lyons, Lindsay Joe

AU - Kharrazi, Hadi

AU - Weiner, Jonathan P.

AU - Roblin, Douglas

PY - 2022/4/1

Y1 - 2022/4/1

KW - electronic health record

KW - homelessness

KW - housing insecurity

KW - natural language processing

KW - social determinants of health

UR - http://www.scopus.com/inward/record.url?scp=85131360738&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85131360738&partnerID=8YFLogxK

U2 - 10.1093/jamiaopen/ooac006

DO - 10.1093/jamiaopen/ooac006

M3 - Article

C2 - 35224458

AN - SCOPUS:85131360738

SN - 2574-2531

VL - 5

JO - JAMIA Open

JF - JAMIA Open

IS - 1

M1 - ooac006

ER -
