TY - JOUR
T1 - Linking Electronic Health Record and Trauma Registry Data
T2 - Assessing the Value of Probabilistic Linkage
AU - Durojaiye, Ashimiyu B.
AU - Puett, Lisa L.
AU - Levin, Scott
AU - Toerper, Matthew
AU - McGeorge, Nicolette M.
AU - Webster, Kristen L.W.
AU - Deol, Gurmehar S.
AU - Kharrazi, Hadi
AU - Lehmann, Harold P.
AU - Gurses, Ayse P.
N1 - Publisher Copyright: © 2018 Georg Thieme Verlag KG Stuttgart. New York.
PY - 2018
Y1 - 2018
AB - Background: Electronic health record (EHR) systems contain large volumes of novel heterogeneous data that can be linked to trauma registry data to enable innovative research not possible with either data source alone. Objective: This article describes an approach for linking electronically extracted EHR data to trauma registry data at the institutional level and assesses the value of probabilistic linkage. Methods: Encounter data were independently obtained from the EHR data warehouse (n = 1,632) and the pediatric trauma registry (n = 1,829) at a Level I pediatric trauma center. Deterministic linkage was attempted using nine different combinations of medical record number (MRN), encounter identity (ID) (visit ID), age, gender, and emergency department (ED) arrival date. True matches from the best performing variable combination were used to create a gold standard, which was used to evaluate the performance of each variable combination, and to train a probabilistic algorithm that was separately used to link records unmatched by deterministic linkage and the entire cohort. Additional records that matched probabilistically were investigated via chart review and compared against records that matched deterministically. Results: Deterministic linkage with exact matching on any three of MRN, encounter ID, age, gender, and ED arrival date gave the best yield of 1,276 true matches while an additional probabilistic linkage step following deterministic linkage yielded 110 true matches. These records contained a significantly higher number of boys compared to records that matched deterministically and etiology was attributable to mismatch between MRNs in the two data sets. Probabilistic linkage of the entire cohort yielded 1,363 true matches. Conclusion: The combination of deterministic and an additional probabilistic method represents a robust approach for linking EHR data to trauma registry data. This approach may be generalizable to studies involving other registries and databases.
KW - deterministic linkage
KW - electronic health records
KW - probabilistic linkage
KW - record linkage
KW - trauma registry
UR - http://www.scopus.com/inward/record.url?scp=85062972020&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062972020&partnerID=8YFLogxK
U2 - 10.1055/s-0039-1681087
DO - 10.1055/s-0039-1681087
M3 - Article
C2 - 30453337
AN - SCOPUS:85062972020
SN - 0026-1270
VL - 57
SP - 261
EP - 269
JO - Methods of information in medicine
JF - Methods of information in medicine
IS - 5-6
ER -