TY - JOUR
T1 - Transferability of neural network clinical deidentification systems
AU - Lee, Kahyun
AU - Dobbins, Nicholas J.
AU - McInnes, Bridget
AU - Yetisgen, Meliha
AU - Uzuner, Özlem
N1 - Publisher Copyright: © 2021 The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Objective: Neural network deidentification studies have focused on individual datasets. These studies assume the availability of a sufficient amount of human-annotated data to train models that can generalize to corresponding test data. In real-world situations, however, researchers often have limited or no in-house training data. Existing systems and external data can help jump-start deidentification on in-house data; however, the most efficient way of utilizing existing systems and external data is unclear. This article investigates the transferability of a state-of-the-art neural clinical deidentification system, NeuroNER, across a variety of datasets, when it is modified architecturally for domain generalization and when it is trained strategically for domain transfer. Materials and Methods: We conducted a comparative study of the transferability of NeuroNER using 4 clinical note corpora with multiple note types from 2 institutions. We modified NeuroNER architecturally to integrate 2 types of domain generalization approaches. We evaluated each architecture using 3 training strategies. We measured transferability from external sources; transferability across note types; the contribution of external source data when in-domain training data are available; and transferability across institutions. Results and Conclusions: Transferability from a single external source gave inconsistent results. Using additional external sources consistently yielded an F1-score of approximately 80%. Fine-tuning emerged as a dominant transfer strategy, with or without domain generalization. We also found that external sources were useful even in cases where in-domain training data were available. Transferability across institutions differed by note type and annotation label but resulted in improved performance.
AB - Objective: Neural network deidentification studies have focused on individual datasets. These studies assume the availability of a sufficient amount of human-annotated data to train models that can generalize to corresponding test data. In real-world situations, however, researchers often have limited or no in-house training data. Existing systems and external data can help jump-start deidentification on in-house data; however, the most efficient way of utilizing existing systems and external data is unclear. This article investigates the transferability of a state-of-the-art neural clinical deidentification system, NeuroNER, across a variety of datasets, when it is modified architecturally for domain generalization and when it is trained strategically for domain transfer. Materials and Methods: We conducted a comparative study of the transferability of NeuroNER using 4 clinical note corpora with multiple note types from 2 institutions. We modified NeuroNER architecturally to integrate 2 types of domain generalization approaches. We evaluated each architecture using 3 training strategies. We measured transferability from external sources; transferability across note types; the contribution of external source data when in-domain training data are available; and transferability across institutions. Results and Conclusions: Transferability from a single external source gave inconsistent results. Using additional external sources consistently yielded an F1-score of approximately 80%. Fine-tuning emerged as a dominant transfer strategy, with or without domain generalization. We also found that external sources were useful even in cases where in-domain training data were available. Transferability across institutions differed by note type and annotation label but resulted in improved performance.
KW - Deidentification
KW - Domain generalization
KW - Generalizability
KW - Transferability
UR - http://www.scopus.com/inward/record.url?scp=85121214599&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121214599&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocab207
DO - 10.1093/jamia/ocab207
M3 - Article
C2 - 34586386
AN - SCOPUS:85121214599
SN - 1067-5027
VL - 28
SP - 2661
EP - 2669
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 12
ER -