TY - JOUR
T1 - Assessing racial bias in healthcare predictive models
T2 - Practical lessons from an empirical evaluation of 30-day hospital readmission models
AU - Wang, H. Echo
AU - Weiner, Jonathan P.
AU - Saria, Suchi
AU - Lehmann, Harold
AU - Kharrazi, Hadi
N1 - Publisher Copyright: © 2024 Elsevier Inc.
PY - 2024/8
Y1 - 2024/8
N2 - Objective: Despite increased availability of methodologies to identify algorithmic bias, the operationalization of bias evaluation for healthcare predictive models is still limited. Therefore, this study proposes a process for bias evaluation through an empirical assessment of common hospital readmission models. The process includes selecting bias measures, interpreting them, determining disparity impact, and identifying potential mitigations. Methods: This retrospective analysis evaluated racial bias of four common models predicting 30-day unplanned readmission (i.e., LACE Index, HOSPITAL Score, and the CMS readmission measure applied as is and retrained). The models were assessed using 2.4 million adult inpatient discharges in Maryland from 2016 to 2019. Fairness metrics that are model-agnostic, easy to compute, and interpretable were implemented and appraised to select the most appropriate bias measures. The impact of changing the models’ risk thresholds on these measures was further assessed to guide the selection of optimal thresholds to control and mitigate bias. Results: Four bias measures were selected for the predictive task: zero-one-loss difference, false negative rate (FNR) parity, false positive rate (FPR) parity, and generalized entropy index. Based on these measures, the HOSPITAL score and the retrained CMS measure demonstrated the lowest racial bias. White patients showed a higher FNR while Black patients showed a higher FPR and zero-one-loss. As the models’ risk threshold changed, trade-offs between models’ fairness and overall performance were observed, and the assessment showed all models’ default thresholds were reasonable for balancing accuracy and bias. Conclusions: This study proposes an Applied Framework to Assess Fairness of Predictive Models (AFAFPM) and demonstrates the process using 30-day hospital readmission models as the example. It suggests the feasibility of applying algorithmic bias assessment to determine optimized risk thresholds so that predictive models can be used more equitably and accurately. It is evident that a combination of qualitative and quantitative methods and a multidisciplinary team are necessary to identify, understand, and respond to algorithmic bias in real-world healthcare settings. Users should also apply multiple bias measures to ensure a more comprehensive, tailored, and balanced view. The results of bias measures, however, must be interpreted with caution and in light of the larger operational, clinical, and policy context.
AB - Objective: Despite increased availability of methodologies to identify algorithmic bias, the operationalization of bias evaluation for healthcare predictive models is still limited. Therefore, this study proposes a process for bias evaluation through an empirical assessment of common hospital readmission models. The process includes selecting bias measures, interpreting them, determining disparity impact, and identifying potential mitigations. Methods: This retrospective analysis evaluated racial bias of four common models predicting 30-day unplanned readmission (i.e., LACE Index, HOSPITAL Score, and the CMS readmission measure applied as is and retrained). The models were assessed using 2.4 million adult inpatient discharges in Maryland from 2016 to 2019. Fairness metrics that are model-agnostic, easy to compute, and interpretable were implemented and appraised to select the most appropriate bias measures. The impact of changing the models’ risk thresholds on these measures was further assessed to guide the selection of optimal thresholds to control and mitigate bias. Results: Four bias measures were selected for the predictive task: zero-one-loss difference, false negative rate (FNR) parity, false positive rate (FPR) parity, and generalized entropy index. Based on these measures, the HOSPITAL score and the retrained CMS measure demonstrated the lowest racial bias. White patients showed a higher FNR while Black patients showed a higher FPR and zero-one-loss. As the models’ risk threshold changed, trade-offs between models’ fairness and overall performance were observed, and the assessment showed all models’ default thresholds were reasonable for balancing accuracy and bias. Conclusions: This study proposes an Applied Framework to Assess Fairness of Predictive Models (AFAFPM) and demonstrates the process using 30-day hospital readmission models as the example. It suggests the feasibility of applying algorithmic bias assessment to determine optimized risk thresholds so that predictive models can be used more equitably and accurately. It is evident that a combination of qualitative and quantitative methods and a multidisciplinary team are necessary to identify, understand, and respond to algorithmic bias in real-world healthcare settings. Users should also apply multiple bias measures to ensure a more comprehensive, tailored, and balanced view. The results of bias measures, however, must be interpreted with caution and in light of the larger operational, clinical, and policy context.
KW - Algorithmic Bias
KW - Algorithmic Fairness
KW - Health Disparity
KW - Hospital Readmission
KW - Population Health Management
KW - Predictive Models
UR - http://www.scopus.com/inward/record.url?scp=85197290603&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197290603&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2024.104683
DO - 10.1016/j.jbi.2024.104683
M3 - Article
C2 - 38925281
AN - SCOPUS:85197290603
SN - 1532-0464
VL - 156
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104683
ER -