Feature engineering and machine learning for causality assessment in pharmacovigilance: Lessons learned from application to the FDA Adverse Event Reporting System

Kory Kreimeyer; Oanh Dang; Jonathan Spiker; Monica A. Muñoz; Gary Rosner; Robert Ball; Taxiarchis Botsis

doi:10.1016/j.compbiomed.2021.104517

School of Medicine

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Our objective was to support the automated classification of Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) reports by their usefulness in assessing the possibility of a causal relationship between a drug product and an adverse event.

Method: We used a data set of 326 redacted FAERS reports that had previously been annotated by a group of safety evaluators (SEs) at the FDA, using a modified version of the World Health Organization–Uppsala Monitoring Centre criteria for drug causality assessment, and that had supported a similar study on the classification of reports using supervised machine learning and text engineering methods. We explored many potential features for supervised learning, including natural language processing of the report text and information from external data sources, and developed models for predicting the classification status of reports. We then evaluated the models on a larger data set of previously unseen reports.

Results: The best-performing models achieved recall and F1 scores above 0.80 on both data sets for the identification of assessable reports (i.e., those containing enough information to make an informed causality assessment) and above 0.75 for the identification of reports meeting at least a Possible causality threshold.

Conclusions: Causal inference from FAERS reports depends on many components with complex logical relationships that are yet to be made fully computable. Efforts focused on readily addressable tasks, such as quickly eliminating unassessable reports, fit naturally into SEs' thought processes and can provide real enhancements for FDA workflows.

Original language	English (US)
Article number	104517
Journal	Computers in Biology and Medicine
Volume	135
DOIs	https://doi.org/10.1016/j.compbiomed.2021.104517
State	Published - Aug 2021

Keywords

Case classification
Causality assessment
Clinical natural language processing
Decision support
Pharmacovigilance

ASJC Scopus subject areas

Computer Science Applications
Health Informatics

Cite this

@article{bceff3eb495546be94c351ade006b1eb,

title = "Feature engineering and machine learning for causality assessment in pharmacovigilance: Lessons learned from application to the FDA Adverse Event Reporting System",

abstract = "Background: Our objective was to support the automated classification of Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) reports for their usefulness in assessing the possibility of a causal relationship between a drug product and an adverse event. Method: We used a data set of 326 redacted FAERS reports that was previously annotated using a modified version of the World Health Organization–Uppsala Monitoring Centre criteria for drug causality assessment by a group of SEs at the FDA and supported a similar study on the classification of reports using supervised machine learning and text engineering methods. We explored many potential features, including the incorporation of natural language processing on report text and information from external data sources, for supervised learning and developed models for predicting the classification status of reports. We then evaluated the models on a larger data set of previously unseen reports. Results: The best-performing models achieved recall and F1 scores on both data sets above 0.80 for the identification of assessable reports (i.e. those containing enough information to make an informed causality assessment) and above 0.75 for the identification of reports meeting at least a Possible causality threshold. Conclusions: Causal inference from FAERS reports depends on many components with complex logical relationships that are yet to be made fully computable. Efforts focused on readily addressable tasks, such as quickly eliminating unassessable reports, fit naturally in SE's thought processes to provide real enhancements for FDA workflows.",

keywords = "Case classification, Causality assessment, Clinical natural language processing, Decision support, Pharmacovigilance",

author = "Kory Kreimeyer and Oanh Dang and Jonathan Spiker and Mu{\~n}oz, {Monica A.} and Gary Rosner and Robert Ball and Taxiarchis Botsis",

note = "Funding Information: This work was supported by the FDA's Broad Agency Announcement (BAA) mechanism (Contract Number: 75F40119C10084) and, partially, by a Center of Excellence in Regulatory Science and Innovation (CERSI) grant to Johns Hopkins University from the US Food \& Drug Administration (Grant Number: 2U01FD005942-03 REVISED). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Department of Health and Human Services or the Food and Drug Administration. Publisher Copyright: {\textcopyright} 2021 Elsevier Ltd",

year = "2021",

month = aug,

doi = "10.1016/j.compbiomed.2021.104517",

language = "English (US)",

volume = "135",

journal = "Computers in Biology and Medicine",

issn = "0010-4825",

publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Feature engineering and machine learning for causality assessment in pharmacovigilance

T2 - Lessons learned from application to the FDA Adverse Event Reporting System

AU - Kreimeyer, Kory

AU - Dang, Oanh

AU - Spiker, Jonathan

AU - Muñoz, Monica A.

AU - Rosner, Gary

AU - Ball, Robert

AU - Botsis, Taxiarchis

N1 - Funding Information: This work was supported by the FDA's Broad Agency Announcement (BAA) mechanism (Contract Number: 75F40119C10084) and, partially, by a Center of Excellence in Regulatory Science and Innovation (CERSI) grant to Johns Hopkins University from the US Food & Drug Administration (Grant Number: 2U01FD005942-03 REVISED). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Department of Health and Human Services or the Food and Drug Administration. Publisher Copyright: © 2021 Elsevier Ltd

PY - 2021/8

Y1 - 2021/8

AB - Background: Our objective was to support the automated classification of Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) reports for their usefulness in assessing the possibility of a causal relationship between a drug product and an adverse event. Method: We used a data set of 326 redacted FAERS reports that was previously annotated using a modified version of the World Health Organization–Uppsala Monitoring Centre criteria for drug causality assessment by a group of SEs at the FDA and supported a similar study on the classification of reports using supervised machine learning and text engineering methods. We explored many potential features, including the incorporation of natural language processing on report text and information from external data sources, for supervised learning and developed models for predicting the classification status of reports. We then evaluated the models on a larger data set of previously unseen reports. Results: The best-performing models achieved recall and F1 scores on both data sets above 0.80 for the identification of assessable reports (i.e. those containing enough information to make an informed causality assessment) and above 0.75 for the identification of reports meeting at least a Possible causality threshold. Conclusions: Causal inference from FAERS reports depends on many components with complex logical relationships that are yet to be made fully computable. Efforts focused on readily addressable tasks, such as quickly eliminating unassessable reports, fit naturally in SE's thought processes to provide real enhancements for FDA workflows.

KW - Case classification

KW - Causality assessment

KW - Clinical natural language processing

KW - Decision support

KW - Pharmacovigilance

UR - http://www.scopus.com/inward/record.url?scp=85107814982&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85107814982&partnerID=8YFLogxK

U2 - 10.1016/j.compbiomed.2021.104517

DO - 10.1016/j.compbiomed.2021.104517

M3 - Article

C2 - 34130003

AN - SCOPUS:85107814982

SN - 0010-4825

VL - 135

JO - Computers in Biology and Medicine

JF - Computers in Biology and Medicine

M1 - 104517

ER -
