Evaluation of Natural Language Processing (NLP) systems to annotate drug product labeling with MedDRA terminology

Thomas Ly, Carol Pamer, Oanh Dang, Sonja Brajovic, Shahrukh Haider, Taxiarchis Botsis, David Milward, Andrew Winter, Susan Lu, Robert Ball

Research output: Contribution to journalComment/debatepeer-review

14 Scopus citations


Introduction: The FDA Adverse Event Reporting System (FAERS) is a primary data source for identifying unlabeled adverse events (AEs) in a drug or biologic drug product's postmarketing phase. Many AE reports must be reviewed by drug safety experts to identify unlabeled AEs, even if the reported AEs are previously identified, labeled AEs. Integrating the labeling status of drug product AEs into FAERS could increase report triage and review efficiency. Medical Dictionary for Regulatory Activities (MedDRA) is the standard for coding AE terms in FAERS cases. However, drug manufacturers are not required to use MedDRA to describe AEs in product labels. We hypothesized that natural language processing (NLP) tools could assist in automating the extraction and MedDRA mapping of AE terms in drug product labels. Materials and methods: We evaluated the performance of three NLP systems, (ETHER, I2E, MetaMap) for their ability to extract AE terms from drug labels and translate the terms to MedDRA Preferred Terms (PTs). Pharmacovigilance-based annotation guidelines for extracting AE terms from drug labels were developed for this study. We compared each system's output to MedDRA PT AE lists, manually mapped by FDA pharmacovigilance experts using the guidelines, for ten drug product labels known as the “gold standard AE list” (GSL) dataset. Strict time and configuration conditions were imposed in order to test each system's capabilities under conditions of no human intervention and minimal system configuration. Each NLP system's output was evaluated for precision, recall and F measure in comparison to the GSL. A qualitative error analysis (QEA) was conducted to categorize a random sample of each NLP system's false positive and false negative errors. Results: A total of 417, 278, and 250 false positive errors occurred in the ETHER, I2E, and MetaMap outputs, respectively. A total of 100, 80, and 187 false negative errors occurred in ETHER, I2E, and MetaMap outputs, respectively. Precision ranged from 64% to 77%, recall from 64% to 83% and F measure from 67% to 79%. I2E had the highest precision (77%), recall (83%) and F measure (79%). ETHER had the lowest precision (64%). MetaMap had the lowest recall (64%). The QEA found that the most prevalent false positive errors were context errors such as “Context error/General term” “Context error/Instructions or monitoring parameters” “Context error/Medical history preexisting condition underlying condition risk factor or contraindication” and “Context error/AE manifestations or secondary complication”. The most prevalent false negative errors were in the “Incomplete or missed extraction” error category. Missing AE terms were typically due to long terms, or terms containing non-contiguous words which do not correspond exactly to MedDRA synonyms. MedDRA mapping errors were a minority of errors for ETHER and I2E but were the most prevalent false positive errors for MetaMap. Conclusions: The results demonstrate that it may be feasible to use NLP tools to extract and map AE terms to MedDRA PTs. However, the NLP tools we tested would need to be modified or reconfigured to lower the error rates to support their use in a regulatory setting. Tools specific for extracting AE terms from drug labels and mapping the terms to MedDRA PTs may need to be developed to support pharmacovigilance. Conducting research using additional NLP systems on a larger, diverse GSL would also be informative.

Original languageEnglish (US)
Pages (from-to)73-86
Number of pages14
JournalJournal of Biomedical Informatics
StatePublished - Jul 2018


  • Drug Safety
  • FDA
  • Labeling
  • MedDRA
  • Pharmacovigilance

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics


Dive into the research topics of 'Evaluation of Natural Language Processing (NLP) systems to annotate drug product labeling with MedDRA terminology'. Together they form a unique fingerprint.

Cite this