TY - JOUR
T1 - Application of information retrieval approaches to case classification in the vaccine adverse event reporting system
AU - Botsis, Taxiarchis
AU - Woo, Emily Jane
AU - Ball, Robert
N1 - Funding Information:
The classification of cases based on the information extracted from within a large collection of documents lies in the field of information retrieval (IR) and deals mostly with unstructured (free text) information [2]. While research to apply natural language processing (NLP) techniques is underway to extract the appropriate information from the free text and support the text classification, i.e. the categorization of the topic or theme of a document [3], currently in VAERS and most other SRS databases, the encoding of the textual and other information in the reports is undertaken manually. This process is supported by medical terminology, namely the Medical Dictionary for Regulatory Activities (MedDRA®) [4]. Current practice uses MedDRA® Preferred Terms (PTs) and Standardized MedDRA® Queries (SMQs) to identify cases of interest, followed by the application of a case definition, such as the Brighton Collaboration (BC) case definitions [5], in preparation for a case-series evaluation.
PY - 2013/7
Y1 - 2013/7
N2 - Background: Automating the classification of adverse event reports is an important step to improve the efficiency of vaccine safety surveillance. Previously we showed it was possible to classify reports using features extracted from the text of the reports. Objective: The aim of this study was to use the information encoded in the Medical Dictionary for Regulatory Activities (MedDRA®) in the US Vaccine Adverse Event Reporting System (VAERS) to support and evaluate two classification approaches: a multiple information retrieval strategy and a rule-based approach. To evaluate the performance of these approaches, we selected the conditions of anaphylaxis and Guillain-Barré syndrome (GBS). Methods: We used MedDRA® Preferred Terms stored in the VAERS, and two standardized medical terminologies: the Brighton Collaboration (BC) case definitions and Standardized MedDRA® Queries (SMQ) to classify two sets of reports for GBS and anaphylaxis. Two approaches were used: (i) the rule-based instruments that are provided by the two terminologies (the Automatic Brighton Classification [ABC] tool and the SMQ algorithms); and (ii) the vector space model. Results: We found that the rule-based instruments, particularly the SMQ algorithms, achieved a high degree of specificity; however, there was a cost in terms of sensitivity in all but the narrow GBS SMQ algorithm that outperformed the remaining approaches (sensitivity in the testing set was equal to 99.06 % for this algorithm vs. 93.40 % for the vector space model). In the case of anaphylaxis, the vector space model achieved higher sensitivity compared with the best values of both the ABC tool and the SMQ algorithms in the testing set (86.44 % vs. 64.11 % and 52.54 %, respectively). Conclusions: Our results showed the superiority of the vector space model over the existing rule-based approaches irrespective of the standardized medical knowledge represented by either the SMQ or the BC case definition.
The vector space model might make automation of case definitions for spontaneous report review more efficient than current rule-based approaches, allowing more time for critical assessment and decision making by pharmacovigilance experts.
AB - Background: Automating the classification of adverse event reports is an important step to improve the efficiency of vaccine safety surveillance. Previously we showed it was possible to classify reports using features extracted from the text of the reports. Objective: The aim of this study was to use the information encoded in the Medical Dictionary for Regulatory Activities (MedDRA®) in the US Vaccine Adverse Event Reporting System (VAERS) to support and evaluate two classification approaches: a multiple information retrieval strategy and a rule-based approach. To evaluate the performance of these approaches, we selected the conditions of anaphylaxis and Guillain-Barré syndrome (GBS). Methods: We used MedDRA® Preferred Terms stored in the VAERS, and two standardized medical terminologies: the Brighton Collaboration (BC) case definitions and Standardized MedDRA® Queries (SMQ) to classify two sets of reports for GBS and anaphylaxis. Two approaches were used: (i) the rule-based instruments that are provided by the two terminologies (the Automatic Brighton Classification [ABC] tool and the SMQ algorithms); and (ii) the vector space model. Results: We found that the rule-based instruments, particularly the SMQ algorithms, achieved a high degree of specificity; however, there was a cost in terms of sensitivity in all but the narrow GBS SMQ algorithm that outperformed the remaining approaches (sensitivity in the testing set was equal to 99.06 % for this algorithm vs. 93.40 % for the vector space model). In the case of anaphylaxis, the vector space model achieved higher sensitivity compared with the best values of both the ABC tool and the SMQ algorithms in the testing set (86.44 % vs. 64.11 % and 52.54 %, respectively). Conclusions: Our results showed the superiority of the vector space model over the existing rule-based approaches irrespective of the standardized medical knowledge represented by either the SMQ or the BC case definition.
The vector space model might make automation of case definitions for spontaneous report review more efficient than current rule-based approaches, allowing more time for critical assessment and decision making by pharmacovigilance experts.
UR - http://www.scopus.com/inward/record.url?scp=84879973537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84879973537&partnerID=8YFLogxK
U2 - 10.1007/s40264-013-0064-4
DO - 10.1007/s40264-013-0064-4
M3 - Article
C2 - 23703591
AN - SCOPUS:84879973537
SN - 0114-5916
VL - 36
SP - 573
EP - 582
JO - Drug Safety
JF - Drug Safety
IS - 7
ER -