An Evaluation of Pretrained BERT Models for Comparing Semantic Similarity Across Unstructured Clinical Trial Texts

Jessica Patricoski, Kory Kreimeyer, Archana Balan, Kent Hardart, Jessica Tao, Valsamo Anagnostou, Taxiarchis Botsis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Processing unstructured clinical texts is often necessary to support certain tasks in biomedicine, such as matching patients to clinical trials. Among other methods, domain-specific language models have been built to utilize free-text information. This study evaluated the performance of Bidirectional Encoder Representations from Transformers (BERT) models in assessing the similarity between clinical trial texts. We compared an unstructured aggregated summary of clinical trials reviewed at the Johns Hopkins Molecular Tumor Board with the ClinicalTrials.gov records, focusing on the titles and eligibility criteria. Seven pretrained BERT-based models were used in our analysis. Of the six biomedical-domain-specific models, only SciBERT outperformed the original BERT model by accurately assigning higher similarity scores to matched than to mismatched trials. This finding is promising and shows that BERT and, likely, other language models may support patient-trial matching.
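
The abstract does not state how text embeddings were pooled or which similarity measure was used. As a rough illustration only, the sketch below assumes mean-pooled token vectors and cosine similarity, and checks whether each summary scores higher against its matched record than against every mismatched one. All function names and the toy vectors are hypothetical; in practice the vectors would come from a pretrained BERT-based encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size text embedding.

    Assumption: mean pooling; the source does not name a pooling strategy.
    """
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def matched_beats_mismatched(summary_vecs, record_vecs):
    """For each summary i, report whether its matched record i scores
    strictly higher than every mismatched record j != i."""
    results = []
    for i, summary in enumerate(summary_vecs):
        scores = [cosine_similarity(summary, record) for record in record_vecs]
        mismatched = [s for j, s in enumerate(scores) if j != i]
        results.append(all(scores[i] > s for s in mismatched))
    return results


# Toy 3-dimensional "token vectors" (hypothetical, not real model output).
summaries = [
    mean_pool([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    mean_pool([[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]),
]
records = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
]

outcome = matched_beats_mismatched(summaries, records)
print(outcome)
```

A model that behaves as the abstract describes for SciBERT and the original BERT would yield mostly true values under a check of this kind; the exact criterion used in the study may differ.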

Original language: English (US)
Title of host publication: Informatics and Technology in Clinical Care and Public Health
Editors: John Mantas, Arie Hasman, Mowafa S. Househ, Parisis Gallos, Emmanouil Zoulias, Joseph Liaskos
Publisher: IOS Press BV
Pages: 18-21
Number of pages: 4
ISBN (Electronic): 9781643682501
State: Published - 2022

Publication series

Name: Studies in Health Technology and Informatics
Volume: 289
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365

Keywords

  • Clinical trial
  • bidirectional encoder representations
  • word embeddings

ASJC Scopus subject areas

  • Health Information Management
  • Health Informatics
  • Biomedical Engineering
