Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review

Kory Kreimeyer, Matthew Foster, Abhishek Pandey, Nina Arya, Gwendolyn Halford, Sandra F. Jones, Richard Forshee, Mark Walderhaug, Taxiarchis Botsis

Research output: Contribution to journal › Review article › peer-review

128 Scopus citations


We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.

Original language: English (US)
Pages (from-to): 14-29
Number of pages: 16
Journal: Journal of Biomedical Informatics
State: Published - Sep 2017
Externally published: Yes


Keywords

  • Common data elements
  • Natural language processing
  • Review
  • Systematic

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

