OOV sensitive Named-Entity Recognition in speech

Carolina Parada, Mark Dredze, Frederick Jelinek

Research output: Contribution to journal › Conference article › peer-review

Abstract

Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named entities and always produce transcription errors. In this work, we improve speech NER by including features indicative of OOVs based on an OOV detector, allowing for the identification of regions of speech containing named entities, even if they are incorrectly transcribed. We construct a new speech NER data set and demonstrate significant improvements for this task.
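
The idea of feeding OOV-detector output to the tagger can be illustrated with a minimal sketch. It is not the authors' feature set: it assumes a hypothetical OOV detector that returns one score in [0, 1] per decoded word, and the function names, the 0.5 threshold, the four score bins, and the one-word context window are all illustrative choices.

```python
from typing import Dict, List


def oov_features(tokens: List[str], oov_scores: List[float], i: int,
                 threshold: float = 0.5, window: int = 1) -> Dict[str, object]:
    """Build tagger features for token i of an LVCSR transcript.

    oov_scores[j] is an assumed per-word score from an OOV detector
    (hypothetical input; higher means "more likely an OOV region").
    """
    if len(tokens) != len(oov_scores):
        raise ValueError("tokens and oov_scores must have the same length")

    # Lexical feature: the decoded word itself (ASR output is usually uncased).
    feats: Dict[str, object] = {"word": tokens[i].lower()}

    # OOV features for the token and its neighbours inside the window.
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            score = oov_scores[j]
            # Binary flag: does the detector consider this word an OOV?
            feats[f"oov[{offset}]"] = score >= threshold
            # Coarse bin of the score (0..3), so a tagger can use its strength.
            feats[f"oov_bin[{offset}]"] = min(int(score * 4), 3)
    return feats


def sentence_features(tokens: List[str],
                      oov_scores: List[float]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in one decoded utterance."""
    return [oov_features(tokens, oov_scores, i) for i in range(len(tokens))]
```

The resulting per-token dictionaries could be passed to any sequence tagger that accepts sparse features. The flags let the tagger mark a region as a likely named entity even when the decoded words themselves are wrong.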

Original language: English (US)
Pages (from-to): 2085-2088
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2011
Externally published: Yes
Event: 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy
Duration: Aug 27 2011 → Aug 31 2011

Keywords

  • Named Entity Recognition
  • OOV Detection

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
