Bayesian modeling of lexical resources for low-resource settings

Nicholas Andrews, Mark Dredze, Benjamin Van Durme, Jason Eisner

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Lexical resources such as dictionaries and gazetteers are often used as auxiliary data for tasks such as part-of-speech induction and named-entity recognition. However, discriminative training with lexical features requires annotated data to reliably estimate the lexical feature weights and may result in overfitting the lexical features at the expense of features which generalize better. In this paper, we investigate a more robust approach: we stipulate that the lexicon is the result of an assumed generative process. Practically, this means that we may treat the lexical resources as observations under the proposed generative model. The lexical resources provide training data for the generative model without requiring separate data to estimate lexical feature weights. We evaluate the proposed approach in two settings: part-of-speech induction and low-resource named-entity recognition.

Original languageEnglish (US)
Title of host publicationACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
PublisherAssociation for Computational Linguistics (ACL)
Pages1029-1039
Number of pages11
ISBN (Electronic)9781945626753
DOIs
StatePublished - 2017
Event55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada
Duration: Jul 30 2017Aug 4 2017

Publication series

NameACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume1

Conference

Conference55th Annual Meeting of the Association for Computational Linguistics, ACL 2017
Country/TerritoryCanada
CityVancouver
Period7/30/178/4/17

ASJC Scopus subject areas

  • Language and Linguistics
  • Artificial Intelligence
  • Software
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Bayesian modeling of lexical resources for low-resource settings'. Together they form a unique fingerprint.

Cite this