Shared components topic models

Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, Jason Eisner

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With a few exceptions, extensions to latent Dirichlet allocation (LDA) have focused on the distribution over topics for each document. Much less attention has been given to the underlying structure of the topics themselves. As a result, most topic models generate topics independently from a single underlying distribution and require millions of parameters, in the form of multinomial distributions over the vocabulary. In this paper, we introduce the Shared Components Topic Model (SCTM), in which each topic is a normalized product of a smaller number of underlying component distributions. Our model learns these component distributions and the structure of how to combine subsets of them into topics. The SCTM can represent topics much more compactly than LDA and achieves better perplexity with fewer parameters.
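As a rough illustration of the phrase "normalized product of component distributions", the following minimal sketch multiplies a chosen subset of component distributions elementwise and renormalizes the result over the vocabulary. The function name `normalized_product`, the toy four-word vocabulary, the component values, and the subsets are all hypothetical and chosen for illustration; this is not the authors' implementation and it does not cover how the model learns the components or the subsets.

```python
import math

def normalized_product(components, selected):
    """Combine the selected component distributions into one topic.

    components: list of strictly positive distributions over the vocabulary
                (hypothetical toy values, not taken from the paper).
    selected:   indices of the components that make up this topic.
    """
    vocab_size = len(components[0])
    # Sum log-probabilities, which is the product in log space.
    log_topic = [0.0] * vocab_size
    for c in selected:
        for w in range(vocab_size):
            log_topic[w] += math.log(components[c][w])
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_topic)
    unnormalized = [math.exp(x - peak) for x in log_topic]
    total = sum(unnormalized)
    # Renormalize so the topic is a proper distribution over the vocabulary.
    return [u / total for u in unnormalized]

# Three shared components over a toy four-word vocabulary (assumed values).
components = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.4, 0.4, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]

# Each topic is defined by the subset of components it uses (assumed subsets).
topic_subsets = [(0,), (0, 1), (1, 2)]
topics = [normalized_product(components, subset) for subset in topic_subsets]

for subset, topic in zip(topic_subsets, topics):
    print(subset, [round(p, 4) for p in topic])
```

In this sketch, storage consists of one vocabulary-sized vector per component plus one subset per topic, rather than one vocabulary-sized vector per topic, which is the sense in which sharing components can make the representation more compact.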

Original language: English (US)
Title of host publication: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies
Publisher: Association for Computational Linguistics (ACL)
Pages: 783-792
Number of pages: 10
ISBN (Electronic): 1937284204, 9781937284206
State: Published - 2012
Externally published: Yes
Event: 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2012 - Montreal, Canada
Duration: Jun 3 2012 → Jun 8 2012

Publication series

Name: NAACL HLT 2012 - 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference: 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2012
Country/Territory: Canada
City: Montreal
Period: 6/3/12 → 6/8/12

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Linguistics and Language
