Streaming cross document entity coreference resolution

Delip Rao, Paul McNamee, Mark Dredze

Research output: Contribution to conference › Paper › peer-review

Abstract

Previous research in cross-document entity coreference has generally been restricted to the offline scenario where the set of documents is provided in advance. As a consequence, the dominant approach is based on greedy agglomerative clustering techniques that utilize pairwise vector comparisons and thus require O(n²) space and time. In this paper we explore identifying coreferent entity mentions across documents in high-volume streaming text, including methods for utilizing orthographic and contextual information. We test our methods using several corpora to quantitatively measure both the efficacy and scalability of our streaming approach. We show that our approach scales to data at least an order of magnitude larger than previously reported methods.

Original language: English (US)
Pages: 1050-1058
Number of pages: 9
State: Published - 2010
Externally published: Yes
Event: 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: Aug 23 2010 to Aug 27 2010

Conference

Conference: 23rd International Conference on Computational Linguistics, Coling 2010
Country/Territory: China
City: Beijing
Period: 8/23/10 to 8/27/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
