A semantic cover approach for topic modeling

Rajagopal Venkatesaramani, Douglas Downey, Bradley Malin, Yevgeniy Vorobeychik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

We introduce a novel topic modeling approach based on constructing a semantic set cover for clusters of similar documents. Specifically, our approach first clusters documents using their TF-IDF representation and then covers each cluster with a set of topic words based on semantic similarity, defined in terms of a word embedding. Computing a topic cover amounts to solving a minimum set cover problem. Our evaluation compares our topic modeling approach to Latent Dirichlet Allocation (LDA) on three metrics: 1) qualitative topic match, measured using evaluations by Amazon Mechanical Turk (MTurk) workers; 2) performance on classification tasks using each topic model as a sparse feature representation; and 3) topic coherence. We find that qualitative judgments significantly favor our approach, that it outperforms LDA on topic coherence, and that it is comparable to LDA on document classification tasks.
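The cover step described above can be illustrated with a greedy set cover over a single cluster. This is a minimal sketch under assumptions that are not taken from the abstract: the candidate topic words are the cluster's own words, one word "covers" another when the cosine similarity of their embeddings reaches a hypothetical `threshold`, and the classic greedy approximation stands in for whatever minimum set cover solver the authors actually use. The function names and parameters are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def greedy_semantic_cover(
    cluster_words: List[str],
    embedding: Dict[str, Sequence[float]],
    threshold: float = 0.6,
) -> List[str]:
    """Choose topic words so that every embedded cluster word is covered.

    Hypothetical coverage rule (an assumption, not the paper's definition):
    word w covers word v if cosine(w, v) >= threshold; every word covers itself.
    Words missing from the embedding are skipped.
    """
    words = [w for w in cluster_words if w in embedding]
    # Precompute the set of words each candidate covers.
    covers: Dict[str, Set[str]] = {
        w: {v for v in words
            if v == w or cosine(embedding[w], embedding[v]) >= threshold}
        for w in words
    }
    uncovered: Set[str] = set(words)
    topic: List[str] = []
    while uncovered:
        # Greedy step: take the word covering the most still-uncovered words.
        # max() returns the first maximal word, so ties follow cluster order.
        best = max(words, key=lambda w: len(covers[w] & uncovered))
        topic.append(best)
        uncovered -= covers[best]
    return topic


if __name__ == "__main__":
    # Toy 2-d "embeddings" made up for illustration.
    toy = {
        "soccer": [1.0, 0.1], "football": [0.95, 0.2], "goal": [0.9, 0.3],
        "election": [0.1, 1.0], "vote": [0.2, 0.95],
    }
    print(greedy_semantic_cover(list(toy), toy, threshold=0.9))
```

With the toy vectors and `threshold=0.9`, the sketch returns `["soccer", "election"]`: the first word covers the sports-related words and the second covers the politics-related ones. A greedy rule is used here because minimum set cover is NP-hard; the greedy choice is guaranteed to stay within a logarithmic factor of the optimal cover size, whereas an exact cover would require something like an integer program.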

Original language: English (US)
Title of host publication: *SEM@NAACL-HLT 2019 - 8th Joint Conference on Lexical and Computational Semantics
Publisher: Association for Computational Linguistics (ACL)
Pages: 92-102
Number of pages: 11
ISBN (Electronic): 9781948087933
State: Published - 2019
Event: 8th Joint Conference on Lexical and Computational Semantics, *SEM@NAACL-HLT 2019 - Minneapolis, United States
Duration: Jun 6 2019 → Jun 7 2019

Publication series

Name: *SEM@NAACL-HLT 2019 - 8th Joint Conference on Lexical and Computational Semantics

Conference

Conference: 8th Joint Conference on Lexical and Computational Semantics, *SEM@NAACL-HLT 2019
Country/Territory: United States
City: Minneapolis
Period: 6/6/19 → 6/7/19

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
