TY - GEN
T1 - A semantic cover approach for topic modeling
AU - Venkatesaramani, Rajagopal
AU - Downey, Douglas
AU - Malin, Bradley
AU - Vorobeychik, Yevgeniy
N1 - Publisher Copyright: © 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
N2 - We introduce a novel topic modeling approach based on constructing a semantic set cover for clusters of similar documents. Specifically, our approach first clusters documents using their Tf-Idf representation, and then covers each cluster with a set of topic words based on semantic similarity, defined in terms of a word embedding. Computing a topic cover amounts to solving a minimum set cover problem. Our evaluation compares our topic modeling approach to Latent Dirichlet Allocation (LDA) on three metrics: 1) qualitative topic match, measured using evaluations by Amazon Mechanical Turk (MTurk) workers, 2) performance on classification tasks using each topic model as a sparse feature representation, and 3) topic coherence. We find that qualitative judgments significantly favor our approach, and that the method outperforms LDA on topic coherence and is comparable to LDA on document classification tasks.
UR - http://www.scopus.com/inward/record.url?scp=85091730555&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091730555&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85091730555
T3 - *SEM@NAACL-HLT 2019 - 8th Joint Conference on Lexical and Computational Semantics
SP - 92
EP - 102
BT - *SEM@NAACL-HLT 2019 - 8th Joint Conference on Lexical and Computational Semantics
PB - Association for Computational Linguistics (ACL)
T2 - 8th Joint Conference on Lexical and Computational Semantics, *SEM@NAACL-HLT 2019
Y2 - 6 June 2019 through 7 June 2019
ER -