Semantic similarity-enhanced topic models for document analysis

Yan Gao, Dunwei Wen

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

3 Citations (Scopus)

Abstract

In e-learning environments, more and more large-scale text resources are generated by teaching–learning interactions. Finding latent topics in these resources can help us understand the teaching content and the learners' interests and focus. Latent Dirichlet allocation (LDA) has been widely used in many areas to extract the latent topics in a text corpus. However, the extracted topics are often difficult for end users to interpret. Adding auxiliary information to LDA to guide topic extraction is a good way to improve the interpretability of topic modeling. Co-occurrence information in the corpus is one such source, but on its own it is not sufficient to measure the similarity between word pairs, especially in a sparse document space. To deal with this problem, we propose a new semantic similarity-enhanced topic model in this paper. In this model, we use not only co-occurrence information but also semantic similarity based on WordNet as auxiliary information. These two kinds of information are combined into a topic-word component through a generative Pólya urn model. The distribution of documents over the extracted topics obtained by the new model can be used as input to a classifier, and more accurate topic extraction can improve the classifier's performance. Our experiments on a newsgroup corpus show that the semantic similarity-enhanced topic model performs better than topic models that use only one of the two kinds of information.
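The abstract does not spell out how the two signals are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a simple weighted average of document-level co-occurrence and a precomputed semantic similarity score, and a Pólya-urn-style count update in which assigning a word to a topic also adds fractional counts for related words. The function names, the `weight` parameter, and the toy similarity values are all hypothetical; in practice the semantic scores would come from a WordNet similarity measure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(docs, vocab):
    """Fraction of documents in which each word pair appears together."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for doc in docs:
        present = sorted({index[w] for w in doc if w in index})
        for i, j in combinations(present, 2):
            counts[(i, j)] += 1
    n = len(docs)
    return {pair: c / n for pair, c in counts.items()}


def promotion_matrix(vocab, cooc, semantic, weight=0.5):
    """Blend both signals into a symmetric V x V matrix with a unit diagonal.

    Assumption: a plain weighted average; the chapter may combine them differently.
    """
    v = len(vocab)
    matrix = [[1.0 if i == j else 0.0 for j in range(v)] for i in range(v)]
    for i in range(v):
        for j in range(i + 1, v):
            score = (weight * cooc.get((i, j), 0.0)
                     + (1.0 - weight) * semantic.get((i, j), 0.0))
            matrix[i][j] = matrix[j][i] = score
    return matrix


def urn_update(topic_word_counts, topic, word, matrix, sign=1.0):
    """Add (sign=1) or remove (sign=-1) the promoted counts for one assignment."""
    row = topic_word_counts[topic]
    for other, amount in enumerate(matrix[word]):
        row[other] += sign * amount


# Toy data: four words, four short documents.
vocab = ["exam", "test", "quiz", "forum"]
docs = [["exam", "test"], ["exam", "quiz", "test"], ["forum", "quiz"], ["forum"]]

# Hypothetical similarity scores standing in for a WordNet-based measure.
semantic = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.8}

cooc = cooccurrence_scores(docs, vocab)
matrix = promotion_matrix(vocab, cooc, semantic)

# Two topics; assign one occurrence of "exam" to topic 0.
counts = [[0.0] * len(vocab) for _ in range(2)]
urn_update(counts, topic=0, word=0, matrix=matrix)
print(counts[0])
```

In a Gibbs sampler, the same update would be applied with `sign=-1.0` before resampling a token's topic and with `sign=1.0` afterwards, so that related words such as "test" and "quiz" are pulled toward the topic that "exam" was assigned to.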

Original language: English
Title of host publication: Lecture Notes in Educational Technology
Pages: 45-56
Number of pages: 12
Edition: 9783662444467
DOIs
Publication status: Published - 2015

Publication series

Name: Lecture Notes in Educational Technology
Number: 9783662444467
ISSN (Print): 2196-4963
ISSN (Electronic): 2196-4971

Keywords

  • Generative Pólya urn model
  • Gibbs sampling
  • LDA
  • Semantic similarity
  • Topic modeling
  • WordNet
