Semantic similarity-enhanced topic models for document analysis

Yan Gao, Dunwei Wen

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

3 Citations (Scopus)


In e-learning environments, teaching–learning interactions generate increasingly large text resources. Finding latent topics in these resources can help us understand the teaching content and the learners’ interests and focus. Latent Dirichlet allocation (LDA) has been widely used in many areas to extract the latent topics in a text corpus. However, the extracted topics are often difficult for end users to interpret. Adding auxiliary information to LDA to guide topic extraction is a good way to improve the interpretability of topic modeling. Co-occurrence information in the corpus is one such source, but on its own it is not sufficient to measure the similarity between word pairs, especially in a sparse document space. To deal with this problem, we propose a new semantic similarity-enhanced topic model in this paper. In this model, we use not only co-occurrence information but also semantic similarity based on WordNet as auxiliary information. These two kinds of information are combined into a topic-word component through a generative Pólya urn model. The distribution of documents over the extracted topics obtained by the new model can be used as input to a classifier, and more accurate topic extraction can improve the performance of that classifier. Our experiments on a newsgroup corpus show that the semantic similarity-enhanced topic model performs better than topic models that use only one of the two kinds of information.
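The way two kinds of word-pair information could feed a Pólya-urn-style topic-word count can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the function names `combine_relatedness` and `urn_update`, the linear blend controlled by `weight`, and the toy scores are all hypothetical. In a real setting the semantic scores might come from a WordNet similarity measure and the co-occurrence scores from corpus statistics.

```python
def combine_relatedness(cooc, sem, weight=0.5):
    """Blend two word-pair relatedness scores into one promotion table.

    cooc, sem: dicts mapping (word, related_word) to a score in [0, 1].
    weight: share given to co-occurrence (hypothetical linear blend).
    """
    pairs = set(cooc) | set(sem)
    return {
        pair: weight * cooc.get(pair, 0.0) + (1.0 - weight) * sem.get(pair, 0.0)
        for pair in pairs
    }


def urn_update(topic_word_counts, topic, word, promotion, delta=1.0):
    """Polya-urn-style update: assigning `word` to `topic` adds `delta`
    to that word's count and a scaled amount to each related word.
    Pass delta=-1.0 to undo the update when resampling a token.
    """
    counts = topic_word_counts.setdefault(topic, {})
    counts[word] = counts.get(word, 0.0) + delta
    for (source, related), score in promotion.items():
        if source == word and related != word:
            counts[related] = counts.get(related, 0.0) + delta * score


# Toy scores (made up for illustration only).
cooc = {("course", "lecture"): 0.2}
sem = {("course", "lecture"): 0.8, ("course", "class"): 0.6}

promotion = combine_relatedness(cooc, sem, weight=0.5)
counts = {}
urn_update(counts, topic=0, word="course", promotion=promotion)
print(counts[0])
```

In this sketch, seeing "course" in topic 0 also raises the counts of "lecture" and "class" in that topic, so related words tend to end up in the same topic even when they rarely co-occur in a sparse corpus.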

Original language: English
Title of host publication: Lecture Notes in Educational Technology
Number of pages: 12
Publication status: Published - 2015

Publication series

Name: Lecture Notes in Educational Technology
ISSN (Print): 2196-4963
ISSN (Electronic): 2196-4971


  • Generative Pólya urn model
  • Gibbs sampling
  • LDA
  • Semantic similarity
  • Topic modeling
  • WordNet

