Semantic similarity-enhanced topic models for document analysis

Yan Gao, Dunwei Wen

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

    3 Citations (Scopus)

    Abstract

    In e-learning environments, teaching–learning interactions generate increasingly large-scale text resources. Finding latent topics in these resources can help us understand the teaching content and the learners’ interests and focus. Latent Dirichlet allocation (LDA) has been widely used in many areas to extract the latent topics in a text corpus. However, the extracted topics are often difficult for end users to understand. Adding auxiliary information to LDA to guide topic extraction is a good way to improve the interpretability of topic modeling. Co-occurrence information in the corpus is one such source, but on its own it is not sufficient to measure the similarity between word pairs, especially in a sparse document space. To deal with this problem, we propose a new semantic similarity-enhanced topic model. The model uses not only co-occurrence information but also WordNet-based semantic similarity as auxiliary information. These two kinds of information are combined into a topic-word component through a generative Pólya urn model. The distribution of documents over the extracted topics obtained by the new model can be used as input to a classifier, so more accurate topic extraction can improve the classifier's performance. Our experiments on a newsgroup corpus show that the semantic similarity-enhanced topic model performs better than topic models that use only one of the two kinds of information.
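
The abstract does not say how the two kinds of auxiliary information are combined or how the urn scheme updates the topic-word counts. The following is only a minimal sketch of one plausible reading, not the authors' implementation. The function names `combine_promotion` and `urn_update`, the linear blend with `weight`, and the toy scores are all hypothetical. The idea illustrated is that in a Pólya-urn-style update, assigning a word to a topic also adds fractional counts for related words.

```python
from collections import defaultdict


def combine_promotion(cooc, sim, weight=0.5):
    """Blend two word-pair score tables into one promotion table.

    cooc and sim map (word, other_word) -> score in [0, 1].
    The linear blend is an assumption made for illustration only.
    """
    promo = defaultdict(float)
    for pair, score in cooc.items():
        promo[pair] += weight * score
    for pair, score in sim.items():
        promo[pair] += (1.0 - weight) * score
    return dict(promo)


def urn_update(topic_word, topic, word, promo, delta=1.0):
    """Apply one urn-style count update for `word` in `topic`.

    delta=+1.0 adds an assignment and delta=-1.0 removes it.
    Words related to `word` receive a fractional share of the update.
    """
    counts = topic_word[topic]
    counts[word] = counts.get(word, 0.0) + delta
    for (w, other), score in promo.items():
        if w == word and other != word:
            counts[other] = counts.get(other, 0.0) + delta * score


# Toy scores (made up): co-occurrence links "exam" and "grade",
# while the similarity table also links "exam" and "test".
cooc = {("exam", "grade"): 0.8}
sim = {("exam", "grade"): 0.4, ("exam", "test"): 0.9}
promo = combine_promotion(cooc, sim, weight=0.5)

topic_word = {0: {}}
urn_update(topic_word, 0, "exam", promo)
```

In this sketch, "test" gains weight in topic 0 even though it never co-occurs with "exam" in the toy data, which is the kind of help a similarity source could give in a sparse document space. A Gibbs sampler would call `urn_update` with `delta=-1.0` before resampling a token and with `delta=1.0` afterwards.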

    Original language: English
    Title of host publication: Lecture Notes in Educational Technology
    Pages: 45-56
    Number of pages: 12
    Edition: 9783662444467
    Publication status: Published - 2015

    Publication series

    Name: Lecture Notes in Educational Technology
    Number: 9783662444467
    ISSN (Print): 2196-4963
    ISSN (Electronic): 2196-4971

    Keywords

    • Generative Pólya urn model
    • Gibbs sampling
    • LDA
    • Semantic similarity
    • Topic modeling
    • WordNet
