An Entity Extraction and Categorization Technique on Twitter Streams

Senthil Kumar Narayanasamy, Maiga Chang

    Research output: Contribution to journal / Journal Article / peer-review

    Abstract

    As social media platforms have gained momentum in recent years, the volume of information they generate has grown exponentially, which makes it challenging for information retrieval systems to extract potential named entities. Researchers have used semantic annotation to retrieve entities from unstructured documents, but this mechanism returns too many ambiguous entities. In this work, the DBpedia knowledge base is adopted for entity extraction and categorization. To perform entity extraction precisely, a two-step process is proposed: (a) train Word2Vec on the unstructured datasets and classify the entities into their respective categories; and (b) crawl web pages, forums, and other web sources to identify entities that are not present in DBpedia. The evaluation shows higher precision and a promising F1 score.
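
The categorization idea in step (a) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for trained Word2Vec embeddings, and the names `TOY_VECTORS`, `SEEDS`, and `categorize` are hypothetical. The sketch assigns a token to the category whose seed-entity centroid is closest by cosine similarity, and returns no category for out-of-vocabulary tokens, which a pipeline like the one described might then hand to a fallback such as the web crawl in step (b).

```python
import math

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings.
# All values are invented for illustration only.
TOY_VECTORS = {
    "london": [0.9, 0.1, 0.0],
    "paris": [0.8, 0.2, 0.1],
    "obama": [0.1, 0.9, 0.1],
    "merkel": [0.2, 0.8, 0.0],
    "google": [0.0, 0.1, 0.9],
    "nokia": [0.1, 0.2, 0.8],
    "berlin": [0.85, 0.15, 0.05],
}

# Hypothetical seed entities per category (for example, drawn from a knowledge base).
SEEDS = {
    "Place": ["london", "paris"],
    "Person": ["obama", "merkel"],
    "Organisation": ["google", "nokia"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(words, vectors):
    """Component-wise mean of the vectors for the given words."""
    vecs = [vectors[w] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def categorize(token, vectors=TOY_VECTORS, seeds=SEEDS):
    """Return (category, similarity), or (None, 0.0) for out-of-vocabulary tokens."""
    vec = vectors.get(token.lower())
    if vec is None:
        return None, 0.0
    scores = {
        cat: cosine(vec, centroid(words, vectors))
        for cat, words in seeds.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    print(categorize("Berlin"))
    print(categorize("unseen_token"))
```

In practice the embeddings would come from a model trained on the tweet corpus rather than a hand-written dictionary, and a similarity threshold could be added to reject weak matches.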

    Original language: English
    Pages (from-to): 1203-1228
    Number of pages: 26
    Journal: International Journal of Information Technology and Decision Making
    Volume: 23
    Issue number: 3
    Publication status: Published - 1 May 2024

    Keywords

    • DBpedia
    • LDA
    • Named entity recognition
    • Tweets
    • Word2Vec
    • Knowledge base
