TY - JOUR
T1 - An Entity Extraction and Categorization Technique on Twitter Streams
AU - Narayanasamy, Senthil Kumar
AU - Chang, Maiga
N1 - Publisher Copyright: © 2024 World Scientific Publishing Company.
PY - 2024/5/1
Y1 - 2024/5/1
N2 - As social media platforms have gained huge momentum in recent years, the amount of information generated on social media sites is growing exponentially, which makes it a great challenge for information retrieval systems to extract potential named entities. Researchers have used semantic annotation mechanisms to retrieve entities from unstructured documents, but these mechanisms return too many ambiguous entities. In this work, the DBpedia knowledge base is adopted for entity extraction and categorization. To perform the entity extraction task precisely, a two-step process is proposed: (a) train Word2Vec on the unstructured datasets and classify the entities into their respective categories; and (b) crawl web pages, forums, and other web sources to identify the entities that are not present in DBpedia. The evaluation shows improved precision and a promising F1 score.
AB - As social media platforms have gained huge momentum in recent years, the amount of information generated on social media sites is growing exponentially, which makes it a great challenge for information retrieval systems to extract potential named entities. Researchers have used semantic annotation mechanisms to retrieve entities from unstructured documents, but these mechanisms return too many ambiguous entities. In this work, the DBpedia knowledge base is adopted for entity extraction and categorization. To perform the entity extraction task precisely, a two-step process is proposed: (a) train Word2Vec on the unstructured datasets and classify the entities into their respective categories; and (b) crawl web pages, forums, and other web sources to identify the entities that are not present in DBpedia. The evaluation shows improved precision and a promising F1 score.
KW - DBpedia
KW - LDA
KW - Named entity recognition
KW - Tweets
KW - Word2Vec
KW - knowledge base
UR - http://www.scopus.com/inward/record.url?scp=85158014471&partnerID=8YFLogxK
U2 - 10.1142/S0219622023500360
DO - 10.1142/S0219622023500360
M3 - Journal Article
AN - SCOPUS:85158014471
SN - 0219-6220
VL - 23
SP - 1203
EP - 1228
JO - International Journal of Information Technology and Decision Making
JF - International Journal of Information Technology and Decision Making
IS - 3
ER -