TY - GEN
T1 - Summary Generation Using Natural Language Processing Techniques and Cosine Similarity
AU - Pal, Sayantan
AU - Chang, Maiga
AU - Iriarte, Maria Fernandez
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - The COVID-19 pandemic has posed an unprecedented challenge to public health and has prompted global efforts to understand, record, and alleviate the disease. This research aims to generate relevant summaries of Coronavirus-related literature. It uses the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI, which contained 236,336 academic full-text articles as of July 19, 2021. The paper introduces a web-based system to handle user questions over these full-text scholarly articles. The system periodically runs backend services that process this large collection of articles with basic Natural Language Processing (NLP) techniques, including tokenization, N-gram extraction, and part-of-speech (PoS) tagging. It automatically identifies the keywords in a question, uses cosine similarity to summarize the associated content, and presents the summary to the user. This research may benefit researchers, health workers, and other individuals. Moreover, the same service can be trained on datasets from other domains (e.g., education) to generate relevant summaries for other user groups (e.g., students).
KW - Coronavirus
KW - Information extraction
KW - N-grams
KW - Parts of speech
KW - Question and answering
UR - http://www.scopus.com/inward/record.url?scp=85127721344&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-96308-8_47
DO - 10.1007/978-3-030-96308-8_47
M3 - Published Conference contribution
AN - SCOPUS:85127721344
SN - 9783030963071
T3 - Lecture Notes in Networks and Systems
SP - 508
EP - 517
BT - Intelligent Systems Design and Applications - 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
A2 - Abraham, Ajith
A2 - Gandhi, Niketa
A2 - Hanne, Thomas
A2 - Hong, Tzung-Pei
A2 - Nogueira Rios, Tatiane
A2 - Ding, Weiping
T2 - 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
Y2 - 13 December 2021 through 15 December 2021
ER -