Summary Generation Using Natural Language Processing Techniques and Cosine Similarity

Sayantan Pal, Maiga Chang, Maria Fernandez Iriarte

Research output: Chapter in Book/Report/Conference proceeding, Published Conference contribution, peer-review

8 Citations (Scopus)

Abstract

The COVID-19 pandemic has posed an unprecedented challenge to public health and prompted global efforts to understand, record, and alleviate the disease. This research aims to generate relevant summaries of the Coronavirus literature. It uses the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI, which contained 236,336 full-text academic articles as of July 19, 2021. This paper introduces a web-based system that handles user questions over these full-text scholarly articles. The system periodically runs backend services that process this large volume of articles with basic Natural Language Processing (NLP) techniques, including tokenization, N-gram extraction, and part-of-speech (PoS) tagging. It automatically identifies the keywords in the question, uses cosine similarity to summarize the associated content, and presents the summary to the user. This research may benefit researchers, health workers, and other individuals. Moreover, the same service can be trained on datasets from other domains (e.g., education) to generate relevant summaries for other user groups (e.g., students).
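
As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-written stop-word list in place of PoS-based keyword selection, treats unigram and bigram counts as the vector representation, and ranks candidate sentences by cosine similarity to the question. The names `STOP_WORDS`, `tokenize`, `ngrams`, `cosine_similarity`, and `summarize` are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (the paper uses PoS tagging
# to find keywords; that step is replaced here by simple stop-word removal).
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "in", "to", "and",
              "what", "how", "does", "by", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and return word tokens that are not stop words."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(question, sentences, top_k=2):
    """Return up to top_k sentences most similar to the question, in source order."""
    q_vec = Counter(ngrams(tokenize(question)))
    scored = [(cosine_similarity(q_vec, Counter(ngrams(tokenize(s)))), i, s)
              for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda item: -item[0])[:top_k]
    # Drop sentences with no overlap and restore their original order.
    return [s for score, i, s in sorted(best, key=lambda item: item[1]) if score > 0]
```

Calling `summarize("How does coronavirus spread?", sentences)` on a list of candidate sentences returns the highest-scoring ones that share at least one keyword or bigram with the question. A production system over a corpus the size of CORD-19 would likely need weighting (for example TF-IDF) and precomputed indexes, which this sketch omits.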

Original language: English
Title of host publication: Intelligent Systems Design and Applications - 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
Editors: Ajith Abraham, Niketa Gandhi, Thomas Hanne, Tzung-Pei Hong, Tatiane Nogueira Rios, Weiping Ding
Pages: 508-517
Number of pages: 10
Publication status: Published - 2022
Event: 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021 - Virtual, Online
Duration: 13 Dec. 2021 to 15 Dec. 2021

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 418 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 21st International Conference on Intelligent Systems Design and Applications, ISDA 2021
City: Virtual, Online
Period: 13/12/21 to 15/12/21

Keywords

  • Coronavirus
  • Information extraction
  • N-grams
  • Parts of speech
  • Question and answering
