AFuNet: an attention-based fusion network to classify texts in a resource-constrained language

Md Rajib Hossain, Mohammed Moshiul Hoque, M. Ali Akber Dewan, Enamul Hoque, Nazmul Siddique

Research output: Contribution to journal › Journal Article › peer-review

Abstract

In the era of widespread Internet use and extensive social media interaction, the digital realm is accumulating vast amounts of unstructured text data. These data often contain undesirable information, and classifying them manually is time-consuming. An intelligent text classification system that automatically categorizes digitized texts by their semantic meaning is therefore crucial. However, this task is particularly challenging for low-resource languages such as Bengali because of a shortage of annotated corpora, out-of-vocabulary words, a lack of domain-specific hyperparameter tuning, limited ability to extract generalized text features, and class imbalance within the corpora. This work presents AFuNet, an attention-based fusion network for classifying texts in a resource-constrained language. AFuNet is developed through a four-phase experimental process: baseline model evaluation and hyperparameter tuning, late fusion and model selection, attention-based early fusion and model identification, and an ablation study with impact analysis. Fine-tuned on five Bengali text classification corpora, AFuNet achieves accuracies of 96.60 ± 0.2 (BTCC11), 85.37 ± 0.2 (OSBC), 97.35 ± 0.2 (BARD), 93.74 ± 0.2 (IndicNLP), and 96.51 ± 0.2 (ProthomAlo). Compared with previous state-of-the-art models on these corpora, AFuNet improves accuracy by 0.54% to 4.49%, demonstrating its effectiveness in advancing text classification for the Bengali language.
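The abstract contrasts late fusion with attention-based early fusion. The sketch below illustrates only the general difference between the two ideas and is not the authors' architecture: late fusion averages the class probabilities that separate models have already produced, while early fusion combines their feature vectors before any classifier sees them. The single-head scaled dot-product scoring, the `query` vector, and all toy numbers are assumptions made for illustration; the paper's keywords indicate multi-head attention over transformer-based features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(prob_lists):
    """Late fusion: average the class probabilities predicted by each model."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]

def attention_early_fusion(features, query):
    """Early fusion: merge per-model feature vectors before classification.

    Each feature vector is scored against a hypothetical query vector with a
    scaled dot product; the softmax of those scores weights the vectors.
    """
    dim = len(query)
    scores = [
        sum(f_i * q_i for f_i, q_i in zip(f, query)) / math.sqrt(dim)
        for f in features
    ]
    weights = softmax(scores)
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights

# Toy inputs, invented for illustration (not taken from the paper).
probs = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]   # two models, three classes
features = [[1.0, 0.0], [0.0, 1.0]]          # two models, 2-d features
query = [2.0, 0.0]                           # hypothetical attention query

avg_probs = late_fusion(probs)
fused, weights = attention_early_fusion(features, query)
```

In this toy setup the first feature vector aligns with the query and therefore receives the larger attention weight, so it dominates the fused representation; in a trained network the query and projections would be learned rather than fixed by hand.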

Original language: English
Article number: 110182
Pages (from-to): 6725-6748
Number of pages: 24
Journal: Neural Computing and Applications
Volume: 37
Issue number: 9
DOIs
Publication status: Published - Mar. 2025

Keywords

  • Early fusion
  • Low-resource languages
  • Multi-head attention
  • Natural language processing
  • Text classification
  • Transformer-based learning
