TY - JOUR
T1 - AFuNet: an attention-based fusion network to classify texts in a resource-constrained language
T2 - Neural Computing and Applications
AU - Hossain, Md Rajib
AU - Hoque, Mohammed Moshiul
AU - Dewan, M. Ali Akber
AU - Hoque, Enamul
AU - Siddique, Nazmul
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
PY - 2025/3
Y1 - 2025/3
N2 - In the era of widespread Internet use and extensive social media interaction, the digital realm is accumulating vast amounts of unstructured text data. This unstructured data often contains undesirable information, necessitating time-consuming manual classification efforts. An intelligent text classification system capable of automatically categorizing digitized texts based on semantic meaning is therefore crucial. However, this task is particularly challenging for low-resource languages like Bengali due to a shortage of annotated corpora, issues with out-of-vocabulary words, a lack of domain-specific hyperparameter tuning, limited ability to extract generalized text features, and class imbalances within the corpus. To address these challenges, this paper proposes AFuNet, an attention-based fusion network to classify texts in a resource-constrained language. AFuNet undergoes a comprehensive four-phase experimental process, including baseline model evaluation and hyperparameter tuning, late fusion and model selection, attention-based early fusion and model identification, and an ablation study with impact analysis. Fine-tuned on five Bengali text classification corpora, AFuNet achieves accuracies of 96.60 ± 0.2 (BTCC11), 85.37 ± 0.2 (OSBC), 97.35 ± 0.2 (BARD), 93.74 ± 0.2 (IndicNLP), and 96.51 ± 0.2 (ProthomAlo). Compared with previous state-of-the-art models on these corpora, AFuNet achieves accuracy improvements ranging from 0.54% to 4.49%, demonstrating its effectiveness in advancing text classification for the Bengali language.
AB - In the era of widespread Internet use and extensive social media interaction, the digital realm is accumulating vast amounts of unstructured text data. This unstructured data often contains undesirable information, necessitating time-consuming manual classification efforts. An intelligent text classification system capable of automatically categorizing digitized texts based on semantic meaning is therefore crucial. However, this task is particularly challenging for low-resource languages like Bengali due to a shortage of annotated corpora, issues with out-of-vocabulary words, a lack of domain-specific hyperparameter tuning, limited ability to extract generalized text features, and class imbalances within the corpus. To address these challenges, this paper proposes AFuNet, an attention-based fusion network to classify texts in a resource-constrained language. AFuNet undergoes a comprehensive four-phase experimental process, including baseline model evaluation and hyperparameter tuning, late fusion and model selection, attention-based early fusion and model identification, and an ablation study with impact analysis. Fine-tuned on five Bengali text classification corpora, AFuNet achieves accuracies of 96.60 ± 0.2 (BTCC11), 85.37 ± 0.2 (OSBC), 97.35 ± 0.2 (BARD), 93.74 ± 0.2 (IndicNLP), and 96.51 ± 0.2 (ProthomAlo). Compared with previous state-of-the-art models on these corpora, AFuNet achieves accuracy improvements ranging from 0.54% to 4.49%, demonstrating its effectiveness in advancing text classification for the Bengali language.
KW - Early fusion
KW - Low-resource languages
KW - Multi-head attention
KW - Natural language processing
KW - Text classification
KW - Transformer-based learning
UR - http://www.scopus.com/inward/record.url?scp=85217165684&partnerID=8YFLogxK
U2 - 10.1007/s00521-024-10953-1
DO - 10.1007/s00521-024-10953-1
M3 - Journal Article
AN - SCOPUS:85217165684
SN - 0941-0643
VL - 37
SP - 6725
EP - 6748
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 9
M1 - 110182
ER -