TY - JOUR
T1 - AuthorNet
T2 - Leveraging attention-based early fusion of transformers for low-resource authorship attribution
AU - Hossain, Md Rajib
AU - Hoque, Mohammed Moshiul
AU - Dewan, M. Ali Akber
AU - Hoque, Enamul
AU - Siddique, Nazmul
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2025/3/1
Y1 - 2025/3/1
N2 - Authorship Attribution (AA) is crucial for identifying the author of a given text from a pool of suspects, especially with the widespread use of the internet and electronic devices. However, most AA research has focused primarily on high-resource languages such as English, leaving low-resource languages such as Bengali relatively unexplored. Challenges in this domain include the absence of benchmark corpora, a lack of context-aware feature extractors, limited availability of tuned hyperparameters, and out-of-vocabulary (OOV) issues. To address these challenges, this study introduces AuthorNet for authorship attribution using attention-based early fusion of transformer-based language models, i.e., concatenating the embedding outputs of two fine-tuned existing models. AuthorNet consists of three key modules: feature extraction, fine-tuning and selection of the best-performing models, and attention-based early fusion. To evaluate the performance of AuthorNet, a number of experiments were conducted on four benchmark corpora. The results demonstrated exceptional accuracy: 98.86 ± 0.01%, 99.49 ± 0.01%, 97.91 ± 0.01%, and 99.87 ± 0.01% on the four corpora. Notably, AuthorNet outperformed all foundation models, achieving accuracy improvements ranging from 0.24% to 2.92% across the four corpora.
AB - Authorship Attribution (AA) is crucial for identifying the author of a given text from a pool of suspects, especially with the widespread use of the internet and electronic devices. However, most AA research has focused primarily on high-resource languages such as English, leaving low-resource languages such as Bengali relatively unexplored. Challenges in this domain include the absence of benchmark corpora, a lack of context-aware feature extractors, limited availability of tuned hyperparameters, and out-of-vocabulary (OOV) issues. To address these challenges, this study introduces AuthorNet for authorship attribution using attention-based early fusion of transformer-based language models, i.e., concatenating the embedding outputs of two fine-tuned existing models. AuthorNet consists of three key modules: feature extraction, fine-tuning and selection of the best-performing models, and attention-based early fusion. To evaluate the performance of AuthorNet, a number of experiments were conducted on four benchmark corpora. The results demonstrated exceptional accuracy: 98.86 ± 0.01%, 99.49 ± 0.01%, 97.91 ± 0.01%, and 99.87 ± 0.01% on the four corpora. Notably, AuthorNet outperformed all foundation models, achieving accuracy improvements ranging from 0.24% to 2.92% across the four corpora.
KW - Authorship attribution
KW - Early fusion
KW - Fine-tuning
KW - Hyperparameter tuning
KW - Low-resource language
KW - Multi-head attention
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85208266121&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.125643
DO - 10.1016/j.eswa.2024.125643
M3 - Journal Article
AN - SCOPUS:85208266121
SN - 0957-4174
VL - 262
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 125643
ER -