Abstract
Authorship Attribution (AA) is crucial for identifying the author of a given text from a pool of suspects, especially with the widespread use of the internet and electronic devices. However, most AA research has primarily focused on high-resource languages like English, leaving low-resource languages such as Bengali relatively unexplored. Challenges faced in this domain include the absence of benchmark corpora, a lack of context-aware feature extractors, limited availability of tuned hyperparameters, and OOV issues. To address these challenges, this study introduces AuthorNet for authorship attribution using attention-based early fusion of transformer-based language models, i.e., concatenation of an embeddings output of two existing models that were fine-tuned. AuthorNet consists of three key modules: Feature extraction, Fine-tuning and selection of best-performing models, and Attention-based early fusion. To evaluate the performance of AuthorNet, a number of experiments using four benchmark corpora have been conducted. The results demonstrated exceptional accuracy: 98.86 ± 0.01%, 99.49 ± 0.01%, 97.91 ± 0.01%, and 99.87 ± 0.01% for four corpora. Notably, AuthorNet outperformed all foundation models, achieving accuracy improvements ranging from 0.24% to 2.92% across the four corpora.
| Original language | English |
|---|---|
| Article number | 125643 |
| Journal | Expert Systems with Applications |
| Volume | 262 |
| DOIs | |
| Publication status | Published - 1 Mar. 2025 |
Keywords
- Authorship attribution
- Early fusion
- Fine-tuning
- Hyperparameters tuning
- Low resource language
- Multi-head attention
- Natural language processing
Fingerprint
Dive into the research topics of 'AuthorNet: Leveraging attention-based early fusion of transformers for low-resource authorship attribution'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver