In recent years, memes have become a common medium for content polluters to spread offensive views on social media. Owing to their multimodal nature, memes can easily evade content regulators' scrutiny, and the proliferation of such undesired or harmful memes can damage social harmony. Restraining offensive memes on social media is therefore of utmost importance. However, analyzing memes is complicated because they express human emotions implicitly. Previous studies have not explored jointly modelling multimodal features together with their constituent unimodal features (i.e., image and text) to classify undesired memes. This paper presents a framework that uses a weighted ensemble technique to assign weights to the participating visual, textual, and multimodal models. State-of-the-art visual (VGG19, VGG16, ResNet50) and textual (multilingual BERT, multilingual DistilBERT, XLM-R) models serve as the framework's constituent modules. Moreover, two fusion approaches (early fusion and late fusion) are used to combine the visual and textual features when building the multimodal models. The evaluations demonstrate that the proposed weighted ensemble technique improves performance over the investigated unimodal, multimodal, and ensemble models. The proposed approach achieves superior results on two multilingual benchmark datasets (MultiOFF and TamilMemes), with weighted F1-scores of 66.73% and 58.59%, respectively. Furthermore, a comparative analysis reveals that the proposed approach surpasses existing works by approximately 13% and 2% in weighted F1-score, respectively.
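The core idea of the weighted ensemble can be sketched as a weighted average of the class probabilities produced by the constituent visual, textual, and multimodal models. The snippet below is a minimal illustration, not the paper's exact method: the model probabilities and the weights are hypothetical (in practice the weights would typically be tuned on a validation set).

```python
import numpy as np

def weighted_ensemble(prob_list, weights):
    """Fuse per-model class probabilities by a weighted average.

    prob_list: list of (n_samples, n_classes) probability arrays,
               one per constituent model (e.g., visual, textual, multimodal).
    weights:   one non-negative weight per model; normalized to sum to 1.
    Returns the predicted class index per sample.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # normalize the model weights
    stacked = np.stack(prob_list)             # (n_models, n_samples, n_classes)
    fused = np.tensordot(w, stacked, axes=1)  # weighted sum over models
    return fused.argmax(axis=1)

# Hypothetical offensive/not-offensive probabilities from three models
# for two memes (values are illustrative only).
p_visual     = np.array([[0.6, 0.4], [0.3, 0.7]])
p_textual    = np.array([[0.2, 0.8], [0.4, 0.6]])
p_multimodal = np.array([[0.5, 0.5], [0.1, 0.9]])

labels = weighted_ensemble([p_visual, p_textual, p_multimodal],
                           weights=[0.2, 0.3, 0.5])
# labels → array([1, 1]): the weighted vote favors class 1 for both memes
```

Early fusion, by contrast, would concatenate the visual and textual feature vectors before a single classifier, whereas the late-fusion scheme above combines the models' output probabilities.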
Journal of King Saud University - Computer and Information Sciences
Published - Oct. 2022
- Multilingual offense detection
- Multimodal data
- Multimodal fusion
- Multimodal learning