Multimodal Emotion Recognition System Leveraging Decision Fusion with Acoustic and Visual Cues

Md Tanvir Rahman, Shawly Ahsan, Jawad Hossain, Mohammed Moshiul Hoque, M. Ali Akber Dewan

Research output: Chapter in Book/Report/Conference proceeding › Published Conference contribution › peer-review

Abstract

Multimodal emotion recognition (MER) involves detecting and understanding human emotions by analyzing multiple modalities, such as images, audio, video, and text. MER is challenging because each modality is complex in its own right and because information from the modalities must be fused to interpret and classify human emotions accurately. This paper introduces an intelligent framework (MEmoR) for multimodal emotion recognition that leverages audio-visual fusion. It focuses on emotion detection in a Bengali audio-visual dataset. A vital aspect of this work is the creation of a new multimodal emotion recognition dataset (MERD) tailored to the task. MERD comprises 1937 annotated multimodal samples across four categories: happy, sad, angry, and neutral. The proposed framework uses various machine learning (ML), deep learning (DL), and transformer-based models for the audio and visual modalities, and integrates the two modalities through feature-level and decision-level fusion.

Original language: English
Title of host publication: Pattern Recognition. ICPR 2024 International Workshops and Challenges, Proceedings
Editors: Shivakumara Palaiahnakote, Stephanie Schuckers, Jean-Marc Ogier, Prabir Bhattacharya, Umapada Pal, Saumik Bhattacharya
Pages: 117-133
Number of pages: 17
Publication status: Published - 2025
Event: 27th International Conference on Pattern Recognition Workshops, ICPRW 2024 - Kolkata, India
Duration: 1 Dec. 2024 → 1 Dec. 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15617 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Pattern Recognition Workshops, ICPRW 2024
Country/Territory: India
City: Kolkata
Period: 1/12/24 → 1/12/24

Keywords

  • Acoustic Features
  • Decision Fusion
  • Multimodal Emotion Recognition
  • Natural Language Processing
  • Visual Features
