TY - GEN
T1 - Multimodal Emotion Recognition System Leveraging Decision Fusion with Acoustic and Visual Cues
AU - Rahman, Md Tanvir
AU - Ahsan, Shawly
AU - Hossain, Jawad
AU - Hoque, Mohammed Moshiul
AU - Dewan, M. Ali Akber
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Multimodal emotion recognition (MER) involves detecting and understanding human emotions by analyzing multiple modalities, such as images, audio, videos, and texts. MER is a challenging problem due to the complexity of handling multiple modalities and fusing their information to interpret and classify human emotions accurately. This paper introduces an intelligent framework (MEmoR) for multimodal emotion recognition leveraging audio-visual fusion. It focuses on the challenging domain of emotion detection within a Bengali audio-visual dataset. A vital aspect of this work involves creating a new dataset, a multimodal emotion recognition dataset (MERD), tailored to specific task requirements. The MERD encompasses 1937 annotated multimodal samples across four categories: happy, sad, angry, and neutral. The proposed framework utilizes various machine learning (ML), deep learning (DL), and transformer-based models for audio and visual modalities. This work explores and integrates the audio and visual modalities through feature-level and decision-level fusion.
AB - Multimodal emotion recognition (MER) involves detecting and understanding human emotions by analyzing multiple modalities, such as images, audio, videos, and texts. MER is a challenging problem due to the complexity of handling multiple modalities and fusing their information to interpret and classify human emotions accurately. This paper introduces an intelligent framework (MEmoR) for multimodal emotion recognition leveraging audio-visual fusion. It focuses on the challenging domain of emotion detection within a Bengali audio-visual dataset. A vital aspect of this work involves creating a new dataset, a multimodal emotion recognition dataset (MERD), tailored to specific task requirements. The MERD encompasses 1937 annotated multimodal samples across four categories: happy, sad, angry, and neutral. The proposed framework utilizes various machine learning (ML), deep learning (DL), and transformer-based models for audio and visual modalities. This work explores and integrates the audio and visual modalities through feature-level and decision-level fusion.
KW - Acoustic Features
KW - Decision Fusion
KW - Multimodal Emotion Recognition
KW - Natural Language Processing
KW - Visual Features
UR - http://www.scopus.com/inward/record.url?scp=105007222789&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-88217-3_9
DO - 10.1007/978-3-031-88217-3_9
M3 - Conference contribution
AN - SCOPUS:105007222789
SN - 9783031882166
T3 - Lecture Notes in Computer Science
SP - 117
EP - 133
BT - Pattern Recognition. ICPR 2024 International Workshops and Challenges, Proceedings
A2 - Palaiahnakote, Shivakumara
A2 - Schuckers, Stephanie
A2 - Ogier, Jean-Marc
A2 - Bhattacharya, Prabir
A2 - Pal, Umapada
A2 - Bhattacharya, Saumik
T2 - 27th International Conference on Pattern Recognition Workshops, ICPRW 2024
Y2 - 1 December 2024 through 1 December 2024
ER -