Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection

Abstract

The introduction of the MUStARD dataset and its emotion-recognition extension, MUStARD++, has established sarcasm as a multimodal phenomenon: it is expressed not only in natural language text but also through manner of speech (such as tonality and intonation) and visual cues (facial expressions). In this work, we rigorously benchmark the MUStARD++ dataset with state-of-the-art language, speech, and visual encoders to fully exploit the multimodal richness it offers, achieving a 2% improvement in macro-F1 over the existing benchmark. Additionally, to address the class imbalance in the 'sarcasm type' category of MUStARD++, we propose an extension, which we call MUStARD++ Balanced, and benchmark it with the new instances split across both the train and test sets, achieving a further 2.4% boost in macro-F1. The new clips were drawn from a novel source, the TV show House M.D., which adds to the diversity of the dataset, and were manually annotated by multiple annotators with substantial inter-annotator agreement in terms of Cohen's kappa and Krippendorff's alpha. Our code, extended data, and state-of-the-art benchmark models are made public.

Publication
arXiv preprint arXiv:2310.01430