Feature selection and multimodal fusion for estimating emotions evoked by movie clips

Yasemin Timar, Nihan Karslioglu, Heysem Kaya, Albert Ali Salah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audio-visual features extracted from videos via machine learning approaches to estimate the affective responses of the viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency cepstral coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches in the paper, and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.
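The abstract does not specify the exact fusion or smoothing strategies used. As a rough illustration of the general idea, a weighted late fusion of per-modality prediction tracks followed by moving-average smoothing might look like the sketch below; all function names, weights, and toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def late_fusion(predictions, weights):
    """Weighted average of per-modality prediction tracks.

    predictions: list of 1-D arrays (one per modality), equal length.
    weights: one non-negative weight per modality (normalized internally).
    """
    preds = np.stack(predictions)        # shape: (n_modalities, n_frames)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize so weights sum to 1
    return w @ preds                     # fused track, shape (n_frames,)

def smooth(track, window=5):
    """Centered moving-average smoothing of a prediction track."""
    kernel = np.ones(window) / window
    return np.convolve(track, kernel, mode="same")

# Toy example: audio- and video-based arousal tracks for a 10-frame clip.
audio = np.array([0.2, 0.4, 0.5, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
video = np.array([0.1, 0.3, 0.6, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

fused = late_fusion([audio, video], weights=[0.6, 0.4])
smoothed = smooth(fused, window=3)
```

Smoothing is a natural post-processing step here because affective annotations of continuous video tend to vary slowly over time, so averaging adjacent frame predictions typically reduces noise in the estimated track.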

Original language: English
Title of host publication: ICMR 2018 - Proceedings of the 2018 ACM International Conference on Multimedia Retrieval
Publisher: Association for Computing Machinery
Pages: 405-412
Number of pages: 8
ISBN (Print): 9781450350464
DOIs
Publication status: Published - 5 Jun 2018
Event: 8th ACM International Conference on Multimedia Retrieval, ICMR 2018 - Yokohama, Japan
Duration: 11 Jun 2018 - 14 Jun 2018

Conference

Conference: 8th ACM International Conference on Multimedia Retrieval, ICMR 2018
Country/Territory: Japan
City: Yokohama
Period: 11/06/18 - 14/06/18

Keywords

  • Affective computing
  • Audio-visual features
  • Emotion estimation
  • Face analysis
  • Movie analysis
  • Multimodal interaction
