Abstract
Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audio-visual features extracted from videos via machine learning approaches to estimate viewers' affective responses. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency cepstral coefficients (MFCCs), and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches, and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.
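To make the feature-extraction step concrete, the following is a minimal sketch of segment-level MFCC and hue-saturation histogram extraction with early (feature-level) fusion, assuming librosa and OpenCV. The file names, bin counts, and the mean/std summarization are illustrative placeholders, not the paper's exact pipeline.

```python
# Illustrative feature extraction for one movie segment (a sketch,
# not the paper's actual implementation or parameter settings).
import cv2
import librosa
import numpy as np

def mfcc_features(audio_path, n_mfcc=13):
    """Mean and std of MFCCs over a segment, a common functional summary."""
    y, sr = librosa.load(audio_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def hue_saturation_histogram(frame_bgr, h_bins=32, s_bins=32):
    """Normalized 2-D hue-saturation histogram of a single video frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [h_bins, s_bins], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

# Usage sketch: average frame-level histograms over the segment, then
# concatenate with the audio features for early (feature-level) fusion.
# "segment.mp4" / "segment.wav" are hypothetical paths.
cap = cv2.VideoCapture("segment.mp4")
hists = []
ok, frame = cap.read()
while ok:
    hists.append(hue_saturation_histogram(frame))
    ok, frame = cap.read()
cap.release()

segment_features = np.concatenate([mfcc_features("segment.wav"),
                                   np.mean(hists, axis=0)])
```

Concatenating modality features like this is only one fusion option; the paper also compares late (decision-level) fusion and temporal smoothing of the predicted arousal/valence scores, which the sketch above does not cover.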
Original language | English |
---|---|
Title of host publication | ICMR 2018 - Proceedings of the 2018 ACM International Conference on Multimedia Retrieval |
Publisher | Association for Computing Machinery |
Pages | 405-412 |
Number of pages | 8 |
ISBN (Print) | 9781450350464 |
DOIs | |
Publication status | Published - 5 Jun 2018 |
Event | 8th ACM International Conference on Multimedia Retrieval, ICMR 2018 - Yokohama, Japan |
Duration | 11 Jun 2018 → 14 Jun 2018 |
Keywords
- Affective computing
- Audio-visual features
- Emotion estimation
- Face analysis
- Movie analysis
- Multimodal interaction