Abstract
This paper presents our work on the ACM MM Audio Visual Emotion Corpus 2013 (AVEC 2013) depression recognition sub-challenge, using the baseline features in accordance with the challenge protocol. We use Canonical Correlation Analysis (CCA) for audio-visual fusion as well as for covariate extraction for the target task. The video baseline provides histograms of local phase quantization features extracted from 4×4 = 16 regions of the detected face. We summarize the video features over segments of 20 seconds using mode and range functionals. We observe that range-functional features, which measure the variance tendency, provide statistically significantly higher canonical correlation than mode-functional features, which measure the mean tendency. Moreover, when audio-visual features are used with a varying number of covariates per region, the regions consistently found to be the best are those corresponding to the two eyes and the right part of the mouth. We reach a Root Mean Square Error of 9.44 on the challenge test set using audio-visual decision fusion, improving on the video baseline by 30% relative.
| Original language | English |
|---|---|
| Title of host publication | MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia |
| Publisher | Association for Computing Machinery |
| Pages | 961-964 |
| Number of pages | 4 |
| ISBN (Electronic) | 9781450330633 |
| Publication status | Published - 1 Jan 2014 |
| Event | 2014 ACM Conference on Multimedia, MM 2014 - Orlando, United States (3 Nov 2014 → 7 Nov 2014) |
Keywords
- Audio-visual emotion corpus
- Audio-visual fusion
- Canonical Correlation Analysis
- Depression recognition
- Feature extraction