Combining modality-specific extreme learning machines for emotion recognition in the wild

Heysem Kaya*, Albert Ali Salah

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

This paper proposes extreme learning machines (ELM) for modeling audio and video features for emotion recognition under uncontrolled conditions. The ELM paradigm is a fast and accurate learning alternative for single hidden layer feedforward networks. We experiment on the Acted Facial Expressions in the Wild (AFEW) corpus, which features seven discrete emotions, and adhere to the EmotiW 2014 challenge protocols. In our study, better results are obtained for both modalities with kernel ELM compared to basic ELM. We contrast several fusion approaches and reach a test set accuracy of 50.12% (over a video-only baseline of 33.70%) on the seven-class (i.e. six basic emotions plus neutral) EmotiW 2014 Challenge by combining one audio and three video sub-systems. We also compare ELM with the partial least squares regression based classification used in the top-performing system of EmotiW 2014, and discuss the advantages of both approaches.
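For readers unfamiliar with the kernel ELM classifier referenced in the abstract, the following is a minimal, hypothetical NumPy sketch of kernel ELM training and prediction with an RBF kernel. It is not the authors' implementation; the kernel choice, the regularization constant C, and the class names are assumptions made here purely for illustration. It implements the standard closed-form ELM solution, where the output weights are obtained as alpha = (I/C + K)^{-1} T for kernel matrix K and one-hot targets T.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.1):
    # Pairwise RBF (Gaussian) kernel between rows of X and rows of Y.
    sq_dists = (np.sum(X ** 2, axis=1)[:, None]
                + np.sum(Y ** 2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)


class KernelELM:
    """Minimal kernel ELM classifier (illustrative sketch, not the paper's code)."""

    def __init__(self, C=1.0, gamma=0.1):
        self.C = C          # regularization constant (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y):
        self.X_train = X
        n_classes = int(y.max()) + 1
        # One-hot targets coded in {-1, +1}, as is common for ELM classification.
        T = -np.ones((len(y), n_classes))
        T[np.arange(len(y)), y] = 1.0
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form output weights: alpha = (I/C + K)^{-1} T
        self.alpha = np.linalg.solve(np.eye(len(y)) / self.C + K, T)
        return self

    def decision_scores(self, X):
        # Per-class scores f(x) = k(x, X_train) @ alpha, usable for score-level fusion.
        return rbf_kernel(X, self.X_train, self.gamma) @ self.alpha

    def predict(self, X):
        return np.argmax(self.decision_scores(X), axis=1)


# Toy usage with random features standing in for one modality (illustrative only).
rng = np.random.default_rng(0)
X_audio, y = rng.normal(size=(100, 20)), rng.integers(0, 7, size=100)
clf = KernelELM(C=10.0, gamma=0.05).fit(X_audio, y)
scores_audio = clf.decision_scores(X_audio)
```

Modality-specific score matrices produced this way can be combined, for example by a weighted sum before the argmax, which is one common realization of score-level fusion; the paper's exact fusion scheme and weights are not specified in the abstract above.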

Original language: English
Pages (from-to): 139-149
Number of pages: 11
Journal: Journal on Multimodal User Interfaces
Volume: 10
Issue number: 2
Publication status: Published - 1 Jun 2016
Externally published: Yes

Keywords

  • Audio-visual emotion corpus
  • Audio-visual fusion
  • Emotion recognition in the wild
  • Extreme learning machines
  • Feature extraction
