Multimodal fusion of audio, scene, and face features for first impression estimation

Furkan Gürpinar, Heysem Kaya, Albert Ali Salah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Affective computing, particularly emotion and personality trait recognition, is of increasing interest in many research disciplines. The interplay of emotion and personality shows itself in the first impression left on other people. Moreover, ambient information, e.g. the environment and objects surrounding the subject, also affects these impressions. In this work, we employ pre-trained Deep Convolutional Neural Networks to extract facial emotion and ambient information from images for predicting apparent personality. We also investigate the Local Gabor Binary Patterns from Three Orthogonal Planes (LGBP-TOP) video descriptor and acoustic features extracted via the widely used openSMILE tool. We subsequently classify the features using a Kernel Extreme Learning Machine and fuse the resulting predictions. The proposed system is applied to the ChaLearn Challenge on First Impression Recognition, achieving the winning test set accuracy of 0.913, averaged over the 'Big Five' personality traits.
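The classification and fusion steps described above can be illustrated with a minimal Kernel Extreme Learning Machine sketch. This uses the standard closed-form kernel ELM solution with an RBF kernel; the kernel choice, regularization constant C, feature dimensions, and equal fusion weights are illustrative assumptions, not the configuration reported in the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kelm_train(X, T, C=10.0, gamma=1.0):
    # Closed-form kernel ELM: beta = (I / C + K)^(-1) T
    K = rbf_kernel(X, X, gamma)
    n = X.shape[0]
    return np.linalg.solve(np.eye(n) / C + K, T)

def kelm_predict(X_train, beta, X_test, gamma=1.0):
    # Project test samples onto the training set via the kernel.
    return rbf_kernel(X_test, X_train, gamma) @ beta

# Toy data standing in for two modalities (dimensions are hypothetical).
rng = np.random.default_rng(0)
X_face = rng.normal(size=(50, 8))   # e.g. CNN-based face features
X_audio = rng.normal(size=(50, 6))  # e.g. openSMILE acoustic features
T = rng.uniform(size=(50, 5))       # Big Five trait scores in [0, 1]

# Train one KELM per modality, then fuse predictions at score level.
beta_face = kelm_train(X_face, T)
beta_audio = kelm_train(X_audio, T)
pred = 0.5 * kelm_predict(X_face, beta_face, X_face) \
     + 0.5 * kelm_predict(X_audio, beta_audio, X_audio)
```

Score-level fusion as a weighted average keeps each modality's model independent, so a failing modality (e.g. no detected face) can simply be dropped or down-weighted at prediction time.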

Original language: English
Title of host publication: 2016 23rd International Conference on Pattern Recognition, ICPR 2016
Publisher: IEEE
Pages: 43-48
Number of pages: 6
Volume: 0
ISBN (Electronic): 9781509048472
DOIs
Publication status: Published - 1 Jan 2016
Externally published: Yes
Event: 23rd International Conference on Pattern Recognition, ICPR 2016 - Cancun, Mexico
Duration: 4 Dec 2016 - 8 Dec 2016

Conference

Conference: 23rd International Conference on Pattern Recognition, ICPR 2016
Country/Territory: Mexico
City: Cancun
Period: 4/12/16 - 8/12/16
