Fisher vectors with cascaded normalization for paralinguistic analysis

Heysem Kaya, Alexey A. Karpov, Albert Ali Salah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Computational Paralinguistics has several unresolved issues, one of which is coping with large variability due to speakers, spoken content and corpora. In this paper, we address the variability compensation issue by proposing a novel method composed of (i) Fisher vector encoding of low-level descriptors extracted from the signal, (ii) speaker z-normalization applied after speaker clustering, (iii) non-linear normalization of features, and (iv) classification based on Kernel Extreme Learning Machines and Partial Least Squares regression. For experimental validation, we apply the proposed method to the Eating Condition sub-challenge of the INTERSPEECH 2015 Computational Paralinguistics Challenge (ComParE 2015), which is a seven-class classification task. In our preliminary experiments, the proposed method achieves an Unweighted Average Recall (UAR) score of 83.1%, outperforming the challenge test set baseline UAR (65.9%) by a large margin.
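
The normalization cascade and the evaluation metric named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names are hypothetical, and the specific non-linearity (signed power normalization followed by L2 normalization, a common choice for Fisher vectors) is an assumption, since the abstract does not say which non-linear normalization is used. Speaker labels are taken as given here, whereas the paper obtains them by clustering.

```python
import numpy as np


def speaker_z_normalize(X, speaker_ids, eps=1e-8):
    """Standardize every feature within each speaker (or speaker cluster).

    X: (n_samples, n_features) array of utterance-level features.
    speaker_ids: length n_samples; here assumed known, in the paper
    they come from a speaker clustering step.
    """
    X = np.asarray(X, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(X)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = X[mask].mean(axis=0)
        sigma = X[mask].std(axis=0)
        out[mask] = (X[mask] - mu) / (sigma + eps)
    return out


def power_l2_normalize(X, alpha=0.5, eps=1e-12):
    """ASSUMED non-linearity: signed power normalization, then row-wise L2."""
    X = np.asarray(X, dtype=float)
    X = np.sign(X) * np.abs(X) ** alpha
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recall, so every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    # Toy data: 4 utterances, 2 features, 2 speakers (illustrative only).
    X = np.array([[1.0, 2.0], [3.0, 6.0], [10.0, 0.0], [14.0, 4.0]])
    speakers = [0, 0, 1, 1]
    Z = power_l2_normalize(speaker_z_normalize(X, speakers))
    print(Z.shape)
    print(unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1]))
```

Because UAR averages recall over classes rather than over samples, a classifier cannot score well simply by favoring the most frequent class, which makes the metric suitable for a seven-class task with uneven class sizes.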

Original language: English
Title of host publication: INTERSPEECH-2015
Pages: 909-913
Number of pages: 5
Volume: 2015-January
Publication status: Published - 10 Sept 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 Sept 2015 to 10 Sept 2015

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Country/Territory: Germany
City: Dresden
Period: 6/09/15 to 10/09/15

Keywords

  • ComParE
  • Computational paralinguistics
  • Eating condition
  • ELM
  • Fisher vector
  • PLS
  • Signal representation
