“Wild West” of Evaluating Speech-Driven 3D Facial Animation Synthesis: A Benchmark Study

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Recent advancements in the field of audio-driven 3D facial animation have accelerated rapidly, with numerous papers being published in a short span of time. This surge in research has garnered significant attention from both academia and industry due to its potential applications in digital humans. Various approaches, both deterministic and non-deterministic, have been explored based on foundational advancements in deep learning algorithms. However, there remains no consensus among researchers on standardized methods for evaluating these techniques. Additionally, rather than converging on a common set of datasets and objective metrics suited for specific methods, recent works exhibit considerable variation in experimental setups. This inconsistency complicates the research landscape, making it difficult to establish a streamlined evaluation process and rendering many cross-paper comparisons challenging. Moreover, the common practice of A/B testing in perceptual studies focuses on only two common metrics and is not sufficient for non-deterministic and emotion-enabled approaches. The lack of correlation between subjective and objective metrics indicates a need for critical analysis in this space. In this study, we address these issues by benchmarking state-of-the-art deterministic and non-deterministic models, utilizing a consistent experimental setup across a carefully curated set of objective metrics and datasets. We also conduct a perceptual user study to assess whether subjective perceptual metrics align with the objective metrics. Our findings indicate that model rankings do not necessarily generalize across datasets, and subjective metric ratings are not always consistent with their corresponding objective metrics. The supplementary video, edited code scripts for training on different datasets, and documentation related to this benchmark study are made publicly available at https://galib360.github.io/face-benchmark-project/.

Original language: English
Article number: e70073
Journal: Computer Graphics Forum
DOIs
Publication status: E-pub ahead of print - 18 Apr 2025

Bibliographical note

Publisher Copyright:
© 2025 The Author(s). Computer Graphics Forum published by Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd.

Keywords

  • Animation
  • CCS Concepts
    • Computing methodologies → Neural networks
    • Human-centered computing → User studies

