A Semi-Real-Time Method for Social Robots to Detect and Locate Overlapping Speech Events

Yue Li, Koen V. Hindriks, Florian Kunneman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

It is useful for a social robot to detect and locate users based on their speech. Notable challenges hampering the effective localization of a speaker are background noise and overlapping speech. Convolutional Neural Networks (CNNs) have been shown to yield good performance in locating single speakers on a curated dataset, but they perform less well in scenarios with two speakers. In addition, their computational cost is still too high for a timely reaction in real-world settings. We build on the current state-of-the-art CNN approach and propose several improvements: time alignment in the input representation to distinguish multiple speakers, and considerably shorter input audio blocks to reduce computational cost. We evaluate this approach on an existing dataset with blocks of noisy and overlapping speech recorded in rooms of different sizes, predicting the number of active speech events and their azimuth locations. The results show that our approach outperforms other approaches in locating two speakers and is considerably faster than the best-performing alternative approach. The time-domain information in the input representation was found to be essential for predicting the location of the signal source.
Original language: English
Title of host publication: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
Publisher: IEEE
Pages: 2086-2093
Number of pages: 8
ISBN (Print): 979-8-3503-3671-9
Publication status: Published - 31 Aug 2023
Externally published: Yes
Event: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) - Busan, Korea, Republic of
Duration: 28 Aug 2023 to 31 Aug 2023

Conference

Conference: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
Period: 28/08/23 to 31/08/23

Keywords

  • Location awareness
  • Event detection
  • Social robots
  • Stacking
  • Neural networks
  • Computational efficiency
  • Convolutional neural networks
