Abstract
A social robot benefits from being able to detect and locate users from their speech. Background noise and overlapping speech are notable challenges for effective speaker localization. Convolutional Neural Networks (CNNs) have been shown to perform well at locating single speakers on a curated dataset, but less so in scenarios with two speakers. In addition, their computational cost is still too high for a timely reaction in real-world settings. We build on the current state-of-the-art CNN approach and propose several improvements: time-alignment in the input representation to better distinguish multiple speakers, and considerably shorter input audio blocks to reduce computational cost. We evaluate this approach on an existing dataset of noisy and overlapping speech blocks recorded in rooms of different sizes, predicting both the number of active speech events and their azimuths. The results show that our approach outperforms other approaches in locating two speakers and is considerably faster than the best-performing alternative approach. Time-domain information in the input representation proved essential for predicting the location of the signal source.
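The abstract's finding that time-domain information is essential for localization can be illustrated with a classical, non-learned baseline: the arrival-time difference between two microphones encodes the source azimuth. The sketch below is not the authors' CNN method. It is a brute-force cross-correlation delay estimator on one short block, assuming a far-field source, a single two-microphone pair, a speed of sound of 343 m/s, and a 16 kHz sample rate; all function and variable names are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value (dry air at roughly 20 degrees C)


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) that maximizes sum_i left[i] * right[i + lag],
    searched over [-max_lag, max_lag]. A positive lag means the right channel
    lags behind the left one (the sound reached the left microphone first)."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                score += x * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_azimuth(lag, fs, mic_distance):
    """Far-field model: tau = d * sin(theta) / c, hence theta = asin(c * tau / d).
    Returns degrees in [-90, 90]; 0 is broadside, positive points toward the
    left microphone under the sign convention of estimate_delay."""
    tau = lag / fs
    s = SPEED_OF_SOUND * tau / mic_distance
    s = max(-1.0, min(1.0, s))  # clamp delays beyond the physically possible range
    return math.degrees(math.asin(s))


# Demo with a synthetic short block (assumed parameters, not from the paper).
fs = 16000                 # sample rate in Hz
mic_distance = 0.2         # microphone spacing in metres
block_len = 512            # 32 ms block at 16 kHz
true_delay = 5             # right channel lags the left one by 5 samples

rng = random.Random(0)
left = [rng.gauss(0.0, 1.0) for _ in range(block_len)]
right = [0.0] * true_delay + left[:-true_delay]  # delayed copy of the left channel

# The largest physically possible delay bounds the lag search.
max_lag = math.ceil(mic_distance * fs / SPEED_OF_SOUND)
lag = estimate_delay(left, right, max_lag)
azimuth = delay_to_azimuth(lag, fs, mic_distance)
print(lag, round(azimuth, 1))
```

A single correlation peak yields only one dominant direction per block, so this baseline does not by itself count or separate two overlapping speakers. That is the part of the problem the abstract addresses with a learned model.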
Original language | English |
---|---|
Title of host publication | 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) |
Publisher | IEEE |
Pages | 2086-2093 |
Number of pages | 8 |
ISBN (Print) | 979-8-3503-3671-9 |
Publication status | Published - 31 Aug 2023 |
Externally published | Yes |
Event | 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Busan, Republic of Korea, 28 Aug 2023 → 31 Aug 2023 |
Keywords
- Location awareness
- Event detection
- Social robots
- Stacking
- Neural networks
- Computational efficiency
- Convolutional neural networks