TY - CONF
T1 - Automatic pitch accent classification through image classification
AU - Hu, Na
AU - Schnack, Hugo
AU - Arvaniti, Amalia
PY - 2024
Y1 - 2024
N2 - The classification of pitch accents has posed significant challenges in automatic intonation labelling. Previous research primarily adopted feature-based approaches, predicting pitch accents using a finite set of features including acoustic features (F0, duration, intensity) and lexical features. In this study, we explored a novel approach, classifying pitch accents as images represented in pixels. To evaluate this method’s effectiveness, we used a relatively simple classification task involving only two types of pitch accents (H* and L+H*). The training of a basic neural network model for classifying images of these two types of accents (N = 2,025) yielded an average accuracy of 93.5% across 10 runs on the test set, showcasing the potential effectiveness of this new approach.
AB - The classification of pitch accents has posed significant challenges in automatic intonation labelling. Previous research primarily adopted feature-based approaches, predicting pitch accents using a finite set of features including acoustic features (F0, duration, intensity) and lexical features. In this study, we explored a novel approach, classifying pitch accents as images represented in pixels. To evaluate this method’s effectiveness, we used a relatively simple classification task involving only two types of pitch accents (H* and L+H*). The training of a basic neural network model for classifying images of these two types of accents (N = 2,025) yielded an average accuracy of 93.5% across 10 runs on the test set, showcasing the potential effectiveness of this new approach.
U2 - 10.21437/Interspeech.2024-895
DO - 10.21437/Interspeech.2024-895
M3 - Paper
SP - 2050
EP - 2054
ER -
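
Note: the sketch below gives a concrete picture of the kind of pixel-based binary classifier the abstract describes, namely a small convolutional network separating two accent classes (H* vs. L+H*) from greyscale images. It is not the authors' code; the architecture, the 64x64 image size, and the synthetic placeholder data are illustrative assumptions, written in PyTorch.

# Hypothetical sketch, not the paper's implementation: a basic CNN that treats
# pitch accents as images (e.g. rendered F0 contours) and predicts one of two
# classes (H* vs. L+H*). Image size and layer widths are assumptions.
import torch
import torch.nn as nn

class AccentCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Two small conv blocks reduce a 1x64x64 input to 32 feature maps of 16x16.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # A small fully connected head maps the flattened features to two logits.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

if __name__ == "__main__":
    # Placeholder data standing in for the accent images: 8 random 64x64
    # greyscale "contour images" with binary labels (0 = H*, 1 = L+H*).
    images = torch.rand(8, 1, 64, 64)
    labels = torch.randint(0, 2, (8,))

    model = AccentCNN()
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(3):  # a few toy epochs on the placeholder batch
        optimiser.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimiser.step()
        print(f"epoch {epoch}: loss={loss.item():.4f}")

In a real setup the random tensors would be replaced by the rendered pitch-accent images, and performance would be reported as accuracy on a held-out test set, as in the paper.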