TY - JOUR
T1 - Interpretable speech features vs. DNN embeddings
T2 - What to use in the automatic assessment of Parkinson's disease in multi-lingual scenarios
AU - Favaro, Anna
AU - Tsai, Yi Ting
AU - Butala, Ankur
AU - Thebaud, Thomas
AU - Villalba, Jesús
AU - Dehak, Najim
AU - Moro-Velázquez, Laureano
N1 - Publisher Copyright:
© 2023 The Author(s)
PY - 2023/11
Y1 - 2023/11
N2 - Speech-based approaches for assessing Parkinson's Disease (PD) often rely on feature extraction for automatic classification or detection. While many studies prioritize accuracy by using non-interpretable embeddings from Deep Neural Networks, this work aims to explore the predictive capabilities and language robustness of both feature types in a systematic fashion. As interpretable features, prosodic, linguistic, and cognitive descriptors were adopted, while x-vectors, Wav2Vec 2.0, HuBERT, and TRILLsson representations were used as non-interpretable features. Mono-lingual, multi-lingual, and cross-lingual machine learning experiments were conducted leveraging six data sets comprising speech recordings from various languages: American English, Castilian Spanish, Colombian Spanish, Italian, German, and Czech. For interpretable feature-based models, the mean of the best F1-scores obtained from each language was 81% in mono-lingual, 81% in multi-lingual, and 71% in cross-lingual experiments. For non-interpretable feature-based models, instead, they were 85% in mono-lingual, 88% in multi-lingual, and 79% in cross-lingual experiments. Firstly, models based on non-interpretable features outperformed interpretable ones, especially in cross-lingual experiments. Specifically, TRILLsson provided the most stable and accurate results across tasks and data sets. Conversely, the two types of features adopted showed some level of language robustness in multi-lingual and cross-lingual experiments. Overall, these results suggest that interpretable feature-based models can be used by clinicians to evaluate the deterioration of the speech of patients with PD, while non-interpretable feature-based models can be leveraged to achieve higher detection accuracy.
AB - Speech-based approaches for assessing Parkinson's Disease (PD) often rely on feature extraction for automatic classification or detection. While many studies prioritize accuracy by using non-interpretable embeddings from Deep Neural Networks, this work aims to explore the predictive capabilities and language robustness of both feature types in a systematic fashion. As interpretable features, prosodic, linguistic, and cognitive descriptors were adopted, while x-vectors, Wav2Vec 2.0, HuBERT, and TRILLsson representations were used as non-interpretable features. Mono-lingual, multi-lingual, and cross-lingual machine learning experiments were conducted leveraging six data sets comprising speech recordings from various languages: American English, Castilian Spanish, Colombian Spanish, Italian, German, and Czech. For interpretable feature-based models, the mean of the best F1-scores obtained from each language was 81% in mono-lingual, 81% in multi-lingual, and 71% in cross-lingual experiments. For non-interpretable feature-based models, instead, they were 85% in mono-lingual, 88% in multi-lingual, and 79% in cross-lingual experiments. Firstly, models based on non-interpretable features outperformed interpretable ones, especially in cross-lingual experiments. Specifically, TRILLsson provided the most stable and accurate results across tasks and data sets. Conversely, the two types of features adopted showed some level of language robustness in multi-lingual and cross-lingual experiments. Overall, these results suggest that interpretable feature-based models can be used by clinicians to evaluate the deterioration of the speech of patients with PD, while non-interpretable feature-based models can be leveraged to achieve higher detection accuracy.
KW - Deep learning
KW - Interpretable features
KW - Machine learning
KW - Parkinson's disease
KW - Speech
UR - http://www.scopus.com/inward/record.url?scp=85174002209&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174002209&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2023.107559
DO - 10.1016/j.compbiomed.2023.107559
M3 - Article
C2 - 37852107
AN - SCOPUS:85174002209
SN - 0010-4825
VL - 166
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 107559
ER -