Empirical interpretation of speech emotion perception with attention based model for speech emotion Recognition

Jalal, M.A., Milner, R. orcid.org/0000-0001-8924-0593 and Hain, T. orcid.org/0000-0003-0939-3464 (2020) Empirical interpretation of speech emotion perception with attention based model for speech emotion Recognition. In: Proceedings of Interspeech 2020. Interspeech 2020, 25-29 Oct 2020, Shanghai, China (Online). International Speech Communication Association (ISCA), pp. 4113-4117. ISSN: 1990-9772.

Abstract

Speech emotion recognition is essential for obtaining emotional intelligence, which affects the understanding of the context and meaning of speech. Harmonically structured vowel and consonant sounds add indexical and linguistic cues to spoken information. Previous research has debated, from psychological and linguistic points of view, whether vowel sound cues are more important in carrying emotional context. Other research has claimed that emotion information can exist in small overlapping acoustic cues. However, these claims have not been corroborated in computational speech emotion recognition systems. In this research, a convolution-based model and a long short-term memory-based model, both using attention, are applied to investigate these theories of speech emotion in computational models. The role of acoustic context and word importance is demonstrated for the task of speech emotion recognition. The proposed models are evaluated on the IEMOCAP corpus and achieve 80.1% unweighted accuracy on pure acoustic data, which is higher than current state-of-the-art models on this task. Phones and words are mapped to the attention vectors, and the results show that vowel sounds are more important than consonants for defining emotional acoustic cues, and that the model can assign word importance based on acoustic context.
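
The abstract reports its result as unweighted accuracy without defining the term. As a minimal sketch, assuming the convention common in speech emotion recognition (unweighted accuracy is the mean of per-class recalls, so every emotion class counts equally regardless of size), the metric can be computed as below. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (assumed definition, see lead-in)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Recall for each class that occurs in the reference labels.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: fraction of all utterances labelled correctly."""
    hits = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical toy data: an imbalanced two-class set.
y_true = ["neutral"] * 4 + ["angry"] * 2
y_pred = ["neutral"] * 4 + ["angry", "neutral"]

ua = unweighted_accuracy(y_true, y_pred)
wa = weighted_accuracy(y_true, y_pred)
```

On an imbalanced set such as this one, the two metrics differ: the majority class dominates plain accuracy, while each class contributes equally to the unweighted figure.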

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Jalal, M.A.; Milner, R. (https://orcid.org/0000-0001-8924-0593); Hain, T. (https://orcid.org/0000-0003-0939-3464)
Copyright, Publisher and Additional Information:	© 2020 ISCA. Reproduced in accordance with the publisher's self-archiving policy.
Dates:	Published (online): 25 October 2020; Published: 25 October 2020
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Date Deposited:	25 Mar 2022 16:57
Last Modified:	29 Mar 2022 09:39
Status:	Published
Publisher:	International Speech Communication Association (ISCA)
Refereed:	Yes
Identification Number:	10.21437/interspeech.2020-3007
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:185082
