Jalal, M.A., Milner, R., Hain, T. orcid.org/0000-0003-0939-3464 et al. (1 more author) (2020) Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition. In: Interspeech 2020. Interspeech 2020, 25-29 Oct 2020, Shanghai, China. ISCA - International Speech Communication Association, pp. 4084-4088.
Abstract
Speech emotion recognition is essential for obtaining emotional intelligence, which affects the understanding of the context and meaning of speech. The fundamental challenge of speech emotion recognition from a machine learning standpoint is to extract patterns that carry maximum correlation with the emotion information encoded in the signal, while being as insensitive as possible to other types of information carried by speech. In this paper, a novel recurrent residual temporal context modelling framework is proposed. The framework includes a mixture of multi-view attention smoothing and high-dimensional feature projection for context expansion and learning feature representations. The framework is designed to be robust to changes in speaker and other distortions, and it provides state-of-the-art results for speech emotion recognition. Performance of the proposed approach is compared with a wide range of current architectures in a standard 4-class classification task on the widely used IEMOCAP corpus. A significant improvement of 4% unweighted accuracy over state-of-the-art systems is observed. Additionally, the attention vectors have been aligned with the input segments and plotted at two different attention levels to demonstrate the effectiveness of the approach.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Jalal, M.A.; Milner, R.; Hain, T. (orcid.org/0000-0003-0939-3464); et al. (1 more author) |
Copyright, Publisher and Additional Information: | © 2020 ISCA. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | speech emotion recognition; attention networks; computational paralinguistics |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 02 Mar 2021 13:40 |
Last Modified: | 03 Mar 2021 09:36 |
Status: | Published |
Publisher: | ISCA - International Speech Communication Association |
Refereed: | Yes |
Identification Number: | 10.21437/interspeech.2020-3005 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:171417 |