Casanueva, I., Hain, T. orcid.org/0000-0003-0939-3464 and Green, P. (2016) Improving generalisation to new speakers in spoken dialogue state tracking. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Interspeech 2016, 08-12 Sep 2016, San Francisco, USA, pp. 2726-2730.
Abstract
Users with disabilities can greatly benefit from personalised voice-enabled environmental-control interfaces, but for users with speech impairments (e.g. dysarthria) poor ASR performance poses a challenge to successful dialogue. Statistical dialogue management has shown resilience against high ASR error rates, making it useful for improving the performance of these interfaces. However, little research has so far been devoted to personalising dialogue management to specific users. Recently, data-driven discriminative models have been shown to yield the best performance in dialogue state tracking (the inference of the user goal from the dialogue history). However, because each speaker has unique characteristics, training a system for a new user when user-specific data is not available can be challenging, owing to the mismatch between training and working conditions. This work investigates two methods to improve the performance of an LSTM-based personalised state tracker with new speakers: the use of speaker-specific acoustic and ASR-related features, and dropout regularisation. It is shown that in an environmental control system for dysarthric speakers, the combination of both techniques yields improvements of 3.5% absolute in state tracking accuracy. Further analysis explores the effect of using different amounts of speaker-specific data to train the tracking system.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Casanueva, I.; Hain, T.; Green, P. |
Copyright, Publisher and Additional Information: | © 2016 ISCA. This is an author produced version of a paper subsequently published in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | dialogue state tracking; dysarthric speakers |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Dec 2016 15:43 |
Last Modified: | 19 Dec 2022 13:35 |
Published Version: | http://doi.org/10.21437/Interspeech.2016-404 |
Status: | Published |
Refereed: | Yes |
Identification Number: | 10.21437/Interspeech.2016-404 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:109280 |