Gully, Amelia Jane orcid.org/0000-0002-8600-121X, Yoshimura, Takenori, Murphy, Damian Thomas orcid.org/0000-0002-6676-9459 et al. (3 more authors) (2017) Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network. In: Interspeech 2017. INTERSPEECH. ISCA-INST SPEECH COMMUNICATION ASSOC, pp. 234-238.
Abstract
Following recent advances in direct modeling of the speech waveform using a deep neural network, we propose a novel method that directly estimates a physical model of the vocal tract from the speech waveform, rather than magnetic resonance imaging data. This provides a clear relationship between the model and the size and shape of the vocal tract, offering considerable flexibility in terms of speech characteristics such as age and gender. Initial tests indicate that despite a highly simplified physical model, intelligible synthesized speech is obtained. This illustrates the potential of the combined technique for the control of physical models in general, and hence the generation of more natural-sounding synthetic speech.
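The paper's physical model is a dynamic digital waveguide mesh of the vocal tract. As a rough, much-simplified illustration of the waveguide principle it builds on, the sketch below simulates a 1-D tube of piecewise-constant cross-sectional areas using Kelly-Lochbaum scattering junctions. All function names, area values, and boundary reflection coefficients are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kelly_lochbaum_synth(areas, n_samples, r_glottis=0.9, r_lips=-0.9):
    """Impulse response of a piecewise-cylindrical tube using
    Kelly-Lochbaum scattering (a 1-D relative of the waveguide mesh).

    areas: cross-sectional area of each tube section (arbitrary units).
    NOTE: illustrative sketch only, not the method of the paper.
    """
    n = len(areas)
    # Pressure reflection coefficient at each internal junction:
    # k_i = (A_i - A_{i+1}) / (A_i + A_{i+1})
    k = [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
         for i in range(n - 1)]
    fwd = np.zeros(n)  # right-going wave in each section
    bwd = np.zeros(n)  # left-going wave in each section
    out = np.zeros(n_samples)
    for t in range(n_samples):
        new_fwd = np.zeros(n)
        new_bwd = np.zeros(n)
        # Glottis end: partial reflection plus an impulse excitation at t = 0
        new_fwd[0] = r_glottis * bwd[0] + (1.0 if t == 0 else 0.0)
        # Lip end: part of the wave radiates (the output), part reflects back
        out[t] = (1.0 + r_lips) * fwd[n - 1]
        new_bwd[n - 1] = r_lips * fwd[n - 1]
        # Internal junctions: Kelly-Lochbaum scattering equations
        for i in range(n - 1):
            new_fwd[i + 1] = (1.0 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
            new_bwd[i] = k[i] * fwd[i] + (1.0 - k[i]) * bwd[i + 1]
        fwd, bwd = new_fwd, new_bwd
    return out

# Crude two-tube approximation of an open vowel (areas are made up)
signal = kelly_lochbaum_synth([2.6, 2.6, 2.6, 8.0, 8.0, 8.0], 400)
```

Changing the `areas` vector reshapes the tract and hence the resonances; the paper's contribution is, in effect, to let a deep neural network drive such shape parameters for a full mesh directly from speech data.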
Metadata
Item Type: | Proceedings Paper
---|---
Authors/Creators: | Gully, Amelia Jane; Yoshimura, Takenori; Murphy, Damian Thomas; et al. (3 more authors)
Copyright, Publisher and Additional Information: | © 2017 ISCA. Uploaded in accordance with the publisher’s self-archiving policy. Further copying may not be permitted; contact the publisher for details.
Keywords: | speech synthesis, digital waveguide mesh, deep neural network
Dates: | 2017 (published)
Institution: | The University of York
Academic Units: | The University of York > Faculty of Sciences (York) > Electronic Engineering (York)
Depositing User: | Pure (York)
Date Deposited: | 19 Dec 2018 16:30
Last Modified: | 05 Jan 2025 00:45
Published Version: | https://doi.org/10.21437/Interspeech.2017-900
Status: | Published
Publisher: | ISCA-INST SPEECH COMMUNICATION ASSOC
Series Name: | INTERSPEECH
Identification Number: | 10.21437/Interspeech.2017-900
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:140245