Chen, M., Shi, Y. and Hain, T. orcid.org/0000-0003-0939-3464 (2021) Towards low-resource StarGAN voice conversion using weight adaptive instance normalization. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 06-11 Jun 2021, Toronto, ON, Canada. Institute of Electrical and Electronics Engineers. ISBN 9781728176062
Abstract
Many-to-many voice conversion with non-parallel training data has seen significant progress in recent years. The task is challenging because ground-truth parallel data are not available. StarGAN-based models have gained attention because of their efficiency and effectiveness. However, most StarGAN-based work has focused only on a small number of speakers and a large amount of training data. In this work, we aim to improve the data efficiency of the model and to achieve many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples. To improve data efficiency, the proposed model uses a speaker encoder to extract speaker embeddings, together with weight adaptive instance normalization (W-AdaIN) layers. Experiments are conducted with 109 speakers under two low-resource conditions, with 20 and 5 training samples per speaker respectively. An objective evaluation shows that the proposed model significantly outperforms the baseline methods. Furthermore, a subjective evaluation shows that the proposed model outperforms the baseline method in both naturalness and similarity.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Chen, M., Shi, Y. and Hain, T. (orcid.org/0000-0003-0939-3464) |
Copyright, Publisher and Additional Information: | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | Voice Conversion; Generative Adversarial Networks; Low-resource |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 17 Jun 2022 11:03 |
Last Modified: | 19 Jun 2022 01:55 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers |
Refereed: | Yes |
Identification Number: | 10.1109/icassp39728.2021.9415042 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:187593 |