Adapting pretrained models for adult to child voice conversion

Nomo Sudro, P., Ragni, A. and Hain, T. orcid.org/0000-0003-0939-3464 (2023) Adapting pretrained models for adult to child voice conversion. In: 2023 31st European Signal Processing Conference (EUSIPCO) Proceedings. 2023 31st European Signal Processing Conference (EUSIPCO), 04-08 Sep 2023, Helsinki, Finland. Institute of Electrical and Electronics Engineers (IEEE), pp. 271-275. ISBN: 9789464593600. ISSN: 2219-5491. EISSN: 2076-1465.

Abstract

Due to a widespread lack of parallel data for adult-to-child voice conversion (VC), non-parallel VC techniques have grown in popularity. Methods such as the encoder-decoder model have achieved good performance in adult-to-adult VC. Such models provide flexibility: each module can be trained separately, or pretrained models can be exploited. However, these pretrained models are only available for adult speech. For children's speech, we do not have enough data to train all the modules of a robust encoder-decoder based VC system. In a limited-data scenario, we can only train the decoder module using target speech. Specifically, we find that adult-to-child VC using a pretrained encoder and a decoder trained on child speech does not yield the spectral variability of child speech. The reason is the gross spectral mismatch between adult and child speech. We address this mismatch by exploiting a warping mechanism to modify the acoustic attributes based on child speech. We conduct objective and subjective evaluations on the CMU and CSLU kids corpora and on data from one adult actress. Results show that the proposed method reduces MCD and F0 RMSE by 0.67 and 0.03, respectively. In subjective evaluations, we observe a relative MOS improvement of 10.7% for naturalness and 18.23% for similarity.
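
For readers unfamiliar with the two objective measures named in the abstract, the following is a minimal sketch of how they are commonly computed. It assumes MCD refers to the usual mel-cepstral distortion in dB. The function names, the requirement that the frames are already time-aligned, the exclusion of the energy coefficient, and the voiced-frame convention (F0 > 0) are illustrative assumptions. The abstract does not state the paper's exact evaluation setup or the units of the reported F0 RMSE.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean mel-cepstral distortion (dB) over time-aligned frames.

    Assumption: each frame is a list of mel-cepstral coefficients with the
    energy coefficient already removed, and both sequences are aligned
    frame by frame (for example by dynamic time warping done elsewhere).
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("frame sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared_diff = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(2.0 * squared_diff)
    return total / len(ref_frames)


def f0_rmse(ref_f0, conv_f0):
    """Root-mean-square F0 error over frames voiced in both contours.

    Assumption: unvoiced frames are marked with F0 <= 0 and are skipped.
    The result is in whatever unit the inputs use (Hz, log-Hz, ...).
    """
    pairs = [(r, c) for r, c in zip(ref_f0, conv_f0) if r > 0 and c > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum((r - c) ** 2 for r, c in pairs) / len(pairs))
```

Lower values of either measure indicate converted speech that is closer to the target, which is the sense in which the abstract reports reductions.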

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Nomo Sudro, P.; Ragni, A.; Hain, T. (https://orcid.org/0000-0003-0939-3464)
Copyright, Publisher and Additional Information:	© 2023 The Authors. Except as otherwise noted, this author-accepted version of a paper published in 2023 31st European Signal Processing Conference (EUSIPCO) Proceedings is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords:	Child speech; adult speech; voice conversion; encoder-decoder model
Dates:	Accepted: 28 September 2023 Published (online): 1 November 2023 Published: 1 November 2023
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Date Deposited:	06 Oct 2023 09:07
Last Modified:	13 Nov 2023 16:20
Status:	Published
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Refereed:	Yes
Identification Number:	10.23919/EUSIPCO58844.2023.10289993
Related URLs:	Conference Author
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:203759

Download

Accepted Version

Filename: eusipco_final_version.pdf

Licence: CC-BY 4.0

