VisualSpeech: enhance prosody with visual context in TTS.

This is a preprint and may not have undergone formal peer review.

Que, S. and Ragni, A. orcid.org/0000-0003-0634-4456 (Submitted: 2025) VisualSpeech: enhance prosody with visual context in TTS. [Preprint - arXiv] (Submitted)

Metadata

Item Type: Preprint
Authors/Creators: Que, S.; Ragni, A. (orcid.org/0000-0003-0634-4456)
Copyright, Publisher and Additional Information:

© 2025 The Author(s). This preprint is made available under a Creative Commons Attribution 4.0 International License. (https://creativecommons.org/licenses/by/4.0/)

Keywords: Text-to-speech Synthesis; Visual Features; Prosody
Dates:
  • Submitted: 31 January 2025
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User: Symplectic Sheffield
Date Deposited: 21 May 2025 12:48
Last Modified: 21 May 2025 12:48
Status: Submitted
Identification Number: 10.48550/arXiv.2501.19258
