VisualSpeech: Enhancing Prosody Modeling in TTS Using Video

Que, S. and Ragni, A. orcid.org/0000-0003-0634-4456 (2025) VisualSpeech: Enhancing Prosody Modeling in TTS Using Video. In: Scharenborg, O., Oertel, C. and Truong, K., (eds.) Proceedings of Interspeech 2025. Interspeech 2025, 17-21 Aug 2025, Rotterdam, The Netherlands. International Speech Communication Association (ISCA), pp. 3778-3782. ISSN: 2958-1796. EISSN: 2958-1796.

Metadata

Item Type: Proceedings Paper
Authors/Creators:
  • Que, S.
  • Ragni, A. (orcid.org/0000-0003-0634-4456)
Editors:
  • Scharenborg, O.
  • Oertel, C.
  • Truong, K.
Copyright, Publisher and Additional Information:

© 2025 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of Interspeech 2025 is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/

© 2025 ISCA. Reproduced in accordance with the publisher's self-archiving policy.

Keywords: Text-to-speech Synthesis; Video; Visual Features; Prosody
Dates:
  • Published (online): 17 August 2025
  • Published: 17 August 2025
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User: Symplectic Sheffield
Date Deposited: 24 Sep 2025 14:08
Last Modified: 24 Sep 2025 14:08
Published Version: https://www.isca-archive.org/interspeech_2025/que2...
Status: Published
Publisher: International Speech Communication Association (ISCA)
Refereed: Yes
Identification Number: 10.21437/Interspeech.2025-1494