Whittaker, S. and Amento, B. (2004) Semantic speech editing. In: Dykstra-Erickson, E. and Tscheligi, M. (eds.) Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 24-29 April 2004, Vienna, Austria. New York: ACM Press, pp. 527-534. ISBN 1-58113-702-8
Editing speech data is currently time-consuming and error-prone. Speech editors rely on acoustic waveform representations, which force users to repeatedly sample the underlying speech to identify words and phrases to edit. Instead we developed a semantic editor that reduces the need for extensive sampling by providing access to meaning. The editor shows a time-aligned errorful transcript produced by applying automatic speech recognition (ASR) to the original speech. Users visually scan the words in the transcript to identify important phrases. They then edit the transcript directly using standard word processing 'cut and paste' operations, which extract the corresponding time-aligned speech. ASR errors mean that users must supplement what they read in the transcript by accessing the original speech. Even when there are transcript errors, however, the semantic representation still provides users with enough information to target what they edit and play, reducing the need for extensive sampling. A laboratory evaluation showed that semantic editing is more efficient than acoustic editing even when ASR is highly inaccurate.
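The core mechanism the abstract describes — a time-aligned transcript in which word-processing edits on text extract the corresponding speech — can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a hypothetical `Word` record carrying per-word start/end times such as ASR alignment would produce, and shows how the indices of words kept after a textual cut map back to merged audio spans.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Word:
    text: str
    start: float  # seconds into the recording (from ASR time alignment)
    end: float

def spans_for_edit(words: List[Word], keep: List[int]) -> List[Tuple[float, float]]:
    """Map indices of transcript words retained after editing to merged audio spans."""
    spans: List[Tuple[float, float]] = []
    for i in sorted(keep):
        w = words[i]
        if spans and abs(spans[-1][1] - w.start) < 1e-6:
            # consecutive in time: extend the previous span instead of splitting
            spans[-1] = (spans[-1][0], w.end)
        else:
            spans.append((w.start, w.end))
    return spans

transcript = [Word("semantic", 0.0, 0.5), Word("speech", 0.5, 0.9),
              Word("um", 0.9, 1.2), Word("editing", 1.2, 1.8)]
# Deleting the filler "um" in the text keeps words 0, 1, 3:
print(spans_for_edit(transcript, [0, 1, 3]))  # [(0.0, 0.9), (1.2, 1.8)]
```

The resulting spans would then be concatenated from the original audio, so the user never manipulates the waveform directly.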
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User: Information Studies
Date Deposited: 25 Mar 2009 10:19
Last Modified: 19 May 2009 17:05
Publisher: New York: ACM Press