SCANMail: a voicemail interface that makes speech browsable, readable and searchable

Whittaker, S., Hirschberg, J., Amento, B. et al. (5 more authors) (2002) SCANMail: a voicemail interface that makes speech browsable, readable and searchable. In: Terveen, L., Wixon, D., Comstock, E. and Sasse, A., (eds.) Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves. Conference on Human Factors in Computing Systems, 20-25 Apr 2002, Minneapolis, Minnesota, USA. ACM Press, New York, pp. 275-282. ISBN 1-58113-453-3

Abstract

Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail’s support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.

Keywords: Voicemail, speech access, What You See Is Almost What You Hear, asynchronous communication, “speech as data”, empirical evaluation.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Whittaker, S. (s.whittaker@sheffield.ac.uk); Hirschberg, J.; Amento, B.; Stark, L.; Bacchiani, M.; Isenhour, P.; Stead, L.; Rosenberg, A.
Editors:	Terveen, L.; Wixon, D.; Comstock, E.; Sasse, A.
Dates:	2002
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User:	Information Studies
Date Deposited:	25 Mar 2009 11:43
Last Modified:	19 Dec 2022 13:21
Published Version:	http://dx.doi.org/10.1145/503376.503426
Status:	Published
Publisher:	ACM Press
Refereed:	Yes
Identification Number:	10.1145/503376.503426
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:8395
