Whittaker, S., Hirschberg, J., Amento, B., Stark, L., Bacchiani, M., Isenhour, P., Stead, L. and Rosenberg, A. (2002) SCANMail: a voicemail interface that makes speech browsable, readable and searchable. In: Terveen, L., Wixon, D., Comstock, E. and Sasse, A. (eds.) Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves. Conference on Human Factors in Computing Systems, 20-25 April, 2002, Minneapolis, Minnesota, USA. ACM Press, New York, pp. 275-282. ISBN 1-58113-453-3
Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail’s support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.
Keywords: Voicemail, speech access, What You See Is Almost What You Hear, asynchronous communication, “speech as data”, empirical evaluation.
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User: Information Studies
Date Deposited: 25 Mar 2009 11:43
Last Modified: 19 May 2009 17:02