Barker, E., Barker, J. orcid.org/0000-0002-1684-5660, Gaizauskas, R. et al. (2 more authors) (2022) SNuC: The Sheffield Numbers Spoken Language Corpus. In: Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., Mazo, H., Odijk, J. and Piperidis, S. (eds.) Proceedings of the Thirteenth Language Resources and Evaluation Conference. 13th Conference on Language Resources and Evaluation (LREC 2022), 20-25 Jun 2022, Marseille, France. European Language Resources Association, pp. 1978-1984. ISBN 9791095546726
Abstract
We present SNuC, the first published corpus of spoken alphanumeric identifiers of the sort typically used as serial and part numbers in the manufacturing sector. The dataset contains recordings and transcriptions of over 50 native British English speakers, speaking over 13,000 multi-character alphanumeric sequences and totalling almost 20 hours of recorded speech. We describe the requirements taken into account in designing the corpus and the methodology used to construct it. We present summary statistics describing the corpus contents, as well as a preliminary investigation into errors in spoken alphanumeric identifiers. We validate the corpus by showing how it can be used to adapt a deep learning neural-network-based ASR system, resulting in improved recognition accuracy on the task of spoken alphanumeric identifier recognition. Finally, we discuss further potential uses for the corpus and for the tools developed to construct it.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2022 European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0 (https://creativecommons.org/licenses/by-nc/4.0/). |
Keywords: | spoken language corpora; corpus creation methodology; spoken alphanumeric identifier recognition |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield); The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 13 Mar 2024 10:17 |
Last Modified: | 13 Mar 2024 10:17 |
Published Version: | https://aclanthology.org/volumes/2022.lrec-1/ |
Status: | Published |
Publisher: | European Language Resources Association |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:210222 |