Ng, W.M., Kwan, A.C.M., Lee, T. et al. (1 more author) (2017) ShefCE: A Cantonese-English Bilingual Speech Corpus for Pronunciation Assessment. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 05/03/2017-09/03/2017, New Orleans, USA. Institute of Electrical and Electronics Engineers ISBN 978-1-5090-4117-6
Abstract
This paper describes the development of ShefCE, a Cantonese-English bilingual speech corpus recorded from L2 English speakers in Hong Kong. Parallel bilingual recording materials were drawn from TED online lectures. Scripts were selected according to bilingual consistency (evaluated using a machine translation system) and the balance of the phoneme distribution. Thirty-one undergraduate and postgraduate students in Hong Kong, aged 20 to 30, were recruited to record a 25-hour speech corpus (12 hours in Cantonese and 13 hours in English). Baseline phoneme and syllable recognition systems were trained on background data, both with and without the ShefCE training data. The final syllable error rate (SER) for Cantonese is 17.3%, and the final phoneme error rate (PER) for English is 34.5%. Automatic speech recognition performance on English showed a significant mismatch when L1 models were applied to L2 data, suggesting the need for explicit accent adaptation. ShefCE and the corresponding baseline models will be made openly available for academic research.
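The SER and PER figures quoted above are token error rates. By the usual convention, they are computed from a minimum edit-distance alignment of the recognised sequence against the reference as (S + D + I) / N, where S, D and I are the substitution, deletion and insertion counts and N is the number of reference tokens. The abstract does not name the scoring tool used for ShefCE, so the sketch below only assumes that conventional definition. The function names `edit_counts` and `error_rate` are hypothetical, and the example tokens are illustrative only and are not taken from the corpus.

```python
from typing import Sequence, Tuple


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment.

    Hypothetical helper: a plain Levenshtein alignment with unit costs.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # all reference tokens deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # all hypothesis tokens inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, ins)  # correct token, no cost
            else:
                diag = (c + 1, s + 1, d, ins)  # substitution
            c, s, d, ins = dp[i - 1][j]
            up = (c + 1, s, d + 1, ins)  # deletion of a reference token
            c, s, d, ins = dp[i][j - 1]
            left = (c + 1, s, d, ins + 1)  # insertion of a hypothesis token
            # Tuples compare on total cost first; ties are broken arbitrarily,
            # which never changes the total S + D + I.
            dp[i][j] = min(diag, up, left)
    _, s, d, ins = dp[n][m]
    return s, d, ins


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Token error rate (S + D + I) / N, e.g. SER on syllables or PER on phonemes."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    s, d, ins = edit_counts(ref, hyp)
    return (s + d + ins) / len(ref)


# Illustrative syllable-level example (Jyutping tokens): one deletion out of 3.
ser = error_rate(["nei5", "hou2", "aa3"], ["nei5", "hou2"])
# Illustrative phoneme-level example: one substitution and one insertion out of 3.
per = error_rate(["k", "ae", "t"], ["k", "ah", "t", "s"])
print(f"SER = {ser:.3f}, PER = {per:.3f}")
```

In practice, corpus-level rates are normally obtained by summing S, D, I and N over all utterances before dividing, not by averaging per-utterance rates.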
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2017 IEEE. This is an author produced version of a paper subsequently published in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Bilingual parallel speech corpus; Cantonese; English pronunciation assessment |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 May 2017 15:32 |
Last Modified: | 14 Jul 2020 09:43 |
Published Version: | https://doi.org/10.1109/ICASSP.2017.7953273 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP.2017.7953273 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:116619 |