Zanon Boito, M., Anastasopoulos, A., Lekakou, M. et al. (2 more authors) (2018) A small Griko-Italian speech translation corpus. In: The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 29-31 Aug 2018, Gurugram, India. International Speech Communication Association (ISCA).
Abstract
This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research. The corpus consists of 330 utterances (about 20 minutes of speech) which have been transcribed and translated into Italian, with annotations for word-level speech-to-transcription and speech-to-translation alignments. The corpus also includes morphosyntactic tags and word-level glosses. Pseudo-phones were also generated by applying an automatic unit discovery method. We detail how the corpus was collected, cleaned and processed, and we illustrate its use on zero-resource tasks by presenting baseline results for speech-to-translation alignment and unsupervised word discovery. The dataset will be available online, aiming to encourage replicability and diversity in computational language documentation experiments.
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | |
| Copyright, Publisher and Additional Information: | © 2018 ISCA. Reproduced in accordance with the publisher's self-archiving policy. |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Depositing User: | Symplectic Sheffield |
| Date Deposited: | 25 Nov 2019 11:20 |
| Last Modified: | 25 Nov 2019 11:20 |
| Published Version: | https://www.isca-speech.org/archive/SLTU_2018/abst... |
| Status: | Published online |
| Publisher: | International Speech Communication Association (ISCA) |
| Refereed: | Yes |
| Identification Number: | 10.21437/sltu.2018-8 |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:153559 |
