Fomicheva, M., Sun, S., Fonseca, E.R. et al. (6 more authors) (Submitted: 2020) MLQE-PE: a multilingual quality estimation and post-editing dataset. arXiv. (Submitted)
Abstract
We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Fomicheva, M., Sun, S., Fonseca, E.R. et al. (6 more authors) |
Copyright, Publisher and Additional Information: | © 2020 The Author(s). For reuse permissions, please contact the Author(s). |
Dates: | Submitted: 2020 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: European Commission - Horizon 2020; Grant number: 825303 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 02 Nov 2020 14:28 |
Last Modified: | 02 Nov 2020 14:28 |
Published Version: | https://arxiv.org/abs/2010.04480 |
Status: | Submitted |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:167469 |