Shah, K., Cohn, T. and Specia, L. (orcid.org/0000-0002-5495-3128) (2015) A Bayesian non-linear method for feature selection in machine translation quality estimation. Machine Translation, 29 (2). pp. 101-125. ISSN 0922-6567
Abstract
We perform a systematic analysis of the effectiveness of features for the problem of predicting the quality of machine translation (MT) at the sentence level. Starting from a comprehensive feature set, we apply a technique based on Gaussian processes, a Bayesian non-linear learning method, to automatically identify features leading to accurate model performance. We consider application to several datasets across different language pairs and text domains, with translations produced by various MT systems and scored for quality according to different evaluation criteria. We show that selecting features with this technique leads to significantly better performance in most datasets, as compared to using the complete feature sets or a state-of-the-art feature selection approach. In addition, we identify a small set of features which seem to perform well across most datasets.
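The abstract does not say how the Gaussian process is used to identify features. One common way to do this, assumed here purely for illustration, is a squared-exponential kernel with a separate length scale per feature (automatic relevance determination): after fitting, features with short length scales are treated as more relevant. The sketch below uses synthetic data, a fixed noise variance, and a coarse grid search in place of gradient-based optimisation. All function names and values are hypothetical and are not taken from the paper.

```python
import itertools
import math
import random


def ard_kernel(a, b, length_scales):
    """Squared-exponential kernel with one length scale per feature (ARD)."""
    sq_dist = sum(((ai - bi) / ls) ** 2 for ai, bi, ls in zip(a, b, length_scales))
    return math.exp(-0.5 * sq_dist)


def cholesky(matrix):
    """Return the lower-triangular Cholesky factor of an SPD matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return lower


def log_marginal_likelihood(xs, ys, length_scales, noise_var):
    """Log evidence of a zero-mean GP regression model."""
    n = len(xs)
    cov = [[ard_kernel(xs[i], xs[j], length_scales) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    # Forward substitution solves lower @ z = ys, so ys^T cov^{-1} ys = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i])
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)


# Synthetic data (an assumption): only feature 0 influences the target.
rng = random.Random(0)
xs = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(30)]
ys = [math.sin(2.0 * x[0]) + rng.gauss(0.0, 0.1) for x in xs]

# Coarse grid search over per-feature length scales, maximising the evidence.
grid = [0.3, 1.0, 3.0, 10.0]
best_scales = max(
    itertools.product(grid, repeat=3),
    key=lambda scales: log_marginal_likelihood(xs, ys, scales, noise_var=0.01),
)

# Rank features by inverse length scale: shorter scale means more relevant.
relevance = [1.0 / ls for ls in best_scales]
ranking = sorted(range(3), key=lambda f: -relevance[f])
print("length scales:", best_scales)
print("features, most relevant first:", ranking)
```

In practice the length scales, signal variance, and noise variance would normally be optimised jointly with a gradient-based method in a GP library. The grid search is used here only to keep the sketch dependency-free.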
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Shah, K.; Cohn, T.; Specia, L. (orcid.org/0000-0002-5495-3128) |
Copyright, Publisher and Additional Information: | © Springer Science+Business Media Dordrecht 2015. This is an author produced version of a paper subsequently published in Machine Translation. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Machine translation; Quality estimation; Gaussian Processes |
Dates: | 2015 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: EUROPEAN COMMISSION - FP6/FP7; Grant number: QTLaunchPad - 296347 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 13 Apr 2016 13:11 |
Last Modified: | 22 Apr 2016 17:32 |
Published Version: | http://dx.doi.org/10.1007/s10590-014-9164-x |
Status: | Published |
Publisher: | Springer Verlag |
Refereed: | Yes |
Identification Number: | 10.1007/s10590-014-9164-x |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:98286 |