Elliott, D, Atwell, ES and Hartley, AF (2004) Compiling and using a shareable parallel corpus for MT evaluation. In: Proceedings of the 4th International Conference on Language Resources and Evaluation. 4th International Conference on Language Resources and Evaluation, 24-30 May 2004, Centro Cultural de Belem, Lisbon, Portugal. European Language Resources Association, pp. 18-21.
Abstract
TECMATE is a dynamic TEchnical Corpus for MAchine Translation Evaluation currently being compiled and used at the University of Leeds. A purpose-built corpus for machine translation (MT) evaluation differs in terms of size and content from corpora used for other kinds of linguistic analysis. For example, our research in automated MT evaluation requires source texts with human and machine translations as well as the scores for these translations given by human judges. These scores will allow us to test the reliability of experimental automated evaluation methods. Furthermore, a representative sample of machine translations annotated with fluency errors is also required to guide our research into automated error detection. In this paper, we summarise our rationale for corpus design and describe the different stages of corpus development. We provide an example of the content for one language pair and present findings from our recent evaluations of MT output using texts from the French-English sub-corpus. TECMATE will shortly be available online for research.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Elliott, D; Atwell, ES; Hartley, AF |
Keywords: | evaluation; machine translation; parallel corpora |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures & Societies (Leeds) > Translation Studies (Leeds); The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 19 Jan 2015 11:00 |
Last Modified: | 19 Dec 2022 13:30 |
Published Version: | http://www.lrec-conf.org/lrec2004/ |
Status: | Published |
Publisher: | European Language Resources Association |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82304 |