Aker, A., Paramita, M.L., Pinnis, M. et al. (1 more author) (2014) Bilingual dictionaries for all EU languages. In: LREC 2014 Proceedings. LREC 2014, 26-31 May 2014, Reykjavik, Iceland. European Language Resources Association, pp. 2839-2845. ISBN 978-2-9517408-8-4
Abstract
Bilingual dictionaries can be generated automatically using the GIZA++ tool. However, these dictionaries contain a lot of noise, which degrades the quality of the output of tools that rely on them. In this work, we present three methods for cleaning noise from automatically generated bilingual dictionaries: a log-likelihood ratio (LLR) based approach, a pivot-based approach and a transliteration-based approach. We applied these approaches to GIZA++ dictionaries covering the official EU languages in order to remove noise. Our evaluation showed that all three methods help to reduce noise, with the best performance achieved by the transliteration-based approach. We provide all bilingual dictionaries (both the original GIZA++ dictionaries and the cleaned ones), as well as the cleaning tools and scripts, free to download.
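The abstract names the LLR-based method but does not describe it. The minimal Python sketch below shows one common way to use a log-likelihood ratio for dictionary cleaning. It is not necessarily the paper's procedure. It assumes that each dictionary entry (source word, target word) is scored with Dunning's G² statistic over sentence-level co-occurrence counts in a sentence-aligned parallel corpus, and that entries scoring below a threshold are discarded as noise. The function names `llr`, `cooccurrence_counts` and `clean_dictionary`, the whitespace tokenisation and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def _sum_xlogx(*counts):
    # Sum of k * ln(k), treating 0 * ln(0) as 0.
    return sum(k * math.log(k) for k in counts if k > 0)


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: sentence pairs containing both the source and the target word
    k12: pairs containing the source word only
    k21: pairs containing the target word only
    k22: pairs containing neither
    """
    n = k11 + k12 + k21 + k22
    cells = _sum_xlogx(k11, k12, k21, k22)
    rows = _sum_xlogx(k11 + k12, k21 + k22)
    cols = _sum_xlogx(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + _sum_xlogx(n))


def cooccurrence_counts(sentence_pairs):
    # Hypothetical helper: count how many sentence pairs contain each source
    # word, each target word and each (source, target) word combination.
    src_df, tgt_df, joint = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_df.update(s_words)
        tgt_df.update(t_words)
        joint.update((s, t) for s in s_words for t in t_words)
    return src_df, tgt_df, joint, len(sentence_pairs)


def clean_dictionary(entries, sentence_pairs, threshold):
    # Hypothetical filter: keep an entry only if its words are positively
    # associated and their LLR score reaches the (assumed) threshold.
    src_df, tgt_df, joint, n = cooccurrence_counts(sentence_pairs)
    kept = []
    for s, t in entries:
        k11 = joint[(s, t)]
        k12 = src_df[s] - k11
        k21 = tgt_df[t] - k11
        k22 = n - k11 - k12 - k21
        # G^2 is also large for words that avoid each other, so require more
        # co-occurrences than expected under independence.
        positive = k11 * n > src_df[s] * tgt_df[t]
        if positive and llr(k11, k12, k21, k22) >= threshold:
            kept.append((s, t))
    return kept


if __name__ == "__main__":
    corpus = [
        ("the house", "das haus"),
        ("the house is big", "das haus ist gross"),
        ("a small house", "ein kleines haus"),
        ("the car", "das auto"),
    ]
    noisy = [("house", "haus"), ("house", "auto"), ("car", "auto")]
    # 3.84 is the chi-square critical value for p = 0.05 with 1 degree of freedom.
    print(clean_dictionary(noisy, corpus, threshold=3.84))
    # [('house', 'haus'), ('car', 'auto')]
```

On this four-sentence toy corpus, both ("house", "haus") and ("car", "auto") score about 4.50 and are kept. ("house", "auto") never co-occurs and is dropped. The threshold is an illustrative assumption that would need tuning, and the paper's actual scoring and cut-off may differ.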
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2014 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial Licence (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. You may not use the material for commercial purposes. |
Keywords: | GIZA++ dictionaries; EU languages; dictionary cleaning |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: EUROPEAN COMMISSION - FP6/FP7; Grant number: TAAS - 296312 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 26 Feb 2016 09:07 |
Last Modified: | 19 Mar 2016 08:24 |
Published Version: | http://www.lrec-conf.org/proceedings/lrec2014/pdf/... |
Status: | Published |
Publisher: | European Language Resources Association |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:94340 |