Gaizauskas, R., Paramita, M.L., Barker, E. et al. (3 more authors) (2015) Extracting bilingual terms from the Web. Terminology, 21 (2). pp. 205-236. ISSN 0929-9971
Abstract
In this paper we make two contributions. First, we describe a multi-component system called BiTES (Bilingual Term Extraction System) designed to automatically gather domain-specific bilingual term pairs from Web data. BiTES components consist of data gathering tools, domain classifiers, monolingual text extraction systems and bilingual term aligners. BiTES is readily extendable to new language pairs and has been successfully used to gather bilingual terminology for 24 language pairs, including English and all official EU languages, save Irish. Second, we describe a novel set of methods for evaluating the main components of BiTES and present the results of our evaluation for six language pairs. Results show that the BiTES approach can be used to successfully harvest quality bilingual term pairs from the Web. Our evaluation method delivers significant insights about the strengths and weaknesses of our techniques. It can be straightforwardly reused to evaluate other bilingual term extraction systems and makes a novel contribution to the study of how to evaluate bilingual terminology extraction systems.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Gaizauskas, R., Paramita, M.L., Barker, E. et al. (3 more authors) |
Copyright, Publisher and Additional Information: | © 2015 John Benjamins Publishing. This is an author produced version of a paper subsequently published in Terminology. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | comparable corpora; domain classification; term extraction; cross-language term alignment; machine translation; evaluation of term extraction |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Nov 2015 16:33 |
Last Modified: | 08 Mar 2016 16:37 |
Published Version: | http://dx.doi.org/10.1075/term.21.2.04gai |
Status: | Published |
Publisher: | John Benjamins Publishing |
Refereed: | Yes |
Identification Number: | 10.1075/term.21.2.04gai |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:91863 |