Paramita, M.L. orcid.org/0000-0002-9414-1853, Clough, P. and Gaizauskas, R. (2017) Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles. In: Jose, J.M., Hauff, C., Altıngovde, I.S., Song, D., Albakour, D., Watt, S. and Tait, J., (eds.) ECIR 2017: Advances in Information Retrieval. 39th European Conference on Information Retrieval (ECIR 2017), 08/04/2017 - 13/04/2017, Aberdeen, UK. Lecture Notes in Computer Science (10193). Springer, Cham, pp. 663-669. ISBN 978-3-319-56608-5
Abstract
Measuring the similarity of interlanguage-linked Wikipedia articles often requires suitable language resources (e.g., dictionaries and MT systems), which can be problematic for languages with limited or poor translation resources. The size of Wikipedia can also make computing similarity computationally demanding. This paper presents a ‘lightweight’ approach to measuring cross-lingual similarity in Wikipedia that uses section headings rather than the entire article, and that performs translation with language resources derived from Wikipedia and Wiktionary. Using an existing dataset, we evaluate the approach on 7 language pairs. Results show that performance using section headings is comparable to using the full article content, that dictionaries derived from Wikipedia and Wiktionary are sufficient to compute cross-lingual similarity, and that combinations of features can further improve results.
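The abstract does not specify the exact similarity measure, so the following is only a minimal sketch of the general idea under stated assumptions: the section headings of a source-language article are translated word by word with a bilingual dictionary (standing in for one derived from Wikipedia or Wiktionary), and the result is compared with the target article's headings using cosine similarity over term counts. The function names (`translate_headings`, `heading_similarity`) and the toy German-English dictionary are hypothetical illustrations, not the authors' implementation or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into word tokens (\w is Unicode-aware in Python 3).
    return re.findall(r"\w+", text.lower())


def translate_headings(headings, bilingual_dict):
    """Translate heading tokens word by word (hypothetical helper).

    Tokens without a dictionary entry are kept unchanged, so shared
    names and numbers can still match across languages.
    """
    tokens = []
    for heading in headings:
        for tok in tokenize(heading):
            tokens.append(bilingual_dict.get(tok, tok))
    return tokens


def cosine(tokens_a, tokens_b):
    # Cosine similarity between two bag-of-words term-count vectors.
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def heading_similarity(src_headings, tgt_headings, bilingual_dict):
    # Assumed pipeline: translate source headings, then compare term counts.
    translated = translate_headings(src_headings, bilingual_dict)
    target = [tok for h in tgt_headings for tok in tokenize(h)]
    return cosine(translated, target)


if __name__ == "__main__":
    # Toy dictionary and headings, for illustration only.
    de_en = {"geschichte": "history", "geographie": "geography", "wirtschaft": "economy"}
    de = ["Geschichte", "Geographie", "Wirtschaft", "Weblinks"]
    en = ["History", "Geography", "Economy", "External links"]
    print(round(heading_similarity(de, en, de_en), 3))  # prints 0.671
```

Working only with headings keeps each vector very small compared with full article text, which illustrates why a heading-based comparison can be much cheaper to compute. The untranslated "Weblinks" token shows how gaps in the dictionary lower the score.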
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Paramita, M.L., Clough, P. and Gaizauskas, R. |
Editors: | Jose, J.M., Hauff, C., Altıngovde, I.S., Song, D., Albakour, D., Watt, S. and Tait, J. |
Copyright, Publisher and Additional Information: | © Springer International Publishing AG 2017. This is an author produced version of a paper subsequently published in Lecture Notes in Computer Science. Uploaded in accordance with the publisher's self-archiving policy. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 17 Feb 2017 14:00 |
Last Modified: | 18 Jul 2017 16:34 |
Published Version: | https://doi.org/10.1007/978-3-319-56608-5_59 |
Status: | Published |
Publisher: | Springer, Cham |
Series Name: | Lecture Notes in Computer Science |
Refereed: | Yes |
Identification Number: | 10.1007/978-3-319-56608-5_59 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:111923 |