Saeed, A., Nawab, R.M.A., Stevenson, R.M. orcid.org/0000-0002-9483-6006 et al. (1 more author) (2019) A word sense disambiguation corpus for Urdu. Language Resources and Evaluation, 53 (3). pp. 397-418. ISSN 1574-020X
Abstract
The aim of word sense disambiguation (WSD) is to correctly identify the meaning of a word in context. All natural languages exhibit word sense ambiguities and these are often hard to resolve automatically. Consequently, WSD is considered an important problem in natural language processing (NLP). Standard evaluation resources are needed to develop, evaluate and compare WSD methods. A range of initiatives have led to the development of benchmark WSD corpora for a wide range of languages from various language families. However, there is a lack of benchmark WSD corpora for South Asian languages including Urdu, despite there being over 300 million Urdu speakers and a large amount of Urdu digital text available online. To address that gap, this study describes a novel benchmark corpus for the Urdu Lexical Sample WSD task. This corpus contains 50 target words (30 nouns, 11 adjectives, and 9 verbs). A standard, manually crafted dictionary called Urdu Lughat is used as a sense inventory. Four baseline WSD approaches were applied to the corpus. The results show that the best performance was obtained using a simple Bag of Words approach. To encourage NLP research on the Urdu language, the corpus is freely available to the research community.
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © Springer Nature B.V. 2018. This is an author produced version of a paper subsequently published in Language Resources and Evaluation. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Word sense disambiguation; Lexical sample task; Sense tagged Urdu corpus |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 29 Nov 2018 13:42 |
Last Modified: | 02 Nov 2021 14:11 |
Status: | Published |
Publisher: | Springer Verlag |
Refereed: | Yes |
Identification Number: | 10.1007/s10579-018-9438-7 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:139329 |