McInnes, B and Stevenson, R.M (2013) Determining the Difficulty of Word Sense Disambiguation. Journal of Biomedical Informatics, 47. pp. 83-90. ISSN 1532-0464
Abstract
Automatic processing of biomedical documents is made difficult by the fact that many of the terms they contain are ambiguous. Word Sense Disambiguation (WSD) systems attempt to resolve these ambiguities and identify the correct meaning. However, the published literature on WSD systems for biomedical documents report considerable differences in performance for different terms. The development of WSD systems is often expensive with respect to acquiring the necessary training data. It would therefore be useful to be able to predict in advance which terms WSD systems are likely to perform well or badly on. This paper explores various methods for estimating the performance of WSD systems on a wide range of ambiguous biomedical terms (including ambiguous words/phrases and abbreviations). The methods include both supervised and unsupervised approaches. The supervised approaches make use of information from labeled training data while the unsupervised ones rely on the UMLS Metathesaurus. The approaches are evaluated by comparing their predictions about how difficult disambiguation will be for ambiguous terms against the output of two WSD systems. We find the supervised methods are the best predictors of WSD difficulty, but are limited by their dependence on labeled training data. The unsupervised methods all perform well in some situations and can be applied more widely.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2013 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Natural Language Processing, NLP, Word Sense Disambiguation, WSD, Ambiguity, Biomedical documents |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder Grant number EPSRC EP/J008427/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 09 May 2014 13:07 |
Last Modified: | 11 Apr 2017 03:33 |
Published Version: | http://dx.doi.org/10.1016/j.jbi.2013.09.009 |
Status: | Published |
Publisher: | Elsevier |
Refereed: | Yes |
Identification Number: | 10.1016/j.jbi.2013.09.009 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:76883 |