Duque, A., Stevenson, R.M. orcid.org/0000-0002-9483-6006, Martinez-Romo, J. et al. (1 more author) (2018) Co-occurrence graphs for word sense disambiguation in the biomedical domain. Artificial Intelligence in Medicine, 87. pp. 9-19. ISSN 0933-3657
Abstract
Word Sense Disambiguation is a key step for many Natural Language Processing tasks (e.g. summarization, text classification, relation extraction) and presents a challenge to any system that aims to process documents from the biomedical domain. In this paper, we present a new graph-based unsupervised technique to address this problem. The knowledge base used in this work is a graph built with co-occurrence information from medical concepts found in scientific abstracts, and hence adapted to the specific domain. Unlike other unsupervised approaches based on static graphs such as UMLS, the knowledge base in this work takes the context of the ambiguous terms into account. Abstracts downloaded from PubMed are used for building the graph, and disambiguation is performed using the Personalized PageRank algorithm. Evaluation is carried out over two test datasets widely explored in the literature. Different parameters of the system are also evaluated to test robustness and scalability. Results show that the system is able to outperform state-of-the-art knowledge-based systems, improving accuracy by more than 10% in some cases, while only requiring minimal external resources.
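The general idea described in the abstract, a concept co-occurrence graph queried with Personalized PageRank, can be illustrated with a small sketch. This is not the authors' implementation: the concept identifiers, the toy documents, the use of raw document co-occurrence counts as edge weights, and the function names `build_cooccurrence_graph`, `personalized_pagerank` and `disambiguate` are all assumptions made for illustration. The abstract does not specify how edges are weighted or filtered.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Build an undirected weighted graph from lists of concepts.

    Assumption: the edge weight is the number of documents in which
    two concepts appear together.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration where the restart mass is spread over the seed nodes."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("none of the context concepts occur in the graph")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            total = sum(graph[node].values())
            for neighbour, weight in graph[node].items():
                new_rank[neighbour] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


def disambiguate(graph, candidate_senses, context_concepts):
    """Pick the candidate sense with the highest rank given the context."""
    rank = personalized_pagerank(graph, context_concepts)
    return max(candidate_senses, key=lambda sense: rank.get(sense, 0.0))


# Hypothetical toy corpus: each list holds the concepts found in one abstract.
documents = [
    ["C_common_cold", "C_virus", "C_fever"],
    ["C_common_cold", "C_virus", "C_cough"],
    ["C_cold_temperature", "C_frostbite", "C_hypothermia"],
    ["C_cold_temperature", "C_hypothermia"],
    ["C_fever", "C_cough"],
]
graph = build_cooccurrence_graph(documents)

# The ambiguous term "cold" has two hypothetical candidate senses.
chosen = disambiguate(
    graph,
    candidate_senses=["C_common_cold", "C_cold_temperature"],
    context_concepts=["C_virus", "C_cough"],
)
print(chosen)
```

In this sketch the concepts surrounding the ambiguous term act as the restart (personalization) nodes, so rank accumulates on the candidate sense that is most strongly connected to the context in the co-occurrence graph.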
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2018 Elsevier B.V. This is an author produced version of a paper subsequently published in Artificial Intelligence in Medicine. Uploaded in accordance with the publisher's self-archiving policy. Article available under the terms of the CC-BY-NC-ND licence (https://creativecommons.org/licenses/by-nc-nd/4.0/) |
Keywords: | Word Sense Disambiguation; Graph-Based Systems; Unsupervised Machine Learning; Unified Medical Language System; Natural Language Processing; Information Extraction |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 15 Mar 2018 14:42 |
Last Modified: | 11 Nov 2020 09:10 |
Status: | Published |
Publisher: | Elsevier |
Refereed: | Yes |
Identification Number: | 10.1016/j.artmed.2018.03.002 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:128572 |