Nawab, R.M.A., Stevenson, M. and Clough, P. (2017) An IR-based Approach Utilising Query Expansion for Plagiarism Detection in MEDLINE. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14 (4). pp. 796-804. ISSN 1545-5963
Abstract
The identification of duplicated and plagiarised passages of text has become an increasingly active area of research. In this paper we investigate methods for plagiarism detection that aim to identify potential sources of plagiarism from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A scalable approach based on Information Retrieval is used to perform candidate document selection - the identification of a subset of potential source documents given a suspicious text - from MEDLINE. Query expansion is performed using the UMLS Metathesaurus to deal with situations in which original documents are obfuscated. Various approaches to Word Sense Disambiguation are investigated to deal with cases where there are multiple Concept Unique Identifiers (CUIs) for a given term. Results using the proposed IR-based approach outperform a state-of-the-art baseline based on Kullback-Leibler Distance.
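The general idea of candidate document selection with query expansion can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table `SYNONYMS` is a hypothetical stand-in for UMLS Metathesaurus lookups, the three documents and the suspicious text are invented toy data, the TF-IDF weighting is one common choice among many, and Word Sense Disambiguation is omitted entirely.

```python
import math
from collections import Counter

# Hypothetical toy synonym table standing in for UMLS Metathesaurus lookups.
SYNONYMS = {
    "kidney": ["renal"],
    "cancer": ["carcinoma", "tumour"],
    "therapy": ["treatment"],
}

# Invented toy collection standing in for MEDLINE records.
DOCS = {
    "d1": "renal carcinoma treatment outcomes",
    "d2": "influenza vaccine response in children",
    "d3": "cardiac surgery recovery outcomes",
}

# Invented suspicious text: an obfuscated paraphrase of d1.
SUSPICIOUS = "kidney cancer therapy"


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def expand_query(tokens, synonyms):
    """Append every known synonym of each query token to the query."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(synonyms.get(token, []))
    return expanded


def rank_candidates(query_tokens, docs):
    """Rank documents by the sum of length-normalised TF-IDF weights
    of the distinct query terms they contain."""
    n_docs = len(docs)
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed IDF so that no term receives a zero weight.
        weights = {
            term: count * (math.log((n_docs + 1) / (df[term] + 1)) + 1.0)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        matched = sum(weights.get(term, 0.0) for term in set(query_tokens))
        scores[doc_id] = matched / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


plain_ranking = rank_candidates(tokenize(SUSPICIOUS), DOCS)
expanded_ranking = rank_candidates(
    expand_query(tokenize(SUSPICIOUS), SYNONYMS), DOCS
)
print(plain_ranking)
print(expanded_ranking)
```

In this toy setting the unexpanded query shares no terms with any document, so every candidate scores zero, whereas the expanded query retrieves `d1` because the synonyms bridge the replaced words.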
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Nawab, R.M.A., Stevenson, M. and Clough, P. |
Copyright, Publisher and Additional Information: | © 2016 IEEE. This is an author produced version of a paper subsequently published in IEEE/ACM Transactions on Computational Biology and Bioinformatics. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Natural Language Processing; Information Retrieval; Extrinsic Plagiarism Detection; MEDLINE; UMLS Metathesaurus; Query Expansion |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Feb 2016 15:01 |
Last Modified: | 13 Nov 2017 12:14 |
Published Version: | http://dx.doi.org/10.1109/TCBB.2016.2542803 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/TCBB.2016.2542803 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:94770 |