Alvi, F., Stevenson, R.M. orcid.org/0000-0002-9483-6006 and Clough, P. (2017) Plagiarism Detection in Texts Obfuscated with Homoglyphs. In: Jose, J.M., Hauff, C., Altıngovde, I.S., Song, D., Albakour, D., Watt, S. and Tait, J., (eds.) ECIR 2017: Advances in Information Retrieval. 39th European Conference on Information Retrieval, 08-13 Apr 2017, Aberdeen, Scotland. Lecture Notes in Computer Science. Springer, Cham, pp. 669-675. ISBN 978-3-319-56608-5
Abstract
Homoglyphs can be used to disguise plagiarized text by replacing letters in source texts with visually identical letters from other scripts. Most current plagiarism detection systems cannot detect plagiarism in text that has been obfuscated using homoglyphs. In this work, we present two alternative approaches for detecting plagiarism in homoglyph-obfuscated texts. The first approach uses the Unicode list of confusables to replace homoglyphs with visually identical letters, while the second approach uses a similarity score computed using normalized Hamming distance to match homoglyph-obfuscated words with source words. Empirical testing on datasets from PAN-2015 shows that both approaches perform equally well for plagiarism detection in homoglyph-obfuscated texts.
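The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CONFUSABLES` table is a tiny hand-picked stand-in for the full Unicode confusables list, and the function names, the equal-length restriction, and the `threshold` value are all assumptions made for illustration.

```python
# Hypothetical, hand-picked subset of homoglyph mappings. The real Unicode
# confusables list is far larger; this table is only an illustration.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
    "\u0456": "i",  # Cyrillic small letter Byelorussian-Ukrainian i
    "\u03bf": "o",  # Greek small letter omicron
}


def replace_homoglyphs(text, table=CONFUSABLES):
    """Approach 1 (sketch): map each known homoglyph to its Latin look-alike."""
    return "".join(table.get(ch, ch) for ch in text)


def normalized_hamming_similarity(a, b):
    """Approach 2 (sketch): 1 - mismatches / length for equal-length strings.

    Assumption: words of different lengths are treated as non-matching (0.0).
    """
    if len(a) != len(b) or not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return 1.0 - mismatches / len(a)


def best_match(word, vocabulary, threshold=0.5):
    """Return the vocabulary word most similar to `word`, or None.

    The threshold of 0.5 is an arbitrary illustrative choice.
    """
    best, best_score = None, threshold
    for candidate in vocabulary:
        score = normalized_hamming_similarity(word, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


# "plagiarism" with both Latin "a" characters swapped for Cyrillic "а".
obfuscated = "pl\u0430gi\u0430rism"
restored = replace_homoglyphs(obfuscated)
matched = best_match(obfuscated, ["plagiarism", "journalism", "detection"])
```

In this sketch the first approach depends on the homoglyph being present in the lookup table, whereas the second can still match a word when an unlisted character has been substituted, provided enough of the remaining characters agree.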
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Alvi, F.; Stevenson, R.M. (orcid.org/0000-0002-9483-6006); Clough, P. |
Editors: | Jose, J.M.; Hauff, C.; Altıngovde, I.S.; Song, D.; Albakour, D.; Watt, S.; Tait, J. |
Copyright, Publisher and Additional Information: | © Springer International Publishing AG 2017. This is an author produced version of a paper subsequently published in Lecture Notes in Computer Science. Uploaded in accordance with the publisher's self-archiving policy. |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 23 Feb 2017 16:14 |
Last Modified: | 30 Jun 2017 15:23 |
Published Version: | https://doi.org/10.1007/978-3-319-56608-5_64 |
Status: | Published |
Publisher: | Springer, Cham |
Series Name: | Lecture Notes in Computer Science |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:112665 |