Nawab, R., Stevenson, M. and Clough, P. (2010) University of Sheffield: Lab Report for PAN at CLEF 2010. In: CLEF 2010 LABs and Workshops, Notebook Papers. 4th International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse, 22-23 Sep 2010, Padua, Italy. CLEF
Abstract
This paper describes the University of Sheffield entry for the 2nd international plagiarism detection competition (PAN 2010). Our system attempts to identify extrinsic plagiarism. A three-stage approach is used: pre-processing, candidate document selection (using word n-grams) and detailed analysis (using the Running Karp-Rabin Greedy String Tiling string matching algorithm). This approach achieved an overall performance of 0.20 in the official evaluation with a precision of 0.40, recall of 0.16 and granularity of 1.21.
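The detailed-analysis stage rests on Greedy String Tiling (GST). Below is a minimal sketch of the basic greedy tiling loop over word tokens. It is not the authors' implementation. It omits the Running Karp-Rabin hashing that speeds up the variant named in the abstract, and the parameter name `min_match_length` and its default value are hypothetical choices. Each tile is reported as (position in the first sequence, position in the second sequence, length in tokens).

```python
from typing import List, Sequence, Tuple

Tile = Tuple[int, int, int]  # (pattern_start, text_start, length)


def greedy_string_tiling(pattern: Sequence[str], text: Sequence[str],
                         min_match_length: int = 3) -> List[Tile]:
    """Basic greedy string tiling over token sequences (no Karp-Rabin hashing)."""
    if min_match_length < 1:
        raise ValueError("min_match_length must be at least 1")
    marked_p = [False] * len(pattern)
    marked_t = [False] * len(text)
    tiles: List[Tile] = []

    while True:
        max_match = min_match_length
        matches: List[Tile] = []
        # Collect every longest common run of unmarked tokens in this pass.
        for p in range(len(pattern)):
            if marked_p[p]:
                continue
            for t in range(len(text)):
                if marked_t[t]:
                    continue
                j = 0
                while (p + j < len(pattern) and t + j < len(text)
                       and pattern[p + j] == text[t + j]
                       and not marked_p[p + j] and not marked_t[t + j]):
                    j += 1
                if j == max_match:
                    matches.append((p, t, j))
                elif j > max_match:
                    matches = [(p, t, j)]
                    max_match = j
        # Turn each non-overlapping match into a tile and mark its tokens.
        for p, t, length in matches:
            if any(marked_p[p + k] or marked_t[t + k] for k in range(length)):
                continue  # occluded by a tile placed earlier in this pass
            for k in range(length):
                marked_p[p + k] = True
                marked_t[t + k] = True
            tiles.append((p, t, length))
        # Stop once the longest remaining match is no longer than the minimum.
        if max_match == min_match_length:
            return tiles


suspicious = "the cat sat on the mat".split()
source = "a cat sat on a mat today".split()
print(greedy_string_tiling(suspicious, source, min_match_length=2))  # [(1, 1, 3)]
```

In the usage example the only tile is "cat sat on". The shared word "mat" is shorter than the minimum match length, so it is ignored. Greedy tiling always takes the longest unmarked matches first, and each token can belong to at most one tile. This makes the method robust to reordered passages. Roughly speaking, the Karp-Rabin variant replaces the token-by-token inner comparison with hashes of fixed-length windows. That is the main source of its speed-up on large collections.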
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Nawab, R.; Stevenson, M.; Clough, P. |
Copyright, Publisher and Additional Information: | © 2010 CLEF. This is an author produced version of a paper subsequently published in CLEF 2010 LABs and Workshops, Notebook Papers. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Extrinsic plagiarism detection; Greedy string tiling; GST; N-grams |
Dates: | Published: 2010 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield); The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 22 Apr 2014 10:51 |
Last Modified: | 23 Jun 2023 21:39 |
Published Version: | http://www.clef-initiative.eu/documents/71612/8637... |
Status: | Published |
Publisher: | CLEF |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:78590 |