This is the latest version of this eprint.
Clark, R.D., Shepphird, J.K. and Holliday, J.D. (2009) The effect of structural redundancy in validation sets on virtual screening performance. Journal of Chemometrics, 23 (9-10). 471 - 478. ISSN 0886-9383
Abstract
The performance of a classification model is often assessed by how well it separates a set of known observations into the appropriate classes. If the validation sets used for such analyses are redundant because of sampling bias, the conclusions drawn from them may be of limited relevance to prospective work in which new kinds of positives are sought. For the various virtual screening techniques used in modern drug discovery, such bias generally appears as over-representation of particular structural subclasses in the test set. We show how clustering by substructural similarity, followed by applying arithmetic and harmonic weighting schemes to receiver operating characteristic (ROC) curves, can be used to identify validation sets that are biased by such redundancies. This can be done qualitatively, by direct examination of the curves, or quantitatively, by comparing the areas under the respective linear or semilog curves (AUCs or pAUCs).
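The idea of comparing weighted and unweighted ROC curves can be illustrated with a minimal sketch. It assumes that every active has already been assigned to a structural cluster (for example, by clustering on circular-fingerprint similarity) and that the "arithmetic" scheme weights each active by the reciprocal of its cluster size, so that every cluster contributes the same total weight. That reading is an assumption, as are the function names `roc_points` and `auc` and the toy scores; the harmonic scheme and the semilog pAUC from the abstract are not reproduced here.

```python
from collections import Counter
from itertools import groupby


def roc_points(active_scores, active_clusters, decoy_scores, scheme="arithmetic"):
    """Build an ROC curve as a list of (FPR, TPR) points where each active
    carries a weight (hypothetical helper, not the authors' code).

    scheme="none":       every active has weight 1 (ordinary ROC curve).
    scheme="arithmetic": each active has weight 1 / (size of its cluster),
                         so every cluster contributes a total weight of 1.
    Higher scores are assumed to mean "more likely active".
    """
    sizes = Counter(active_clusters)
    if scheme == "none":
        weights = [1.0] * len(active_scores)
    elif scheme == "arithmetic":
        weights = [1.0 / sizes[c] for c in active_clusters]
    else:
        raise ValueError(f"unknown scheme: {scheme}")

    # Each item is (score, weight, is_active); decoys carry no TPR weight.
    items = [(s, w, True) for s, w in zip(active_scores, weights)]
    items += [(s, 0.0, False) for s in decoy_scores]
    items.sort(key=lambda t: t[0], reverse=True)

    total_w = sum(weights)
    n_decoys = len(decoy_scores)
    tp_w, fp = 0.0, 0
    points = [(0.0, 0.0)]
    # Items with identical scores are consumed together, so ties produce
    # a single diagonal step rather than an order-dependent staircase.
    for _, group in groupby(items, key=lambda t: t[0]):
        for _, w, is_active in group:
            if is_active:
                tp_w += w
            else:
                fp += 1
        points.append((fp / n_decoys, tp_w / total_w))
    return points


def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy validation set: four near-identical actives (cluster "A") all score
# well, while the two structurally distinct actives ("B", "C") score poorly.
actives = [0.90, 0.88, 0.86, 0.84, 0.30, 0.20]
clusters = ["A", "A", "A", "A", "B", "C"]
decoys = [0.80, 0.70, 0.50, 0.40, 0.10]

print(round(auc(roc_points(actives, clusters, decoys, "none")), 3))        # 0.733
print(round(auc(roc_points(actives, clusters, decoys, "arithmetic")), 3))  # 0.467
```

In this toy case the unweighted AUC (about 0.73) looks respectable, but once each cluster is counted only once the AUC falls to about 0.47. A drop of this kind is the signature of a validation set whose apparent performance rests on one over-represented structural subclass.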
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Clark, R.D.; Shepphird, J.K.; Holliday, J.D. |
Copyright, Publisher and Additional Information: | © 2009 John Wiley & Sons, Ltd. This is an author produced version of a paper subsequently published in Journal of Chemometrics. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Docking; Circular fingerprints; Clustering; ROC; Validation |
Dates: | Published: 2009 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Information Studies |
Date Deposited: | 03 Mar 2014 16:23 |
Last Modified: | 03 Mar 2014 16:23 |
Published Version: | http://dx.doi.org/10.1002/cem.1240 |
Status: | Published |
Publisher: | John Wiley & Sons |
Refereed: | Yes |
Identification Number: | 10.1002/cem.1240 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:77943 |
Available Versions of this Item
- The effect of structural redundancy in validation sets on virtual screening performance. (deposited 27 Aug 2009 11:03)
- The effect of structural redundancy in validation sets on virtual screening performance. (deposited 03 Mar 2014 16:23) [Currently Displayed]