Clark, R.D., Shepphird, J.K. and Holliday, J.D. (2009) The effect of structural redundancy in validation sets on virtual screening performance. Journal of Chemometrics. ISSN 0886-9383
The performance of a classification model is often assessed in terms of how well it separates a set of known observations into appropriate classes. If the validation sets used for such analyses are redundant due to bias in sampling, the relevance of the conclusions drawn to prospective work in which new kinds of positives are sought may be compromised. In the case of the various virtual screening techniques used in modern drug discovery, such bias generally appears as over-representation of particular structural subclasses in the test set. We show how clustering by substructural similarity, followed by applying arithmetic and harmonic weighting schemes to receiver operating characteristic (ROC) curves, can be used to identify validation sets that are biased due to such redundancies. This can be accomplished qualitatively by direct examination or quantitatively by comparing the areas under the respective linear or semilog curves (AUCs or pAUCs). Copyright © 2009 John Wiley & Sons, Ltd.
|Keywords:||docking; circular fingerprints; clustering; ROC; validation|
|Institution:||The University of Sheffield|
|Academic Units:||The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)|
|Depositing User:||Information Studies|
|Date Deposited:||27 Aug 2009 11:03|
|Last Modified:||27 Aug 2009 13:11|