
Analysis and use of fragment-occurrence data in similarity-based virtual screening

Arif, S., Holliday, J.D. and Willett, P. (2009) Analysis and use of fragment-occurrence data in similarity-based virtual screening. Journal of Computer Aided Molecular Design, 23 (9). pp. 655-668. ISSN 0920-654X




Current systems for similarity-based virtual screening use similarity measures in which all the fragments in a fingerprint contribute equally to the calculation of structural similarity. This paper discusses the weighting of fragments on the basis of their frequencies of occurrence in molecules. Extensive experiments with sets of active molecules from the MDL Drug Data Report and the World of Molecular Bioactivity databases, using fingerprints encoding Tripos holograms, Pipeline Pilot ECFC_4 circular substructures and Sunset Molecular keys, demonstrate clearly that frequency-based screening is generally more effective than conventional, unweighted screening. The results suggest that standardising the raw occurrence frequencies by taking the square root of the frequencies will maximise the effectiveness of virtual screening. An upper-bound analysis shows the complex interactions that can take place between representations, weighting schemes and similarity coefficients when similarity measures are computed, and provides a rationalisation of the relative performance of the various weighting schemes.
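The weighting idea in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes the continuous (non-binary) form of the Tanimoto coefficient, and the fragment labels, the function names `weight_fragments`, `tanimoto` and `similarity`, and the occurrence counts are all hypothetical values chosen for illustration. It compares unweighted (binary), raw-frequency and square-root-standardised fragment weights for one pair of molecules.

```python
import math


def weight_fragments(counts, scheme="sqrt"):
    """Turn raw fragment occurrence counts into weights.

    scheme: "binary" (presence/absence), "raw" (occurrence count),
            or "sqrt" (square root of the occurrence count).
    """
    transforms = {
        "binary": lambda c: 1.0,
        "raw": float,
        "sqrt": math.sqrt,
    }
    if scheme not in transforms:
        raise ValueError(f"unknown weighting scheme: {scheme!r}")
    transform = transforms[scheme]
    # Fragments that never occur carry no weight in any scheme.
    return {frag: transform(c) for frag, c in counts.items() if c > 0}


def tanimoto(a, b):
    """Continuous Tanimoto coefficient between two weighted fingerprints."""
    dot = sum(w * b[frag] for frag, w in a.items() if frag in b)
    norm_a = sum(w * w for w in a.values())
    norm_b = sum(w * w for w in b.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 0.0


def similarity(ref_counts, cand_counts, scheme="sqrt"):
    """Weight both molecules with the same scheme, then compare them."""
    return tanimoto(
        weight_fragments(ref_counts, scheme),
        weight_fragments(cand_counts, scheme),
    )


if __name__ == "__main__":
    # Hypothetical fragment occurrence counts for two molecules.
    reference = {"C-C": 4, "C=O": 1, "N-H": 1}
    candidate = {"C-C": 1, "C=O": 1, "c:c": 4}
    for scheme in ("binary", "raw", "sqrt"):
        score = similarity(reference, candidate, scheme)
        print(f"{scheme:>6}: {score:.4f}")
```

For this pair the three schemes give different scores, because raw counts let a frequently occurring fragment dominate the calculation while the square root damps that effect. The example only shows how the schemes differ numerically; it says nothing about which one retrieves more actives, which is the question the paper addresses experimentally.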

Item Type: Article
Keywords: Fingerprint; Fragment occurrences; Ligand-based virtual screening; Similarity searching; Substructural fragment; Tanimoto coefficient; Virtual screening; Weighting scheme
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User: Information Studies
Date Deposited: 27 Aug 2009 11:17
Last Modified: 08 Feb 2013 17:39
Published Version: http://dx.doi.org/10.1007/s10822-009-9285-0
Status: Published
Publisher: Springer Verlag
Identification Number: 10.1007/s10822-009-9285-0
URI: http://eprints.whiterose.ac.uk/id/eprint/9258
