Identification of diverse database subsets using property-based and fragment-based molecular descriptions

Ashton, M., Barnard, J., Casset, F., Charlton, M., Downs, G., Gorse, D., Holliday, J.D., Lahana, R. and Willett, P. (2003) Identification of diverse database subsets using property-based and fragment-based molecular descriptions. Quantitative Structure-Activity Relationships, 21 (6). pp. 598-604.

This paper reports a comparison of calculated molecular properties and of 2D fragment bit-strings when used for the selection of structurally diverse subsets of a file of 44295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. Investigation of the numbers of bioactive molecules in the selected subsets suggests that the MaxMin subsets are noticeably superior to the k-means subsets; that the property-based descriptors are marginally superior to the fragment-based descriptors; and that both approaches are noticeably superior to random selection.
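
For readers unfamiliar with MaxMin dissimilarity-based selection, the following is a minimal sketch of the general greedy technique, not the authors' implementation. It assumes fingerprints are held as Python sets of on-bit positions, that dissimilarity is one minus the Tanimoto coefficient, and that the first compound is used as the seed; none of these choices is stated in the abstract, and the names `tanimoto_dissimilarity`, `maxmin_select`, and `seed_index` are illustrative only.

```python
def tanimoto_dissimilarity(a, b):
    """Return 1 - Tanimoto similarity for two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical (assumption)
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, n_select, seed_index=0):
    """Greedy MaxMin: repeatedly add the compound farthest from its nearest selected compound.

    The seed choice (seed_index) is an assumption; other seeding rules are possible.
    """
    if not 0 < n_select <= len(fingerprints):
        raise ValueError("n_select must be between 1 and the number of fingerprints")

    selected = [seed_index]
    # min_dist[i] holds the dissimilarity from compound i to its nearest selected compound.
    min_dist = [tanimoto_dissimilarity(fp, fingerprints[seed_index]) for fp in fingerprints]
    min_dist[seed_index] = -1.0  # sentinel: already selected

    while len(selected) < n_select:
        # Pick the unselected compound whose nearest selected neighbour is farthest away.
        nxt = max(range(len(fingerprints)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist[nxt] = -1.0
        # Update nearest-selected distances for the remaining compounds.
        for i, fp in enumerate(fingerprints):
            if min_dist[i] >= 0.0:
                d = tanimoto_dissimilarity(fp, fingerprints[nxt])
                if d < min_dist[i]:
                    min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Toy fingerprints, invented purely for illustration.
    toy = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
    print(maxmin_select(toy, 3))
```

Keeping a running nearest-selected distance for each compound means each new pick costs one pass over the file rather than a comparison against every previously selected compound.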

Item Type: Article
Copyright, Publisher and Additional Information: © 2003 Wiley. This is an author produced version of a paper published in Quantitative Structure-Activity Relationships. Uploaded in accordance with the publisher's self-archiving policy.
Keywords: diversity, molecular diversity analysis, structural diversity, subset selection
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
The University of Sheffield > University of Sheffield Research Centres and Institutes > The Krebs Institute for Biomolecular Research (Sheffield)
Depositing User: Sherpa Assistant
Date Deposited: 11 Jan 2008 15:58
Last Modified: 08 Feb 2013 16:55
Published Version: http://dx.doi.org/10.1002/qsar.200290002
Status: Published
Publisher: Wiley
Refereed: Yes
Identification Number: 10.1002/qsar.200290002
URI: http://eprints.whiterose.ac.uk/id/eprint/3570