Identification of diverse database subsets using property-based and fragment-based molecular descriptions

Abstract

This paper reports a comparison of calculated molecular properties and of 2D fragment bit-strings when used for the selection of structurally diverse subsets of a file of 44295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. Investigation of the numbers of bioactive molecules in the selected subsets suggests that the MaxMin subsets are noticeably superior to the k-means subsets, that the property-based descriptors are marginally superior to the fragment-based descriptors, and that both approaches are noticeably superior to random selection.
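
To illustrate the MaxMin dissimilarity-based selection named in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes each compound is represented by the set of bit positions set in its fragment bit-string, that dissimilarity is one minus the Tanimoto coefficient, and that the first compound in the file seeds the subset; the abstract specifies none of these choices. The names `tanimoto_distance`, `maxmin_select`, and `seed_index` are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Return 1 - Tanimoto similarity for two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty bit-strings are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_select(
    fps: Sequence[FrozenSet[int]], n_select: int, seed_index: int = 0
) -> List[int]:
    """Greedy MaxMin: repeatedly add the compound whose distance to its
    nearest already-selected compound is largest.

    The seeding rule (seed_index) is an assumption made for this sketch.
    """
    if not fps or n_select <= 0:
        return []
    n_select = min(n_select, len(fps))

    selected = [seed_index]
    # min_dist[i] holds the distance from compound i to its nearest
    # selected compound; selected compounds are marked with -1.0.
    min_dist = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]
    min_dist[seed_index] = -1.0

    while len(selected) < n_select:
        # Pick the unselected compound farthest from the current subset.
        nxt = max(range(len(fps)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances against the new member only.
        for i, fp in enumerate(fps):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
        min_dist[nxt] = -1.0

    return selected


if __name__ == "__main__":
    # Toy bit-strings, invented purely for illustration.
    toy = [
        frozenset({1, 2, 3}),
        frozenset({1, 2, 4}),
        frozenset({7, 8, 9}),
        frozenset({1, 2, 3, 4}),
    ]
    print(maxmin_select(toy, 3))
```

Keeping a running nearest-selected distance for every compound means each new pick costs one pass over the file, so selecting k compounds from N takes on the order of k × N distance calculations. For property-based descriptors, the same loop could be used with a distance over the property vectors in place of `tanimoto_distance`.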

Metadata

Item Type:	Article
Authors/Creators:	Ashton, M.; Barnard, J.; Casset, F.; Charlton, M.; Downs, G.; Gorse, D.; Holliday, J.D. (j.d.holliday@sheffield.ac.uk); Lahana, R.; Willett, P. (p.willett@sheffield.ac.uk)
Copyright, Publisher and Additional Information:	© 2003 Wiley. This is an author produced version of a paper published in Quantitative Structure-Activity Relationships. Uploaded in accordance with the publisher's self-archiving policy.
Keywords:	diversity, molecular diversity analysis, structural diversity, subset selection
Dates:	Published: January 2003
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield); The University of Sheffield > University of Sheffield Research Centres and Institutes > The Krebs Institute for Biomolecular Research (Sheffield)
Date Deposited:	11 Jan 2008 15:58
Last Modified:	08 Feb 2013 16:55
Published Version:	http://dx.doi.org/10.1002/qsar.200290002
Status:	Published
Publisher:	Wiley
Refereed:	Yes
Identification Number:	10.1002/qsar.200290002
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:3570
