Ashton, M., Barnard, J., Casset, F., Charlton, M., Downs, G., Gorse, D., Holliday, J.D., Lahana, R. and Willett, P. (2003) Identification of diverse database subsets using property-based and fragment-based molecular descriptions. Quantitative Structure-Activity Relationships, 21 (6). pp. 598-604.
This paper reports a comparison of calculated molecular properties and of 2D fragment bit-strings when used for the selection of structurally diverse subsets of a file of 44295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. Investigation of the numbers of bioactive molecules in the selected subsets suggests that the MaxMin subsets are noticeably superior to the k-means subsets; that the property-based descriptors are marginally superior to the fragment-based descriptors; and that both approaches are noticeably superior to random selection.
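As a rough illustration of the MaxMin dissimilarity-based selection named in the abstract, the following minimal sketch picks, at each step, the compound whose nearest already-selected neighbour is furthest away. It is not the authors' implementation: the abstract does not state the dissimilarity measure or how the first compound is chosen, so the Tanimoto-based dissimilarity on sets of on-bit positions, the `seed_index` parameter, and the function names `tanimoto_dissimilarity` and `maxmin_select` are all assumptions made for this example.

```python
def tanimoto_dissimilarity(a, b):
    """Return 1 - Tanimoto coefficient for two fingerprints.

    Each fingerprint is a set of on-bit positions (an assumed
    representation of a 2D fragment bit-string).
    """
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, k, seed_index=0):
    """Select k item indices by MaxMin, starting from seed_index (assumed)."""
    n = len(fingerprints)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of fingerprints")

    selected = [seed_index]
    chosen = {seed_index}
    # nearest[i] holds the dissimilarity from item i to its closest selected item.
    nearest = [
        tanimoto_dissimilarity(fp, fingerprints[seed_index]) for fp in fingerprints
    ]

    while len(selected) < k:
        # Pick the unselected item with the largest minimum dissimilarity.
        candidates = [i for i in range(n) if i not in chosen]
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        chosen.add(best)
        # Update each item's distance to its closest selected item.
        for i in range(n):
            d = tanimoto_dissimilarity(fingerprints[i], fingerprints[best])
            if d < nearest[i]:
                nearest[i] = d

    return selected
```

Keeping the running `nearest` list means each new pick costs one pass over the file instead of a comparison against every previously selected compound, which matters for files of tens of thousands of compounds.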
|Copyright, Publisher and Additional Information:||© 2003 Wiley. This is an author produced version of a paper published in Quantitative Structure-Activity Relationships. Uploaded in accordance with the publisher's self-archiving policy.|
|Keywords:||diversity, molecular diversity analysis, structural diversity, subset selection|
|Institution:||The University of Sheffield|
|Academic Units:||The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield); The University of Sheffield > University of Sheffield Research Centres and Institutes > The Krebs Institute for Biomolecular Research (Sheffield)|
|Depositing User:||Sherpa Assistant|
|Date Deposited:||11 Jan 2008 15:58|
|Last Modified:||08 Feb 2013 16:55|