
A fast algorithm for selecting sets of dissimilar molecules from large chemical databases

Holliday, J.D., Ranade, S.S. and Willett, P. (1995) A fast algorithm for selecting sets of dissimilar molecules from large chemical databases. Quantitative Structure-Activity Relationships, 14 (6). pp. 501-506. ISSN 1611-020X

Full text not available from this repository.


Current algorithms for the selection of a set of n dissimilar molecules from a dataset of N molecules have an expected time complexity of O(n²N). This paper describes an improved algorithm that has an expected time complexity of O(nN) and that will identify exactly the same set of molecules as the normal algorithm if the cosine coefficient is used for the calculation of the inter-molecular (dis)similarities. The algorithm is applicable to any type of representation that characterises a molecule by a set of attribute values and to any procedure that involves calculating a sum of inter-molecular similarities. It is also both more effective and more efficient than our implementation of a genetic algorithm for the selection of maximally-dissimilar sets of molecules.
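The complexity reduction described above can be illustrated with a small sketch. It is not the authors' implementation (the full text is not available here). It only assumes a greedy selection that repeatedly adds the molecule with the smallest sum of cosine similarities to the molecules already chosen. Because the cosine of two unit-length vectors is their dot product, that sum equals a single dot product with the running sum of the selected unit vectors, which removes one factor of n. The function names, the seed choice, and the tie-breaking rule are all hypothetical.

```python
import math


def normalize(v):
    """Scale an attribute vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_naive(vectors, n, seed=0):
    """Greedy selection that recomputes every cosine: O(n^2 N) similarities."""
    unit = [normalize(v) for v in vectors]
    selected = [seed]          # assumption: start from a given seed molecule
    chosen = {seed}
    while len(selected) < n:
        best, best_score = None, None
        for i in range(len(unit)):
            if i in chosen:
                continue
            # Sum of cosine similarities to every molecule selected so far.
            score = sum(dot(unit[i], unit[j]) for j in selected)
            if best_score is None or score < best_score:
                best, best_score = i, score
        selected.append(best)
        chosen.add(best)
    return selected


def select_fast(vectors, n, seed=0):
    """Same greedy rule using a running sum vector: O(nN) dot products."""
    unit = [normalize(v) for v in vectors]
    selected = [seed]
    chosen = {seed}
    running = list(unit[seed])  # sum of the selected unit vectors
    while len(selected) < n:
        best, best_score = None, None
        for i in range(len(unit)):
            if i in chosen:
                continue
            # One dot product replaces the sum over all selected molecules.
            score = dot(unit[i], running)
            if best_score is None or score < best_score:
                best, best_score = i, score
        selected.append(best)
        chosen.add(best)
        running = [r + u for r, u in zip(running, unit[best])]
    return selected


# Toy attribute vectors, invented for illustration only.
molecules = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
print(select_naive(molecules, 3))
print(select_fast(molecules, 3))
```

On this toy data both functions return the same indices, which mirrors the abstract's claim that the faster procedure identifies the same set when the cosine coefficient is used. With real-valued data, floating-point rounding could in principle break near-ties differently in the two versions.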

Item Type: Article
Keywords: Algorithmic complexity; Compound selection; Dissimilarity selection; Random screening; Similarity coefficient
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User: Information Studies
Date Deposited: 26 Aug 2009 11:04
Last Modified: 26 Aug 2009 11:04
Published Version: http://www3.interscience.wiley.com/journal/1133236...
Status: Published
Publisher: Wiley
URI: http://eprints.whiterose.ac.uk/id/eprint/9237
