Holliday, J.D., Ranade, S.S. and Willett, P.
(1995)
*A fast algorithm for selecting sets of dissimilar molecules from large chemical databases.*
Quantitative Structure-Activity Relationships, 14 (6).
pp. 501-506.
ISSN 1611-020X

## Abstract

Current algorithms for the selection of a set of n dissimilar molecules from a dataset of N molecules have an expected time complexity of O(n²N). This paper describes an improved algorithm that has an expected time complexity of O(nN) and that will identify exactly the same set of molecules as the normal algorithm if the cosine coefficient is used for the calculation of the inter-molecular (dis)similarities. The algorithm is applicable to any type of representation that characterises a molecule by a set of attribute values and to any procedure that involves calculating a sum of inter-molecular similarities. It is also both more effective and more efficient than our implementation of a genetic algorithm for the selection of maximally-dissimilar sets of molecules.
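The abstract does not spell out the algorithm, so the following is only an illustrative sketch of why a sum of cosine similarities can be computed in O(nN) rather than O(n²N); it is not the authors' implementation. It relies on the identity that the sum of the cosines between a candidate and every selected molecule equals the dot product of the candidate's unit vector with the sum of the selected unit vectors. The selection rule (repeatedly pick the candidate with the smallest summed similarity to the molecules chosen so far), the fixed seed, and the names `unit`, `dot`, `select_pairwise` and `select_centroid` are all assumptions made for illustration.

```python
import math


def unit(vec):
    """Scale a vector of attribute values to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise an all-zero vector")
    return [v / norm for v in vec]


def dot(a, b):
    """Dot product; for unit vectors this is the cosine coefficient."""
    return sum(x * y for x, y in zip(a, b))


def select_pairwise(vectors, n, seed=0):
    """Baseline: re-sum the cosine to every selected molecule at each step."""
    if not 1 <= n <= len(vectors):
        raise ValueError("n must be between 1 and the number of vectors")
    units = [unit(v) for v in vectors]
    selected = [seed]
    while len(selected) < n:
        best, best_score = None, math.inf
        for i, u in enumerate(units):
            if i in selected:
                continue
            # One cosine per already-selected molecule: this inner sum is
            # what makes the baseline quadratic in n.
            score = sum(dot(u, units[j]) for j in selected)
            if score < best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


def select_centroid(vectors, n, seed=0):
    """Same rule, but one dot product per candidate per step.

    Assumption for illustration: `centroid` holds the running sum of the
    selected unit vectors, so dot(u, centroid) equals the summed cosine.
    """
    if not 1 <= n <= len(vectors):
        raise ValueError("n must be between 1 and the number of vectors")
    units = [unit(v) for v in vectors]
    selected, chosen = [seed], {seed}
    centroid = list(units[seed])
    while len(selected) < n:
        best, best_score = None, math.inf
        for i, u in enumerate(units):
            if i in chosen:
                continue
            score = dot(u, centroid)
            if score < best_score:
                best, best_score = i, score
        selected.append(best)
        chosen.add(best)
        # Fold the new molecule into the running sum.
        centroid = [c + x for c, x in zip(centroid, units[best])]
    return selected


if __name__ == "__main__":
    # Hypothetical attribute vectors, not data from the paper.
    data = [[3, 1, 0], [1, 3, 1], [0, 1, 4], [2, 2, 2], [4, 0, 1], [1, 1, 5]]
    print(select_pairwise(data, 4))
    print(select_centroid(data, 4))
```

Both functions apply the same selection rule, so apart from floating-point ties they should return the same indices. The difference is that `select_centroid` does a constant amount of work per candidate at each step, which is where the O(nN) behaviour described in the abstract would come from under these assumptions.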

| Item Type: | Article |
|---|---|
| Keywords: | Algorithmic complexity; Compound selection; Dissimilarity selection; Random screening; Similarity coefficient |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
| Depositing User: | Information Studies |
| Date Deposited: | 26 Aug 2009 11:04 |
| Last Modified: | 26 Aug 2009 11:04 |
| Published Version: | http://www3.interscience.wiley.com/journal/1133236... |
| Status: | Published |
| Publisher: | Wiley |
| URI: | http://eprints.whiterose.ac.uk/id/eprint/9237 |