Haranczyk, M. and Holliday, J.D. (2008) Comparison of similarity coefficients for clustering and compound selection. Journal of Chemical Information and Modeling, 48 (3). pp. 498-508. ISSN 1549-9596
Full text not available from this repository.
Published Version: http://dx.doi.org/10.1021/ci700413a
Abstract
Recent studies into the use of a selection of similarity coefficients, when applied to searches of chemical databases represented by binary fingerprints, have shown considerable variation in their retrieval performance and in the sets of compounds being retrieved. The main factor influencing performance is the density distribution of the bitstrings for the active class, a feature which is closely related to molecular size. If this is the case when these coefficients are applied to similarity searches, then we would expect considerable variation in performance when applied to dissimilarity methods, namely clustering and compound selection. Here we report on several studies which have been undertaken to investigate the relative performance of 13 association and correlation coefficients, which have been shown to exhibit complementary performance in similarity searches, when applied to hierarchical and nonhierarchical clustering methods and to a compound selection methodology. Results suggest that the correlation coefficients perform consistently well for clustering and compound selection, as does the Baroni-Urbani/Buser association coefficient. Surprisingly, these often outperform the Tanimoto coefficient, while the Simple Match (effectively the complement of the Squared Euclidean Distance) performs very poorly.
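The abstract names several coefficients without giving their formulas. As a rough illustration only, the sketch below computes three of them for a pair of binary fingerprints using the standard textbook definitions in 2x2 contingency notation (a = bits set in both, b and c = bits set in only one, d = bits set in neither). This notation, the function names, and the convention of returning 0.0 for an empty denominator are assumptions made here and are not taken from the paper. The sketch also checks the abstract's remark that the Simple Match is effectively the complement of the Squared Euclidean Distance, here normalised by fingerprint length.

```python
from math import sqrt


def contingency(x, y):
    """Return (a, b, c, d) for two equal-length binary fingerprints.

    a: bits set in both, b: set only in x, c: set only in y, d: set in neither.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("fingerprints must be non-empty and of equal length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if q and not p)
    d = len(x) - a - b - c
    return a, b, c, d


def tanimoto(x, y):
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    # Assumed convention: two all-zero fingerprints score 0.0.
    return a / denom if denom else 0.0


def simple_match(x, y):
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def baroni_urbani_buser(x, y):
    a, b, c, d = contingency(x, y)
    root = sqrt(a * d)
    denom = root + a + b + c
    # Assumed convention: return 0.0 when the denominator vanishes.
    return (root + a) / denom if denom else 0.0


def squared_euclidean(x, y):
    # For binary vectors this equals b + c, the number of mismatched bits.
    return sum((p - q) ** 2 for p, q in zip(x, y))


# Toy fingerprints (made up for illustration): a=2, b=1, c=1, d=4.
x = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 0, 1, 1, 0, 0]

print(tanimoto(x, y))             # 0.5
print(simple_match(x, y))         # 0.75
print(baroni_urbani_buser(x, y))  # about 0.7071
print(1 - squared_euclidean(x, y) / len(x))  # 0.75, same as simple_match
```

Because the Simple Match counts shared unset bits (d) as agreement, sparse fingerprints score as highly similar mainly by virtue of being sparse. The Tanimoto coefficient ignores d entirely, and the Baroni-Urbani/Buser coefficient admits it only through the geometric-mean term.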
| Item Type: | Article |
|---|---|
| Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Department of Information Studies (Sheffield) |
| ID Code: | 9226 |
| Deposited By: | Information Studies |
| Deposited On: | 26 Aug 2009 10:27 |
| Last Modified: | 26 Aug 2009 10:27 |
| Status: | Published |
| Publisher: | American Chemical Society |
| Identification Number: | 10.1021/ci700413a |