Al Khalifa, A., Haranczyk, M. and Holliday, J.D. (2009) Comparison of nonbinary similarity coefficients for similarity searching, clustering and compound selection. Journal of Chemical Information and Modeling, 49 (5). pp. 1193-1201. ISSN 1549-9596
Several recent studies have compared the relative performance of a selection of similarity coefficients when applied to chemical databases represented by binary fingerprints. These studies report considerable variation in performance when the coefficients are used in (dis)similarity-based techniques such as similarity searching, database clustering and dissimilarity-based compound selection, and this variation is closely related to molecular size. For many of these similarity coefficients, an alternative form can be derived that is applicable to nonbinary data, such as calculated or measured physicochemical properties or counts of substructural fragments. Here we report several studies investigating the relative performance of twelve coefficients when applied to nonbinary data in these (dis)similarity-based techniques. The results suggest that no single coefficient is appropriate for all the methodologies investigated, and that the size bias detected with binary data is less apparent when the data, and hence the coefficient, are nonbinary.
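As an illustration of what a "nonbinary form" of a coefficient means, the sketch below implements three standard textbook forms: the continuous Tanimoto coefficient, the cosine coefficient and Euclidean distance. This record does not list the paper's twelve coefficients or their exact formulations, so these functions are assumptions chosen for illustration, not the authors' code. The fragment-count vectors in the usage example are hypothetical values. The key point shown is that the continuous Tanimoto form reduces to the familiar binary form c / (a + b - c) when the inputs are 0/1 fingerprints.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    # Both vectors must describe the same features in the same order.
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")


def tanimoto(x: Sequence[float], y: Sequence[float]) -> float:
    """Continuous Tanimoto: sum(xy) / (sum(x^2) + sum(y^2) - sum(xy)).

    On 0/1 vectors this reduces to the binary form c / (a + b - c)."""
    _check(x, y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = sxx + syy - sxy
    # The denominator is zero only when both vectors are all zeros;
    # returning 0.0 in that case is an assumed convention.
    return sxy / denom if denom else 0.0


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine coefficient: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    _check(x, y)
    sxy = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # Assumed convention: similarity 0.0 if either vector is all zeros.
    return sxy / norm if norm else 0.0


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance, a dissimilarity rather than a similarity."""
    _check(x, y)
    return math.dist(x, y)


if __name__ == "__main__":
    # Hypothetical substructural fragment counts for two molecules.
    counts_a = [2, 0, 1, 3]
    counts_b = [1, 1, 1, 2]
    print(tanimoto(counts_a, counts_b))  # 0.75
    print(cosine(counts_a, counts_b))
    print(euclidean(counts_a, counts_b))

    # The same function applied to binary fingerprints gives c / (a + b - c).
    print(tanimoto([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```

When the nonbinary data are physicochemical properties measured on different scales, they are usually standardized before computing distance-based measures such as the Euclidean distance, so that no single property dominates the result.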
| Field | Value |
|---|---|
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
| Depositing User | Information Studies |
| Date Deposited | 25 Aug 2009 14:53 |
| Last Modified | 25 Aug 2009 14:53 |
| Publisher | American Chemical Society |