Holliday, J.D., Salim, N., Whittle, M. and Willett, P. (2003) Analysis and display of the size dependence of chemical similarity coefficients. Journal of Chemical Information and Computer Sciences, 43 (3). pp. 819-828. ISSN 0095-2338
We discuss the size bias inherent in several chemical similarity coefficients when used for the similarity searching or diversity selection of compound collections. Limits to the upper bounds of 14 standard similarity coefficients are investigated, and the results are used to identify some exceptional characteristics of a few of the coefficients. An additional numerical contribution to the known size bias in the Tanimoto coefficient is identified. Graphical plots with respect to relative bit density are introduced to further assess the coefficients. Our methods reveal the asymmetries inherent in most similarity coefficients that lead to bias in selection, most notably with the Forbes and Russell-Rao coefficients. Conversely, when applied to the recently introduced Modified Tanimoto coefficient, our methods provide support for the view that it is less biased toward molecular size than most other coefficients. In this work, we focus our discussion on fragment-based bit strings, but we demonstrate how our approach can be generalized to continuous representations.
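As a rough illustration of the kind of size dependence the abstract describes, the sketch below evaluates three of the named coefficients on toy bit strings. The formulas are the commonly cited binary definitions (Tanimoto c/(a+b-c), Russell-Rao c/n, Forbes cn/(ab)) and are an assumption here, since this record does not reproduce the paper's notation. The helper names `bits` and `counts`, the 16-bit length, and the toy molecules are hypothetical and are not taken from the paper.

```python
# Minimal sketch, not the authors' code. Assumed definitions, where
# a and b are the numbers of set bits in each string, c is the number
# of bits set in both, and n is the bit-string length.

def bits(on, n):
    """Build a length-n 0/1 list with the positions in `on` set to 1."""
    return [1 if i in on else 0 for i in range(n)]

def counts(x, y):
    """Return (a, b, c, n) for two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("bit strings must have the same length")
    a = sum(x)
    b = sum(y)
    c = sum(i & j for i, j in zip(x, y))
    return a, b, c, len(x)

def tanimoto(x, y):
    a, b, c, _ = counts(x, y)
    union = a + b - c
    # Returning 0.0 for two empty strings is a convention chosen here.
    return c / union if union else 0.0

def russell_rao(x, y):
    _, _, c, n = counts(x, y)
    return c / n

def forbes(x, y):
    a, b, c, n = counts(x, y)
    # Returning 0.0 when either string is empty is a convention chosen here.
    return c * n / (a * b) if a and b else 0.0

N = 16                       # hypothetical fingerprint length
query = bits(range(4), N)    # 4 bits set
small = bits(range(2), N)    # 2 bits set, a subset of the query
large = bits(range(8), N)    # 8 bits set, a superset of the query

for name, mol in (("small", small), ("large", large)):
    print(name, tanimoto(query, mol), russell_rao(query, mol), forbes(query, mol))
```

In this toy case the Tanimoto coefficient scores both molecules at 0.5, whereas Russell-Rao gives 0.125 for the small molecule and 0.25 for the large one, and Forbes gives 4.0 and 2.0 respectively. The two coefficients therefore rank the same pair in opposite orders purely because of bit count, which is one simple way to see a size effect of the sort the abstract refers to.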
|Institution:|The University of Sheffield|
|Academic Units:|The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)|
|Depositing User:|Information Studies|
|Date Deposited:|26 Aug 2009 09:50|
|Last Modified:|26 Aug 2009 09:50|
|Publisher:|American Chemical Society|