
A Classification Updating Procedure Motivated by High Content Screening Data

Jacques, R.M., Fieller, N.R.J. and Ainscow, E.K. (2012) A Classification Updating Procedure Motivated by High Content Screening Data. Journal of Applied Statistics, 39 (1). pp. 189-198. ISSN 0266-4763



The current paradigm for identifying candidate drugs within the pharmaceutical industry typically involves the use of high throughput screens (HTS). High content screening (HCS) is the term given to the process of using an imaging platform to screen large numbers of compounds for some desirable biological activity. Classification methods have important applications in high content screening experiments, where they are used to predict which compounds have the potential to be developed into new drugs. In this paper a new classification method is proposed for batches of compounds, in which the rule is updated sequentially using information from the classification of previous batches. This methodology accounts for the possibility that the training data are not a representative sample of the test data and that the underlying group distributions may change as new compounds are analysed. The technique is illustrated on an example data set using linear discriminant analysis, k-nearest neighbour and random forest classifiers. Random forests are shown to be superior to the other classifiers and are further improved by the additional updating algorithm, which both increases the number of true positives and decreases the number of false positives.
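The idea of updating a classification rule batch by batch can be sketched in a few lines. The sketch below is not the procedure from the paper, whose details are not given in this abstract. It is a generic self-training loop built around a small k-nearest neighbour classifier, and it assumes that only compounds classified by a unanimous vote are added to the training data. The function names (`knn_predict`, `classify_batches`), the `min_share` threshold and the toy data are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, point, k=3):
    """Return (label, vote_share) for `point` from its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def classify_batches(train, batches, k=3, min_share=1.0):
    """Classify batches in order, growing the training set between batches.

    Assumption: only predictions whose vote share reaches `min_share` are
    added to the training data (a generic self-training rule, not the paper's).
    """
    train = list(train)  # copy so the caller's data is not modified
    results = []
    for batch in batches:
        # Classify the whole batch with the rule available before this batch.
        predicted = [knn_predict(train, x, k) for x in batch]
        results.append([label for label, _ in predicted])
        # Update: add confidently classified compounds to the training data.
        train.extend(
            (x, label) for x, (label, share) in zip(batch, predicted)
            if share >= min_share
        )
    return results, train


# Toy data (hypothetical): two features per compound, two classes.
training = [((0.0, 0.0), "inactive"), ((0.2, 0.1), "inactive"), ((0.1, 0.3), "inactive"),
            ((1.0, 1.0), "active"), ((1.1, 0.9), "active"), ((0.9, 1.2), "active")]
# The second batch has drifted away from the original training region.
batches = [[(1.4, 1.5), (0.3, 0.2)], [(2.0, 2.1), (0.5, 0.6)]]
labels, updated_training = classify_batches(training, batches)
```

Because each batch is classified before its compounds are added, the rule applied to a batch depends only on the original training data and on earlier batches. Lowering `min_share` would admit less certain predictions into the training data, which risks reinforcing early misclassifications.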

Item Type: Article
Copyright, Publisher and Additional Information: This is a preprint of an article whose final and definitive form has been published in the Journal of Applied Statistics © Taylor & Francis, 2011. This is the author's version of the work. It is posted here by permission of 'Taylor & Francis' for personal use, not for redistribution. The definitive version was published in the Journal of Applied Statistics 2012; 39 (1), 189-198. doi:10.1080/02664763.2011.580335 (http://dx.doi.org/10.1080/02664763.2011.580335)
Keywords: Classification, Updating Algorithm, High Content Screening Experiments, Batch Learning, Random Forests, Linear Discriminant Analysis, K-Nearest Neighbour
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > School of Health and Related Research (Sheffield)
The University of Sheffield > Faculty of Science (Sheffield) > School of Mathematics and Statistics (Sheffield)
Depositing User: Dr Richard Jacques
Date Deposited: 13 Apr 2012 10:41
Last Modified: 16 Nov 2015 11:49
Published Version: http://dx.doi.org/10.1080/02664763.2011.580335
Status: Published
Publisher: Taylor & Francis
Refereed: Yes
Identification Number: 10.1080/02664763.2011.580335
URI: http://eprints.whiterose.ac.uk/id/eprint/43846
