
A full Bayesian hierarchical mixture model for the variance of gene differential expression

Manda, S.O.M., Walls, R.E. and Gilthorpe, M.S. (2007) A full Bayesian hierarchical mixture model for the variance of gene differential expression. BMC Bioinformatics, 8 (124). ISSN 1471-2105


Abstract

BACKGROUND

In many laboratory-based high-throughput microarray experiments there are very few replicates of gene expression levels, so estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals heteroscedasticity, and the standard approach to address this is to apply a log2 transformation. It is then common to assume that gene variability is constant in the subsequent analysis. However, this is perhaps too stringent an assumption: more careful inspection reveals that the simple log2 transformation does not remove the heteroscedasticity. An alternative strategy is to assume independent gene-specific variances, although this is again problematic because variance estimates based on few replicates are highly unstable. More meaningful and reliable comparisons of gene expression across different conditions or tissue samples might be achieved if the test statistics are based on accurate estimates of gene variability, a crucial step in the identification of differentially expressed genes.
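To make the instability concrete: for normally distributed replicates, (n - 1) * s^2 / sigma^2 follows a chi-square distribution with n - 1 degrees of freedom, so the coefficient of variation (CV) of the sample variance s^2 is sqrt(2 / (n - 1)), about 0.82 with four replicates. The Python sketch below assumes normal errors and uses hypothetical helper names; it checks that formula by simulation.

```python
import math
import random
import statistics


def sample_variance(xs):
    """Unbiased sample variance (denominator n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def theoretical_cv_of_variance(n):
    """CV of s^2 for n i.i.d. normal replicates (assumes normality).

    (n - 1) * s^2 / sigma^2 ~ chi-square(n - 1), whose variance is 2(n - 1),
    so Var(s^2) = 2 * sigma^4 / (n - 1) and CV = sqrt(2 / (n - 1)).
    """
    return math.sqrt(2.0 / (n - 1))


def simulated_cv_of_variance(n, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the same CV (hypothetical helper)."""
    rng = random.Random(seed)
    variances = [
        sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(variances) / statistics.fmean(variances)


if __name__ == "__main__":
    # Columns: replicates, theoretical CV, simulated CV.
    for n in (2, 4, 10):
        print(n, round(theoretical_cv_of_variance(n), 3),
              round(simulated_cv_of_variance(n), 3))
```

With four replicates a single gene's variance estimate routinely lands far from its true value, which is the motivation for borrowing strength across genes.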

RESULTS

We propose a Bayesian mixture model that classifies genes according to the similarity of their variances. As a result, genes in the same latent class share a common variance, estimated from a larger number of replicates than are available for any single gene, i.e. the total of all replicates of all genes in that latent class. An example dataset, consisting of 9216 genes with four replicates per condition, resulted in four latent classes based on the similarity of their variances.
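The pooling idea can be sketched with a simpler stand-in for the authors' full Bayesian hierarchical model: a finite mixture on the gene sample variances, fitted by maximum likelihood with the EM algorithm rather than by MCMC. The sketch assumes every gene contributes a sample variance s2 with the same residual degrees of freedom df (for example 6 with four replicates in each of two conditions), and that, given class k, df * s2 / sigma2_k follows a chi-square distribution with df degrees of freedom. The function names, the quantile initialisation and the caller-supplied number of classes are assumptions for illustration.

```python
import math
import random


def log_density(s2, sigma2, df):
    """log p(s2 | sigma2) when df * s2 / sigma2 ~ chi-square(df).

    Equivalently s2 ~ Gamma(shape=df/2, scale=2*sigma2/df), with mean sigma2.
    """
    shape = df / 2.0
    scale = 2.0 * sigma2 / df
    return ((shape - 1.0) * math.log(s2) - s2 / scale
            - shape * math.log(scale) - math.lgamma(shape))


def fit_variance_mixture(s2, df, k, iters=100):
    """EM fit of a k-class mixture on gene sample variances (a sketch,
    not the full Bayesian model). Returns mixing weights, class variances
    and a hard class label per gene."""
    n = len(s2)
    ordered = sorted(s2)
    # Assumed initialisation: spread starting class variances over quantiles.
    sigma2 = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior class probabilities for each gene.
        resp = []
        for x in s2:
            logs = [math.log(weights[j]) + log_density(x, sigma2[j], df)
                    for j in range(k)]
            top = max(logs)
            p = [math.exp(v - top) for v in logs]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: mixing weights and responsibility-weighted class variances.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            if rj < 1e-12:
                continue  # empty class: keep its previous parameters
            weights[j] = rj / n
            sigma2[j] = sum(r[j] * x for r, x in zip(resp, s2)) / rj
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, sigma2, labels


def simulate_sample_variances(class_sigma2, counts, df, seed=1):
    """Hypothetical test data: s2 = sigma2 * chi-square(df) / df per gene."""
    rng = random.Random(seed)
    return [rng.gammavariate(df / 2.0, 2.0 * sigma2 / df)
            for sigma2, c in zip(class_sigma2, counts) for _ in range(c)]


if __name__ == "__main__":
    s2 = simulate_sample_variances([0.1, 1.0], [1000, 1000], df=6)
    weights, variances, labels = fit_variance_mixture(s2, df=6, k=2)
    print([round(w, 3) for w in weights], [round(v, 3) for v in variances])
```

In the M-step each class variance is a responsibility-weighted average of the gene variances. With hard assignments and equal df this is exactly the pooled variance sum(df * s2_g) / sum(df) over the genes in the class, which is where the extra replicates come from. The sketch does not choose the number of classes; the caller fixes it.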

CONCLUSION

The mixture variance model provides a realistic and flexible estimate of the variance of gene expression data when replicates are limited. We believe that by using the latent class variances, each estimated from the larger number of genes in its derived latent class, the resulting p-values are more robust than those obtained using either a constant variance across all genes or gene-specific variance estimates.
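For a two-condition comparison, the three variance choices compared above give different t-statistics and degrees of freedom for the same gene. The sketch below is a simplification that assumes four replicates per condition, a per-gene residual df of 6, and that pooling over m genes simply yields m * 6 degrees of freedom; under the authors' Bayesian model the reference distribution comes from the posterior instead. All names and numbers are hypothetical.

```python
import math


def t_statistic(mean_diff, variance, n1, n2):
    """Two-sample t-statistic using a supplied (possibly pooled) variance."""
    return mean_diff / math.sqrt(variance * (1.0 / n1 + 1.0 / n2))


def variance_choices(s2, labels, gene, df_per_gene):
    """(variance, degrees of freedom) for one gene under three assumptions.

    The pooled df values are a naive simplification: df per gene times the
    number of genes pooled.
    """
    same_class = [v for v, c in zip(s2, labels) if c == labels[gene]]
    return {
        "gene-specific": (s2[gene], df_per_gene),
        "constant": (sum(s2) / len(s2), df_per_gene * len(s2)),
        "latent class": (sum(same_class) / len(same_class),
                         df_per_gene * len(same_class)),
    }


if __name__ == "__main__":
    # Hypothetical residual variances for six genes in two latent classes.
    s2 = [0.10, 0.12, 0.08, 1.00, 1.20, 0.80]
    labels = [0, 0, 0, 1, 1, 1]
    for name, (var, df) in variance_choices(s2, labels, 0, 6).items():
        t = t_statistic(0.5, var, 4, 4)
        print(f"{name:>13}: variance={var:.3f} df={df} t={t:.3f}")
```

In this toy case the latent-class variance for gene 0 equals its gene-specific value (0.1) but rests on 18 rather than 6 degrees of freedom, while the constant variance (0.55), inflated by the high-variance class, shrinks the t-statistic from about 2.24 to about 0.95. The p-value would then be read from a t distribution with the stated degrees of freedom.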

Item Type: Article
Copyright, Publisher and Additional Information: © 2007 Manda et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Academic Units: The University of Leeds > Faculty of Maths and Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds)
The University of Leeds > Faculty of Medicine and Health (Leeds) > Leeds Institute of Genetics, Health and Therapeutics (LIGHT) > Biostatistics (Leeds)
The University of Leeds > Faculty of Medicine and Health (Leeds) > School of Medicine (Leeds) > Leeds Institute of Genetics, Health and Therapeutics (LIGHT) > Biostatistics (Leeds)
Depositing User: Sherpa Assistant
Date Deposited: 17 Oct 2008 09:42
Last Modified: 08 Feb 2013 17:05
Published Version: http://dx.doi.org/10.1186/1471-2105-8-124
Status: Published
Publisher: BioMed Central Ltd
Refereed: Yes
Identification Number: 10.1186/1471-2105-8-124
URI: http://eprints.whiterose.ac.uk/id/eprint/4759
