Generalization of DNA microarray dispersion properties : microarray equivalent of t-distribution
Novak, Jaroslav P. (Genome Québec Innovation Centre (Montreal, Canada))
Kim, Seon-Young (Genome Research Center. Human Genomics Laboratory (Daejon, Korea))
Xu, Jun (Cedars-Sinai Medical Center (Los Angeles, United States of America))
Modlich, Olga (Institut für Onkologische Chemie (Düsseldorf, Germany))
Volsky, David J. (Columbia University. Molecular Virology Division (New York, United States of America))
Honys, David (Institute of Experimental Botany (Prague, Czech Republic))
Slonczewski, Joan L. (Kenyon College. Department of Biology (Gambier, United States of America))
Bell, Douglas A. (National Institute of Environmental Health Sciences (United States of America))
Blattner, Fred R. (University of Wisconsin. Department of Genetics (Madison, United States of America))
Blumwald, Eduardo (University of California. Department of Plant Sciences (Davis, United States of America))
Boerma, Marjan (University of Arkansas for Medical Sciences. Department of Pharmaceutical Sciences)
Cosio, Manuel (McGill University. Department of Medicine (Montreal, Canada))
Gatalica, Zoran (Creighton University School of Medicine. Department of Pathology (Omaha, United States of America))
Hajduch, Marian (Palacky University in Olomouc. Department of Pediatrics (Czech Republic))
Hidalgo Pareja, Juan (Universitat Autònoma de Barcelona. Departament de Biologia Cel·lular, de Fisiologia i d'Immunologia)
McInnes, Roderick R. (University of Toronto. Departments of Molecular and Medical Genetics and Pediatrics (Toronto, Canada))
Miller, Merrill C. (National Institute of Environmental Health Sciences (United States of America))
Penkowa, Milena (University of Copenhagen. Section of Neuroprotection)
Rolph, Michael S. (Garvan Institute of Medical Research (Darlinghurst, Australia))
Sottosanto, Jordan (University of California. Department of Plant Sciences)
St-Arnaud, Rene (McGill University. Departments of Surgery and Human Genetics (Montreal, Canada))
Szego, Michael J. (University of Toronto. Departments of Molecular and Medical Genetics and Pediatrics (Toronto, Canada))
Twell, David (University of Leicester. Department of Biology (United Kingdom))
Wang, Charles (David Geffen School of Medicine (Los Angeles, United States of America))

Date: 2006
Abstract: Background: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. However, until now little attention has been given to the characterization of dispersion of DNA microarray data. Results: Here we examine the expression data obtained from 682 Affymetrix GeneChips® of 22 different types and we demonstrate that the Gaussian (normal) frequency distribution is characteristic for the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, it is shown that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is employed to calculate the characteristic function approximating standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution. Conclusion: In this study we ascertained that the non-systematic variations possess a Gaussian distribution, determined the probability intervals and demonstrated that the Kα coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of dispersion of data. The fact that the Kα distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in the measurement of RNA abundance.
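The consecutive sampling idea summarized in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: the function names, the window size of 50 genes, the simulated noise model and the way the coefficient is estimated are all assumptions made for illustration. It sorts genes by mean expression in two replicates, cuts them into consecutive windows, computes the standard deviation of the replicate differences in each window, and then estimates an empirical coefficient K such that a fraction 1 - alpha of the differences lies within K standard deviations of the window mean.

```python
import random
import statistics


def consecutive_windows(rep_a, rep_b, window=50):
    """Sort gene pairs by mean expression and yield consecutive windows.

    Hypothetical helper: the window size is an arbitrary illustrative choice.
    """
    pairs = sorted(zip(rep_a, rep_b), key=lambda p: (p[0] + p[1]) / 2)
    for start in range(0, len(pairs) - window + 1, window):
        yield pairs[start:start + window]


def sd_profile(rep_a, rep_b, window=50):
    """Return (mean expression level, SD of replicate differences) per window."""
    profile = []
    for chunk in consecutive_windows(rep_a, rep_b, window):
        level = statistics.fmean([(a + b) / 2 for a, b in chunk])
        diffs = [a - b for a, b in chunk]
        profile.append((level, statistics.stdev(diffs)))
    return profile


def empirical_k(rep_a, rep_b, alpha=0.05, window=50):
    """Estimate K so that about 1 - alpha of differences lie within K * SD."""
    scaled = []
    for chunk in consecutive_windows(rep_a, rep_b, window):
        diffs = [a - b for a, b in chunk]
        centre = statistics.fmean(diffs)
        spread = statistics.stdev(diffs)
        scaled.extend(abs(d - centre) / spread for d in diffs)
    scaled.sort()
    index = min(int((1 - alpha) * len(scaled)), len(scaled) - 1)
    return scaled[index]


# Simulated replicate pair (assumption: Gaussian noise on a log scale whose
# spread shrinks as expression increases; real chip data are not used here).
rng = random.Random(0)
true_level = [rng.uniform(4.0, 14.0) for _ in range(5000)]
noise_sd = [0.1 + 0.03 * (14.0 - t) for t in true_level]
rep_a = [t + rng.gauss(0.0, s) for t, s in zip(true_level, noise_sd)]
rep_b = [t + rng.gauss(0.0, s) for t, s in zip(true_level, noise_sd)]

profile = sd_profile(rep_a, rep_b)
k95 = empirical_k(rep_a, rep_b, alpha=0.05)
print(f"windows: {len(profile)}, empirical K for alpha=0.05: {k95:.2f}")
```

With Gaussian noise and reasonably large windows, the estimated coefficient should land close to the familiar two-sided 95% value of roughly 2, regardless of how the noise level changes with expression. That is the kind of invariance the abstract attributes to the Kα coefficients.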
Rights: This document is subject to a Creative Commons license. Total or partial reproduction and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted.
Language: English
Document: Article; Published version
Published in: Biology Direct, Vol. 1, No. 27 (September 2006), p. 1-24, ISSN 1745-6150

DOI: 10.1186/1745-6150-1-27
PMID: 16959036

24 p, 1.3 MB

