TY - JOUR
T1 - Cross-study validation and combined analysis of gene expression microarray data
AU - Garrett-Mayer, Elizabeth
AU - Parmigiani, Giovanni
AU - Zhong, Xiaogang
AU - Cope, Leslie
AU - Gabrielson, Edward
PY - 2008/4
Y1 - 2008/4
N2 - Investigations of transcript levels on a genomic scale using hybridization-based arrays have led to formidable advances in our understanding of the biology of many human illnesses. At the same time, these investigations have generated controversy because of the probabilistic nature of the conclusions and the surfacing of noticeable discrepancies between the results of studies addressing the same biological question. In this article, we present simple and effective data analysis and visualization tools for gauging the degree to which the findings of one study are reproduced by others and for integrating multiple studies in a single analysis. We describe these approaches in the context of studies of breast cancer and illustrate that it is possible to identify a substantial biologically relevant subset of the human genome within which hybridization results are reliable. The subset generally varies with the platforms used, the tissues studied, and the populations being sampled. Despite important differences, it is also possible to develop simple expression measures that allow comparison across platforms, studies, laboratories and populations. Important biological signals are often preserved or enhanced. Cross-study validation and combination of microarray results requires careful, but not overly complex, statistical thinking and can become a routine component of genomic analysis.
AB - Investigations of transcript levels on a genomic scale using hybridization-based arrays have led to formidable advances in our understanding of the biology of many human illnesses. At the same time, these investigations have generated controversy because of the probabilistic nature of the conclusions and the surfacing of noticeable discrepancies between the results of studies addressing the same biological question. In this article, we present simple and effective data analysis and visualization tools for gauging the degree to which the findings of one study are reproduced by others and for integrating multiple studies in a single analysis. We describe these approaches in the context of studies of breast cancer and illustrate that it is possible to identify a substantial biologically relevant subset of the human genome within which hybridization results are reliable. The subset generally varies with the platforms used, the tissues studied, and the populations being sampled. Despite important differences, it is also possible to develop simple expression measures that allow comparison across platforms, studies, laboratories and populations. Important biological signals are often preserved or enhanced. Cross-study validation and combination of microarray results requires careful, but not overly complex, statistical thinking and can become a routine component of genomic analysis.
KW - Breast cancer
KW - Intraclass correlation
KW - Meta-analysis
KW - Principal components
KW - Reliability
UR - http://www.scopus.com/inward/record.url?scp=41149104381&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=41149104381&partnerID=8YFLogxK
U2 - 10.1093/biostatistics/kxm033
DO - 10.1093/biostatistics/kxm033
M3 - Article
C2 - 17873151
AN - SCOPUS:41149104381
SN - 1465-4644
VL - 9
SP - 333
EP - 354
JO - Biostatistics
JF - Biostatistics
IS - 2
ER -