Purpose: Recent studies sought to refine lung cancer classification using gene expression microarrays. We evaluate the extent to which these studies agree and whether their results can be integrated.

Experimental Design: We developed a practical analysis plan for cross-study comparison, validation, and integration of cancer molecular classification studies using public data. We evaluated genes for cross-platform consistency of expression patterns using integrative correlations, which quantify cross-study reproducibility without relying on direct assimilation of expression measurements across platforms. We then compared the association of gene expression levels with the differential diagnosis of squamous cell carcinoma versus adenocarcinoma, via reproducibility of the gene-specific t statistics, and with survival, via reproducibility of Cox coefficients.

Results: Integrative correlation analysis revealed a large proportion of genes whose expression patterns agreed across studies more than would be expected by chance. The correlation of t statistics for diagnosis of squamous cell carcinoma versus adenocarcinoma was high (0.85) and increased to 0.925 when only the most consistent genes identified by integrative correlation were used. Correlations of Cox coefficients ranged from 0.13 to 0.31 (0.33-0.49 with genes selected for consistency). Although we found genes that were significant in multiple studies but showed discordant effects, their number was approximately that expected by chance. We report genes that are reproducible by integrative analysis, significant in all studies, and concordant in effect.

Conclusions: Cross-study comparison revealed significant, albeit incomplete, agreement of gene expression patterns related to lung cancer biology and identified genes that reproducibly predict outcomes. This analysis approach is broadly applicable to cross-study comparisons of gene expression profiling projects.
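The integrative correlation idea mentioned in the Experimental Design can be illustrated with a minimal sketch. It assumes the common "correlation of correlations" formulation: within each study, compute every gene's correlation with all other genes, then, for each gene, correlate those two profiles across studies. The function name `integrative_correlation`, the array layout (genes in rows, samples in columns, same gene order in both studies), and the simulated data are assumptions made for illustration; this is not the authors' code and may differ from their exact procedure.

```python
import numpy as np


def integrative_correlation(expr_a, expr_b):
    """Per-gene cross-study reproducibility score (illustrative sketch).

    expr_a, expr_b: arrays of shape (n_genes, n_samples). The gene order must
    match across studies; the samples (and their number) may differ.
    Returns an array of n_genes scores in [-1, 1].
    """
    # Gene-by-gene Pearson correlation matrix, computed within each study.
    corr_a = np.corrcoef(expr_a)
    corr_b = np.corrcoef(expr_b)

    n_genes = corr_a.shape[0]
    scores = np.empty(n_genes)
    for g in range(n_genes):
        # Leave out the gene's correlation with itself (always 1).
        others = np.arange(n_genes) != g
        # Correlate gene g's co-expression profile in study A with that in B.
        scores[g] = np.corrcoef(corr_a[g, others], corr_b[g, others])[0, 1]
    return scores


# Simulated example: two "studies" measuring the same 8 genes on different
# samples and on different measurement scales.
rng = np.random.default_rng(0)
loadings = rng.normal(size=(8, 2))  # shared co-expression structure
study_a = loadings @ rng.normal(size=(2, 40)) + 0.5 * rng.normal(size=(8, 40))
study_b = 10 * (loadings @ rng.normal(size=(2, 30))) + 5 * rng.normal(size=(8, 30))

print(integrative_correlation(study_a, study_b))
```

Because the score depends only on within-study correlations, it is unaffected by each platform's measurement scale, which is why the two simulated studies can be compared without first putting their expression values on a common scale.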
ASJC Scopus subject areas
- Cancer Research