Asymptotic Conditional Singular Value Decomposition for High-Dimensional Genomic Data

Research output: Contribution to journal › Article › peer-review

30 Scopus citations


High-dimensional data, such as those obtained from a gene expression microarray or second-generation sequencing experiment, consist of a large number of dependent features measured on a small number of samples. One of the key problems in genomics is the identification and estimation of factors that are associated with many features simultaneously. Identifying the number of factors is also important for unsupervised statistical analyses such as hierarchical clustering. A conditional factor model is the most common model for many types of genomic data, ranging from gene expression to single nucleotide polymorphisms to methylation. Here we show that, under a conditional factor model for genomic data with a fixed sample size, the right singular vectors are asymptotically consistent for the unobserved latent factors as the number of features diverges. We also propose an estimator of the dimension of the underlying conditional factor model, based on a scaled eigen-decomposition, that is consistent for a fixed, finite sample size as the number of features grows without bound. We propose a practical approach for selecting the number of factors in real data sets, and we illustrate the utility of these results for capturing batch and other unmodeled effects in a microarray experiment using the dependence kernel approach of Leek and Storey (2008,, 18718-18723).
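The two asymptotic claims above can be illustrated numerically. The sketch below is a simplified simulation, not the paper's method: it assumes a factor model with no measured covariates, X = Gamma G + U, with features in rows and a fixed number of samples in columns, Gaussian loadings and factors, and i.i.d. homoscedastic noise. All function names are hypothetical. `factor_alignment` measures how well the top right singular vectors span the rows of the latent factor matrix G. `estimate_num_factors` is our own eigenvalue-thresholding heuristic on the scaled matrix X^T X / m, used as a stand-in and not the estimator proposed in the paper. Its tolerance of 4·sqrt(n/m) is an arbitrary choice that shrinks to zero as the number of features grows.

```python
import numpy as np


def simulate_factor_data(m, n, k, noise_sd=1.0, seed=0):
    """Draw an m x n matrix X = Gamma @ G + U (features in rows, samples in columns).

    Illustrative assumptions: G (k x n) holds the latent factors for a fixed set
    of n samples, Gamma (m x k) holds Gaussian feature-specific loadings, and
    U is i.i.d. Gaussian noise with standard deviation noise_sd.
    """
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(k, n))
    Gamma = rng.normal(size=(m, k))
    U = rng.normal(scale=noise_sd, size=(m, n))
    return Gamma @ G + U, G


def factor_alignment(X, G):
    """Cosines of the principal angles between the row space of G and the span
    of the top-k right singular vectors of X (all equal to 1.0 means exact recovery)."""
    k = G.shape[0]
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # singular values sorted descending
    V = Vt[:k].T                                      # n x k, orthonormal columns
    Q, _ = np.linalg.qr(G.T)                          # n x k orthonormal basis of row space of G
    return np.linalg.svd(Q.T @ V, compute_uv=False)


def estimate_num_factors(X, tol_multiplier=4.0):
    """Hypothetical thresholding rule (NOT the paper's estimator): count eigenvalues
    of X^T X / m that exceed the median eigenvalue by a relative margin
    tol_multiplier * sqrt(n / m). Assumes homoscedastic noise and fewer than
    n / 2 factors, so that the median eigenvalue reflects the noise level."""
    m, n = X.shape
    eigvals = np.linalg.eigvalsh(X.T @ X / m)  # n eigenvalues of the scaled n x n matrix
    noise_level = np.median(eigvals)
    threshold = noise_level * (1.0 + tol_multiplier * np.sqrt(n / m))
    return int(np.sum(eigvals > threshold))


if __name__ == "__main__":
    # Many features, few samples: the regime described in the abstract.
    X, G = simulate_factor_data(m=20000, n=20, k=2)
    print("smallest principal-angle cosine:", factor_alignment(X, G).min())
    print("estimated number of factors:", estimate_num_factors(X))
```

As the number of features m grows with n fixed, X^T X / m approaches G^T E[γγ^T] G + σ²I in this simulation. Its top k eigenvectors therefore span the row space of G, and its remaining n − k eigenvalues approach the noise variance σ². This gap between signal and noise eigenvalues is what a thresholding rule of this kind exploits.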

Original language: English (US)
Pages (from-to): 344-352
Number of pages: 9
Issue number: 2
State: Published - Jun 2011

Keywords

  • False discovery rate
  • Gene expression
  • Genomics
  • High-dimensional
  • Singular value decomposition
  • Surrogate variables

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics
