Detection of sparse and weak effects in high-dimensional feature space, with an application to microbiome data analysis

Tatjana Pavlenko, Annika Tillander, Justine Debelius, Fredrik Boulund

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

We present a family of goodness-of-fit (GOF) test statistics specifically designed for the detection of sparse-weak mixtures, where only a small fraction of the observational units are contaminated, that is, arise from a different distribution. The test statistics are constructed as sup-functionals of weighted empirical processes, where the weight functions employed are the Chibisov-O'Reilly functions of a Brownian bridge. The study recovers and extends a number of previously known results on sparse detection using a weighted GOF (wGOF) approach. In particular, the results demonstrate the advantage of our approach over a common approach that uses a family of regularly varying weight functions. We show that the Chibisov-O'Reilly family has important advantages over better-known approaches, as it allows for optimally adaptive, fully data-driven test procedures. The theory is further developed to demonstrate that the entire family is a flexible device that adapts to many situations in modern scientific practice where the number of observations stays fixed or grows very slowly, while the number of automatically measured features grows dramatically and only a small fraction of these features are useful. Numerical studies are performed to investigate the finite-sample properties of the theoretical results. We show that the Chibisov-O'Reilly family compares favorably to related test statistics over a broad range of sparsity and weakness regimes for the Gaussian and high-dimensional Dirichlet types of sparse mixture. Finally, an example based on a human gut microbiome data set illustrates how the family of tests applies to real-life sparse signal detection problems where the sample size is small relative to the feature dimension.
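As a rough illustration of the construction described in the abstract, the sketch below computes a one-sided sup-functional of a weighted uniform empirical process, sup over u of sqrt(n) (F_n(u) - u) / q(u), evaluated at the order statistics of a set of p-values (a common discretization). The abstract does not give the authors' weight functions, so the weight q is a plug-in argument here. Both weights shown are assumptions made for illustration: hc_weight is the sqrt(u(1-u)) weight familiar from higher-criticism-type statistics, and loglog_weight adds an iterated-logarithm factor. Neither is claimed to be the weight used in the chapter, and all function names are hypothetical.

```python
import numpy as np


def hc_weight(u):
    """Standard-deviation weight sqrt(u(1-u)); an illustrative choice."""
    return np.sqrt(u * (1.0 - u))


def loglog_weight(u):
    """Illustrative weight with an iterated-logarithm factor (an assumption,
    not taken from the chapter). Since u(1-u) <= 1/4, log(log(1/v)) > 0."""
    v = u * (1.0 - u)
    return np.sqrt(v * np.log(np.log(1.0 / v)))


def weighted_sup_statistic(pvalues, weight):
    """One-sided weighted sup statistic evaluated at the sorted p-values.

    Under the null hypothesis the p-values are assumed i.i.d. Uniform(0, 1).
    """
    # Clip away from 0 and 1 so the weight never vanishes.
    p = np.sort(np.clip(np.asarray(pvalues, dtype=float), 1e-12, 1.0 - 1e-12))
    n = p.size
    ecdf = np.arange(1, n + 1) / n  # F_n evaluated at each order statistic
    return float(np.max(np.sqrt(n) * (ecdf - p) / weight(p)))


# Deterministic toy data: an evenly spread "null" sample, and the same
# sample with 5 of its 100 p-values replaced by very small ones.
n = 100
null_p = (np.arange(1, n + 1) - 0.5) / n
sparse_p = null_p.copy()
sparse_p[:5] = 1e-6

stat_null = weighted_sup_statistic(null_p, hc_weight)
stat_sparse = weighted_sup_statistic(sparse_p, hc_weight)
```

On this toy input the statistic for the contaminated sample is much larger than for the evenly spread one, which is the behavior a sparse-mixture detector relies on. Calibrating a rejection threshold would require the null distribution of the chosen statistic, which this sketch does not attempt.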

Original language: English (US)
Title of host publication: Recent Developments in Multivariate and Random Matrix Analysis
Subtitle of host publication: Festschrift in Honour of Dietrich von Rosen
Publisher: Springer International Publishing
Pages: 287-311
Number of pages: 25
ISBN (Electronic): 9783030567736
ISBN (Print): 9783030567729
State: Published - Sep 17 2020
Externally published: Yes

ASJC Scopus subject areas

  • General Mathematics
  • General Medicine
  • Economics, Econometrics and Finance (all)
  • General Business, Management and Accounting
