Abstract
We present a family of goodness-of-fit (GOF) test statistics designed specifically for detecting sparse-weak mixtures, in which only a small fraction of the observational units are contaminated, arising from a different distribution. The test statistics are constructed as sup-functionals of weighted empirical processes, where the weight functions employed are the Chibisov-O'Reilly functions of a Brownian bridge. The study recovers and extends a number of previously known results on sparse detection using a weighted GOF (wGOF) approach. In particular, the results demonstrate the advantage of our approach over a common approach that uses a family of regularly varying weight functions. We show that the Chibisov-O'Reilly family has important advantages over better-known approaches because it allows for optimally adaptive, fully data-driven test procedures. The theory is further developed to show that the entire family is a flexible device that adapts to many situations in modern scientific practice where the number of observations stays fixed or grows very slowly, while the number of automatically measured features grows dramatically and only a small fraction of these features are useful. Numerical studies are performed to investigate how the theoretical results hold in finite samples. We show that the Chibisov-O'Reilly family compares favorably with related test statistics over a broad range of sparsity and weakness regimes for Gaussian and high-dimensional Dirichlet sparse mixtures. Finally, an example using a human gut microbiome data set illustrates that the family of tests applies to real-life sparse signal detection problems in which the sample size is small relative to the feature dimension.
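The sup-functional construction described above can be illustrated with a small sketch. It assumes the observations have already been transformed into p-values that are Uniform(0,1) under the null, and it uses the weight q_nu(t) = (t(1 - t))^(1/2 - nu) with 0 <= nu < 1/2, a standard form of the Chibisov-O'Reilly weight. The one-sided deviation, the evaluation of the supremum only at the order statistics, and the function name `chibisov_oreilly_stat` are illustrative assumptions, not the chapter's exact definitions.

```python
import math
from typing import Sequence


def chibisov_oreilly_stat(pvalues: Sequence[float], nu: float) -> float:
    """Sketch of a one-sided weighted sup-statistic (illustrative assumption).

    Approximates sup_t sqrt(n) * (F_n(t) - t) / (t * (1 - t)) ** (1/2 - nu)
    by evaluating it only at the order statistics t = p_(i), where the
    empirical CDF F_n equals i / n. This is a discretization, not the exact sup.
    """
    if not 0.0 <= nu < 0.5:
        raise ValueError("nu must lie in [0, 1/2)")
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")

    exponent = 0.5 - nu
    best = -math.inf
    for i, p in enumerate(sorted(pvalues), start=1):
        if not 0.0 < p < 1.0:
            # The weight vanishes at t = 0 and t = 1, so skip boundary values.
            continue
        deviation = math.sqrt(n) * (i / n - p)  # uniform empirical process at t = p_(i)
        weight = (p * (1.0 - p)) ** exponent    # Chibisov-O'Reilly weight q_nu(t)
        best = max(best, deviation / weight)
    return best
```

With nu = 0 the weight becomes sqrt(t(1 - t)) and the sketch reduces to a Higher Criticism-type statistic. Larger nu places less emphasis on deviations near t = 0 and t = 1. For example, in `chibisov_oreilly_stat([0.001, 0.2, 0.5, 0.8], 0.0)` the maximum is attained at the smallest p-value.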
Original language | English (US) |
---|---|
Title of host publication | Recent Developments in Multivariate and Random Matrix Analysis |
Subtitle of host publication | Festschrift in Honour of Dietrich von Rosen |
Publisher | Springer International Publishing |
Pages | 287-311 |
Number of pages | 25 |
ISBN (Electronic) | 9783030567736 |
ISBN (Print) | 9783030567729 |
DOIs | |
State | Published - Sep 17 2020 |
Externally published | Yes |
ASJC Scopus subject areas
- General Mathematics
- General Medicine
- Economics, Econometrics and Finance(all)
- General Business, Management and Accounting