TY - JOUR
T1 - Generalized Hotelling's test for paired compositional data with application to human microbiome studies
AU - Zhao, Ni
AU - Zhan, Xiang
AU - Guthrie, Katherine A.
AU - Mitchell, Caroline M.
AU - Larson, Joseph
N1 - Publisher Copyright: © 2018 WILEY PERIODICALS, INC.
PY - 2018/7
Y1 - 2018/7
N2 - The human microbiome is a dynamic system that changes due to diseases, medication, change in diet, etc. The paired design is a common approach to evaluate the microbial changes while controlling for the inherent differences between people. For example, microbiome data may be collected from the same individuals before and after a treatment. Two challenges exist in analyzing this type of data. First, microbiome data are compositional such that the reads for all taxa in each sample are constrained to sum to a constant. Second, the number of taxa can be much larger than the sample size. Few statistical methods exist to analyze such data besides methods that test one taxon at a time. In this paper, we propose to first conduct a log-ratio transformation of the compositions, and then develop a generalized Hotelling's test (GHT) to evaluate whether the average microbiome compositions are equivalent in the paired samples. We replace the sample covariance matrix in standard Hotelling's statistic by a shrinkage-based covariance, calculated as a weighted average of the sample covariance and a positive definite target matrix. The optimal weighting can be obtained for many commonly used target matrices. We develop a permutation procedure to assess the statistical significance. Extensive simulations show that our proposed method has well-controlled type I error and better power than a few ad hoc approaches. We apply our method to examine the vaginal microbiome changes in response to treatments for menopausal hot flashes. An R package “GHT” is freely available at https://github.com/zhaoni153/GHT.
AB - The human microbiome is a dynamic system that changes due to diseases, medication, change in diet, etc. The paired design is a common approach to evaluate the microbial changes while controlling for the inherent differences between people. For example, microbiome data may be collected from the same individuals before and after a treatment. Two challenges exist in analyzing this type of data. First, microbiome data are compositional such that the reads for all taxa in each sample are constrained to sum to a constant. Second, the number of taxa can be much larger than the sample size. Few statistical methods exist to analyze such data besides methods that test one taxon at a time. In this paper, we propose to first conduct a log-ratio transformation of the compositions, and then develop a generalized Hotelling's test (GHT) to evaluate whether the average microbiome compositions are equivalent in the paired samples. We replace the sample covariance matrix in standard Hotelling's statistic by a shrinkage-based covariance, calculated as a weighted average of the sample covariance and a positive definite target matrix. The optimal weighting can be obtained for many commonly used target matrices. We develop a permutation procedure to assess the statistical significance. Extensive simulations show that our proposed method has well-controlled type I error and better power than a few ad hoc approaches. We apply our method to examine the vaginal microbiome changes in response to treatments for menopausal hot flashes. An R package “GHT” is freely available at https://github.com/zhaoni153/GHT.
KW - Hotelling's test
KW - compositional data
KW - microbiome
KW - shrinkage-based covariance
UR - http://www.scopus.com/inward/record.url?scp=85046480473&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046480473&partnerID=8YFLogxK
U2 - 10.1002/gepi.22127
DO - 10.1002/gepi.22127
M3 - Article
C2 - 29737047
AN - SCOPUS:85046480473
SN - 0741-0395
VL - 42
SP - 459
EP - 469
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 5
ER -