TY - JOUR
T1 - Power Analysis and Sample Size Determination in Metabolic Phenotyping
AU - Blaise, Benjamin J.
AU - Correia, Gonçalo
AU - Tin, Adrienne
AU - Young, J. Hunter
AU - Vergnaud, Anne Claire
AU - Lewis, Matthew
AU - Pearce, Jake T.M.
AU - Elliott, Paul
AU - Nicholson, Jeremy K.
AU - Holmes, Elaine
AU - Ebbels, Timothy M.D.
N1 - Funding Information:
B.J.B. is supported by the Fédération pour la Recherche Médicale, the Société Française d'Anesthésie et Réanimation, the Académie Nationale de Médecine, the Association de Néonatologie de Port-Royal, and the City of Lyon. G.C. is supported by the Imperial College Stratified Medicine Graduate Training Programme in Systems Medicine and Spectroscopic Profiling (STRATiGRAD). T.E. acknowledges support from the EU COSMOS project (Grant Agreement 312941) and the EU PhenoMeNal project (Project Reference 654241). We thank Drs. Bénédicte Elena-Herrmann, Jean Giacomotto, Laurent Ségalat, Pierre Toulhoat, and Prof. Lyndon Emsley for providing the C. elegans data set. We would like to thank Bruker BioSpin, GmbH, Rheinstetten, Germany, for metabolite quantification. We thank Dr. Anthony Dona and Dóra Perényi for acquiring the ARIC NMR data. This research has been conducted using the Airwave Study RTB Resource. The Airwave Health Monitoring Study is funded by the Home Office (Grant Number 780-TETRA) with additional support from the National Institute for Health Research (NIHR), Imperial College Healthcare NHS Trust (ICHNT), and Imperial College Biomedical Research Centre (BRC). We thank all participants in the Airwave Study for their contribution. We also thank the Institute for Translational Medicine and Therapeutics (ITMAT). P.E. acknowledges support of the MRC-PHE Centre for Environment and Health, the BRC, and the NIHR Health Protection Research Unit for Health Impacts of Environmental Hazards. The views expressed are those of the authors and not necessarily those of the Home Office, the Department of Health, or the NIHR. This work was supported by the Medical Research Council and NIHR (grant MC-PC-12025).
Publisher Copyright:
© 2016 American Chemical Society.
PY - 2016/5/17
Y1 - 2016/5/17
N2 - Estimation of statistical power and sample size is a key aspect of experimental design. However, in metabolic phenotyping, there is currently no accepted approach for these tasks, in large part due to the unknown nature of the expected effect. In such hypothesis-free science, neither the number or class of important analytes nor the effect size are known a priori. We introduce a new approach, based on multivariate simulation, which deals effectively with the highly correlated structure and high dimensionality of metabolic phenotyping data. First, a large data set is simulated based on the characteristics of a pilot study investigating a given biomedical issue. An effect of a given size, corresponding either to a discrete (classification) or continuous (regression) outcome, is then added. Different sample sizes are modeled by randomly selecting data sets of various sizes from the simulated data. We investigate different methods for effect detection, including univariate and multivariate techniques. Our framework allows us to investigate the complex relationship between sample size, power, and effect size for real multivariate data sets. For instance, we demonstrate for an example pilot data set that certain features achieve a power of 0.8 for a sample size of 20 samples or that a cross-validated predictivity Q²Y of 0.8 is reached with an effect size of 0.2 and 200 samples. We exemplify the approach for both nuclear magnetic resonance and liquid chromatography-mass spectrometry data from humans and the model organism C. elegans.
AB - Estimation of statistical power and sample size is a key aspect of experimental design. However, in metabolic phenotyping, there is currently no accepted approach for these tasks, in large part due to the unknown nature of the expected effect. In such hypothesis-free science, neither the number or class of important analytes nor the effect size are known a priori. We introduce a new approach, based on multivariate simulation, which deals effectively with the highly correlated structure and high dimensionality of metabolic phenotyping data. First, a large data set is simulated based on the characteristics of a pilot study investigating a given biomedical issue. An effect of a given size, corresponding either to a discrete (classification) or continuous (regression) outcome, is then added. Different sample sizes are modeled by randomly selecting data sets of various sizes from the simulated data. We investigate different methods for effect detection, including univariate and multivariate techniques. Our framework allows us to investigate the complex relationship between sample size, power, and effect size for real multivariate data sets. For instance, we demonstrate for an example pilot data set that certain features achieve a power of 0.8 for a sample size of 20 samples or that a cross-validated predictivity Q²Y of 0.8 is reached with an effect size of 0.2 and 200 samples. We exemplify the approach for both nuclear magnetic resonance and liquid chromatography-mass spectrometry data from humans and the model organism C. elegans.
UR - http://www.scopus.com/inward/record.url?scp=84971216533&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971216533&partnerID=8YFLogxK
U2 - 10.1021/acs.analchem.6b00188
DO - 10.1021/acs.analchem.6b00188
M3 - Article
C2 - 27116637
AN - SCOPUS:84971216533
SN - 0003-2700
VL - 88
SP - 5179
EP - 5188
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 10
ER -