TY - JOUR
T1 - Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies
AU - Chatterjee, Nilanjan
AU - Wheeler, Bill
AU - Sampson, Joshua
AU - Hartge, Patricia
AU - Chanock, Stephen J.
AU - Park, Ju Hyun
N1 - Funding Information: This research was supported by the intramural program of the US National Cancer Institute.
PY - 2013/4
Y1 - 2013/4
AB - We report a new method to estimate the predictive performance of polygenic models for risk prediction and assess predictive performance for ten complex traits or common diseases. Using estimates of effect-size distribution and heritability derived from current studies, we project that although 45% of the variance of height has been attributed to SNPs, a model trained on one million people may only explain 33.4% of variance of the trait. Models based on current studies allow for identification of 3.0%, 1.1% and 7.0% of the populations at twofold or higher than average risk for type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate these percentages to 18.8%, 6.1% and 12.2%, respectively. The utility of polygenic models for risk prediction will depend on achievable sample sizes for the training data set, the underlying genetic architecture and the inclusion of information on other risk factors, including family history.
UR - http://www.scopus.com/inward/record.url?scp=84875700256&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875700256&partnerID=8YFLogxK
U2 - 10.1038/ng.2579
DO - 10.1038/ng.2579
M3 - Article
C2 - 23455638
AN - SCOPUS:84875700256
SN - 1061-4036
VL - 45
SP - 400
EP - 405
JO - Nature Genetics
JF - Nature Genetics
IS - 4
ER -