TY - JOUR
T1 - Personalized Risk Prediction in Clinical Oncology Research
T2 - Applications and Practical Issues Using Survival Trees and Random Forests
AU - Hu, Chen
AU - Steingrimsson, Jon Arni
N1 - Funding Information:
This research is supported in part by National Institutes of Health grants U10-CA180822 and P30-CA006973. The authors thank Xiaofei Wang and Marlina Nasution, the guest editors for the Special Issue, and two anonymous referees who all helped to improve the article.
Publisher Copyright:
© 2017 Taylor & Francis.
PY - 2018/3/4
Y1 - 2018/3/4
AB - A crucial component of making individualized treatment decisions is to accurately predict each patient’s disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
KW - CART
KW - Cancer
KW - risk prediction
KW - survival analysis
KW - survival forests
KW - survival trees
UR - http://www.scopus.com/inward/record.url?scp=85031772328&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031772328&partnerID=8YFLogxK
U2 - 10.1080/10543406.2017.1377730
DO - 10.1080/10543406.2017.1377730
M3 - Article
C2 - 29048993
AN - SCOPUS:85031772328
SN - 1054-3406
VL - 28
SP - 333
EP - 349
JO - Journal of Biopharmaceutical Statistics
JF - Journal of Biopharmaceutical Statistics
IS - 2
ER -