TY - JOUR
T1 - Propensity score estimation
T2 - neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression
AU - Westreich, Daniel
AU - Lessler, Justin
AU - Funk, Michele Jonsson
N1 - Funding Information:
Funding: This work was funded in part by a developmental grant from the University of North Carolina at Chapel Hill Center for AIDS Research (CFAR), an NIH-funded program #P30 AI50410. D.W. also received support from an unrestricted educational training grant from the UNC-GlaxoSmithKline Center for Excellence in Pharmacoepidemiology and Public Health, UNC School of Public Health, and from NIH/NIAID 5 T32 AI 07001-31 Training in Sexually Transmitted Diseases and AIDS.
PY - 2010/8
Y1 - 2010/8
N2 - Objective: Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting: We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results: We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Conclusion: Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.
KW - Classification and regression trees (CART)
KW - Logistic regression
KW - Neural networks
KW - Propensity scores
KW - Recursive partitioning algorithms
KW - Review
UR - http://www.scopus.com/inward/record.url?scp=77953607621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953607621&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2009.11.020
DO - 10.1016/j.jclinepi.2009.11.020
M3 - Review article
C2 - 20630332
AN - SCOPUS:77953607621
SN - 0895-4356
VL - 63
SP - 826
EP - 833
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
IS - 8
ER -