Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?

Robert B. Olsen; Larry L. Orr; Stephen H. Bell; Elizabeth Petraglia; Elena Badillo-Goicoechea; Atsushi Miyaoka; Elizabeth A. Stuart

doi:10.1080/19345747.2023.2180464

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-site randomized controlled trials (RCTs) provide unbiased estimates of the average impact in the study sample. However, their ability to accurately predict the impact for individual sites outside the study sample, to inform local policy decisions, is largely unknown. To extend prior research on this question, we analyzed six multi-site RCTs and tested modern prediction methods—lasso regression and Bayesian Additive Regression Trees (BART)—using a wide range of moderator variables. The main study findings are that: (1) all of the methods yielded accurate impact predictions when the variation in impacts across sites was close to zero (as expected); (2) none of the methods yielded accurate impact predictions when the variation in impacts across sites was substantial; and (3) BART typically produced “less inaccurate” predictions than lasso regression or than the Sample Average Treatment Effect. These results raise concerns that when the impact of an intervention varies considerably across sites, statistical modeling using the data commonly collected by multi-site RCTs will be insufficient to explain the variation in impacts across sites and accurately predict impacts for individual sites.

Original language	English (US)
Pages (from-to)	184-210
Number of pages	27
Journal	Journal of Research on Educational Effectiveness
Volume	17
Issue number	1
DOIs	https://doi.org/10.1080/19345747.2023.2180464
State	Published - 2024

Keywords

Randomized controlled trials
evidence-based policy
external validity
generalizability
transportability

ASJC Scopus subject areas

Education

Cite this

@article{3978ebb8fb8041dd8d50a8f9b19193f1,

title = "Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?",

abstract = "Multi-site randomized controlled trials (RCTs) provide unbiased estimates of the average impact in the study sample. However, their ability to accurately predict the impact for individual sites outside the study sample, to inform local policy decisions, is largely unknown. To extend prior research on this question, we analyzed six multi-site RCTs and tested modern prediction methods—lasso regression and Bayesian Additive Regression Trees (BART)—using a wide range of moderator variables. The main study findings are that: (1) all of the methods yielded accurate impact predictions when the variation in impacts across sites was close to zero (as expected); (2) none of the methods yielded accurate impact predictions when the variation in impacts across sites was substantial; and (3) BART typically produced “less inaccurate” predictions than lasso regression or than the Sample Average Treatment Effect. These results raise concerns that when the impact of an intervention varies considerably across sites, statistical modeling using the data commonly collected by multi-site RCTs will be insufficient to explain the variation in impacts across sites and accurately predict impacts for individual sites.",

keywords = "Randomized controlled trials, evidence-based policy, external validity, generalizability, transportability",

author = "Olsen, {Robert B.} and Orr, {Larry L.} and Bell, {Stephen H.} and Petraglia, Elizabeth and Badillo-Goicoechea, Elena and Miyaoka, Atsushi and Stuart, {Elizabeth A.}",

note = "Publisher Copyright: {\textcopyright} 2023 Westat.",

year = "2024",

doi = "10.1080/19345747.2023.2180464",

language = "English (US)",

volume = "17",

pages = "184--210",

journal = "Journal of Research on Educational Effectiveness",

issn = "1934-5747",

publisher = "Routledge",

number = "1",

}

TY - JOUR

T1 - Using a Multi-Site RCT to Predict Impacts for a Single Site

T2 - Do Better Data and Methods Yield More Accurate Predictions?

AU - Olsen, Robert B.

AU - Orr, Larry L.

AU - Bell, Stephen H.

AU - Petraglia, Elizabeth

AU - Badillo-Goicoechea, Elena

AU - Miyaoka, Atsushi

AU - Stuart, Elizabeth A.

PY - 2024

Y1 - 2024

AB - Multi-site randomized controlled trials (RCTs) provide unbiased estimates of the average impact in the study sample. However, their ability to accurately predict the impact for individual sites outside the study sample, to inform local policy decisions, is largely unknown. To extend prior research on this question, we analyzed six multi-site RCTs and tested modern prediction methods—lasso regression and Bayesian Additive Regression Trees (BART)—using a wide range of moderator variables. The main study findings are that: (1) all of the methods yielded accurate impact predictions when the variation in impacts across sites was close to zero (as expected); (2) none of the methods yielded accurate impact predictions when the variation in impacts across sites was substantial; and (3) BART typically produced “less inaccurate” predictions than lasso regression or than the Sample Average Treatment Effect. These results raise concerns that when the impact of an intervention varies considerably across sites, statistical modeling using the data commonly collected by multi-site RCTs will be insufficient to explain the variation in impacts across sites and accurately predict impacts for individual sites.

KW - Randomized controlled trials

KW - evidence-based policy

KW - external validity

KW - generalizability

KW - transportability

UR - http://www.scopus.com/inward/record.url?scp=85152928361&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85152928361&partnerID=8YFLogxK

U2 - 10.1080/19345747.2023.2180464

DO - 10.1080/19345747.2023.2180464

M3 - Article

AN - SCOPUS:85152928361

SN - 1934-5747

VL - 17

SP - 184

EP - 210

JO - Journal of Research on Educational Effectiveness

JF - Journal of Research on Educational Effectiveness

IS - 1

ER -
