Causal inference in outcome-dependent two-phase sampling designs

Weiwei Wang, Daniel Scharfstein, Zhiqiang Tan, Ellen J. MacKenzie

Research output: Contribution to journal › Article › peer-review

10 Scopus citations


We consider estimation of the causal effect of a treatment on an outcome from observational data collected in two phases. In the first phase, a simple random sample of individuals is drawn from a population. On these individuals, information is obtained on treatment, outcome and a few low-dimensional covariates. These individuals are then stratified according to these factors. In the second phase, a random subsample of individuals is drawn from each stratum, with known stratum-specific selection probabilities. On these individuals, a rich set of covariates is collected. In this setting, we introduce five estimators: simple inverse weighted; simple doubly robust; enriched inverse weighted; enriched doubly robust; locally efficient. We evaluate the finite sample performance of these estimators in a simulation study. We also use our methodology to estimate the causal effect of trauma care on in-hospital mortality by using data from the National Study of Cost and Outcomes of Trauma.
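The weighting idea behind the two-phase design can be illustrated with a minimal sketch. This is not the paper's estimator; it is a generic normalized inverse-weighted estimate, assuming binary treatment and outcome, strata defined by (treatment, outcome) only, and a propensity score that is already known for each phase-two individual. In practice the propensity score would have to be estimated from the rich covariates. The names `Unit`, `two_phase_ipw_effect` and `selection_prob` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Unit:
    """One phase-two individual (hypothetical record layout)."""
    treatment: int     # 1 = treated, 0 = control; observed in phase one
    outcome: int       # binary outcome; observed in phase one
    propensity: float  # assumed known P(treatment = 1 | rich covariates)


def two_phase_ipw_effect(
    phase_two: List[Unit],
    selection_prob: Dict[Tuple[int, int], float],
) -> float:
    """Normalized inverse-weighted estimate of E[Y(1)] - E[Y(0)].

    Each phase-two unit is weighted by the inverse of
    (known stratum selection probability) x (probability of the
    treatment actually received), so that it stands in for the
    phase-one individuals who were not subsampled.
    """
    weighted_outcome = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for u in phase_two:
        # Stratum is assumed to be (treatment, outcome) in this sketch.
        pi = selection_prob[(u.treatment, u.outcome)]
        p_received = u.propensity if u.treatment == 1 else 1.0 - u.propensity
        w = 1.0 / (pi * p_received)
        weighted_outcome[u.treatment] += w * u.outcome
        weight_total[u.treatment] += w
    treated_mean = weighted_outcome[1] / weight_total[1]
    control_mean = weighted_outcome[0] / weight_total[0]
    return treated_mean - control_mean


if __name__ == "__main__":
    # Toy data: strata with outcome = 1 are fully sampled,
    # strata with outcome = 0 are sampled with probability 0.5.
    probs = {(1, 1): 1.0, (1, 0): 0.5, (0, 1): 1.0, (0, 0): 0.5}
    sample = [
        Unit(1, 1, 0.5),
        Unit(1, 0, 0.5),
        Unit(0, 1, 0.5),
        Unit(0, 0, 0.5),
        Unit(0, 0, 0.5),
    ]
    print(two_phase_ipw_effect(sample, probs))
```

Dividing by the summed weights within each arm, rather than by the phase-one sample size, is a design choice made here to keep the sketch stable in small samples. A doubly robust variant would additionally use an outcome regression model, which this sketch omits.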

Original language: English (US)
Pages (from-to): 947-969
Number of pages: 23
Journal: Journal of the Royal Statistical Society. Series B: Statistical Methodology
Issue number: 5
State: Published - Nov 2009


  • Doubly robust estimator
  • Outcome-dependent sampling
  • Two-phase sampling

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

