A pseudoscore estimator for regression problems with two-phase sampling

Nilanjan Chatterjee, Yi Hau Chen, Norman E. Breslow

Research output: Contribution to journal, Article, peer-review

85 Scopus citations

Abstract

Two-phase stratified sampling designs yield efficient estimates of population parameters in regression models while minimizing the costs of data collection. In measurement error problems, for example, error-free covariates are ascertained only for units selected in a validation sample. Estimators proposed heretofore for such designs require all units to have positive probability of being selected. We describe a new semiparametric estimator that relaxes this assumption and that is applicable to, for example, case-only or control-only validation sampling for binary regression problems. It uses a weighted empirical covariate distribution, with weights determined by the regression model, to estimate the score equations. Implementation is relatively easy for both discrete and continuous outcome data. For designs that are amenable to alternative methods, simulation studies show that the new estimator outperforms the currently available weighted and pseudolikelihood methods and often achieves efficiency comparable to that of semiparametric maximum likelihood. The simulations also demonstrate the vulnerability of the case-only or control-only designs to model misspecification. These results are illustrated by the analysis of data from a population-based case-control study of leprosy.
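The abstract compares the new estimator with existing weighted methods, which require every unit to have a positive selection probability. As a point of reference only, the following minimal sketch simulates a two-phase design for logistic regression and fits the inverse-probability-weighted (Horvitz-Thompson) estimator. It is not the pseudoscore estimator described in the paper. The function names, the sampling probabilities, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Solve sum_i w_i * x_i * (y_i - expit(x_i' beta)) = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))                     # weighted score vector
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate_two_phase(n=20000, beta=(-2.0, 1.0), pi_case=0.8,
                       pi_control=0.2, seed=0):
    """Hypothetical design: y is known for everyone (phase one); the
    covariate x is treated as observed only for units with r == True
    (phase two), sampled with probability depending on y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    p = 1.0 / (1.0 + np.exp(-(X @ np.asarray(beta))))
    y = rng.binomial(1, p)
    pi = np.where(y == 1, pi_case, pi_control)  # phase-two selection probabilities
    r = rng.random(n) < pi                      # phase-two selection indicator
    return X, y, pi, r


X, y, pi, r = simulate_two_phase()

# Weighted estimator: only phase-two units contribute, each weighted by 1 / pi.
beta_hat = weighted_logistic_fit(X[r], y[r], 1.0 / pi[r])
print(beta_hat)
```

The weights 1 / pi are defined only when pi is positive for every unit. Under a case-only validation design (pi_control = 0 in this sketch), no controls enter phase two and this weighted fit is no longer usable, which is the kind of design the abstract says the new estimator is meant to accommodate.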

Original language: English (US)
Pages (from-to): 158-168
Number of pages: 11
Journal: Journal of the American Statistical Association
Volume: 98
Issue number: 461
DOIs:
State: Published - Mar 1 2003
Externally published: Yes

Keywords

  • Measurement error
  • Missing data
  • Pseudolikelihood
  • Response selective sampling
  • Restricted sampling
  • Semiparametric inference

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
