Case-control studies of unrelated subjects are now widely used to study the role of genetic susceptibility and gene-environment interactions in the etiology of complex diseases. Exploiting an assumption of gene-environment independence, and treating the distribution of environmental exposures as completely nonparametric, Chatterjee and Carroll (Biometrika 92:399-418) recently developed an efficient retrospective maximum-likelihood method for the analysis of case-control studies. In this article, we extend the retrospective maximum-likelihood approach to studies where genetic information may be missing for some study subjects. Special emphasis is given to haplotype-based studies, where missing data arise from linkage-phase ambiguity in genotype data. We use a profile likelihood technique and an appropriate expectation-maximization (EM) algorithm to derive a relatively simple procedure for parameter estimation, with or without a rare disease assumption, and possibly incorporating information on the marginal probability of the disease in the underlying population. We also describe two alternative robust approaches that are less sensitive to the underlying gene-environment independence and Hardy-Weinberg equilibrium assumptions. The performance of the proposed methods is evaluated through simulations in the context of haplotype-based studies of gene-environment interactions. The proposed method is illustrated using a case-control study designed to investigate the interaction between BRCA1/2 mutations and reproductive risk factors in the etiology of ovarian cancer.
- Case-control studies
- EM algorithm
- Gene-environment interactions
- Semiparametric methods