p values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate

Steven N. Goodman

Research output: Contribution to journal › Article › peer-review

195 Scopus citations

Abstract

It is not generally appreciated that the p value, as conceived by R. A. Fisher, is not compatible with the Neyman-Pearson hypothesis test in which it has become embedded. The p value was meant to be a flexible inferential measure, whereas the hypothesis test was a rule for behavior, not inference. The combination of the two methods has led to a reinterpretation of the p value simultaneously as an "observed error rate" and as a measure of evidence. Both of these interpretations are problematic, and their combination has obscured the important differences between Neyman and Fisher on the nature of the scientific method and inhibited our understanding of the philosophic implications of the basic methods in use today. An analysis using another method promoted by Fisher, mathematical likelihood, shows that the p value substantially overstates the evidence against the null hypothesis. Likelihood makes clearer the distinction between error rates and inferential evidence and is a quantitative tool for expressing evidential strength that is more appropriate for the purposes of epidemiology than the p value.
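The claim that the p value overstates the evidence against the null hypothesis can be illustrated with a small numerical sketch. The abstract does not say which likelihood analysis the paper uses, so the sketch below assumes a standard textbook setup: a normally distributed test statistic z with known variance. It compares the two-sided p value with the likelihood of the null hypothesis relative to the best-supported alternative, exp(-z^2/2). The function names are hypothetical and chosen for illustration only.

```python
import math
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def min_likelihood_ratio(z: float) -> float:
    """Likelihood of the null relative to the best-supported alternative.

    Assumption: normal mean with known variance, so the ratio
    L(null) / L(maximum-likelihood alternative) equals exp(-z^2 / 2).
    This is the smallest ratio any single alternative can achieve.
    """
    return math.exp(-z * z / 2.0)


if __name__ == "__main__":
    # z values corresponding roughly to two-sided p = 0.10, 0.05, 0.01, 0.001
    for z in (1.645, 1.960, 2.576, 3.291):
        print(f"z = {z:.3f}   p = {two_sided_p(z):.4f}   "
              f"min likelihood ratio = {min_likelihood_ratio(z):.4f}")
```

Under these assumptions, a result with p = 0.05 leaves the null hypothesis about 0.15 times as well supported as the most favorable alternative, which is considerably weaker evidence than "0.05" suggests when read as a measure of evidence.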

Original language: English (US)
Pages (from-to): 485-496
Number of pages: 12
Journal: American Journal of Epidemiology
Volume: 137
Issue number: 5
State: Published - Mar 1 1993

Keywords

  • Hypothesis tests
  • Inference
  • Likelihood
  • P values
  • Significance tests

ASJC Scopus subject areas

  • Geriatrics and Gerontology
  • Epidemiology
