p-Values

The p-value is the most reported—and most misread—number in epidemiology. It quantifies how surprising the data are under the null hypothesis, and understanding exactly what it does and does not say is essential for honest inference.

Definition

The p-value is the probability, computed assuming the null hypothesis $H_0$ is true, of obtaining a test statistic at least as extreme as the one actually observed:

$$p = \Pr(\,T \text{ at least as extreme as } t_{\text{obs}} \mid H_0\,).$$

“At least as extreme” is determined by $H_a$: one tail for a directional alternative, both tails for a two-sided one. Small p-values mean the data would be unusual if $H_0$ held, casting doubt on $H_0$.
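To make the role of the alternative concrete, here is a minimal sketch for the special case of a z statistic with a standard normal reference distribution. The function name `z_p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are illustrative choices, not part of the text above.

```python
from statistics import NormalDist


def z_p_value(z_obs: float, alternative: str = "two-sided") -> float:
    """p-value for a z statistic, assuming a standard normal reference under H0.

    `alternative` is a hypothetical label for the direction of H_a.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        # Upper tail: Pr(Z >= z_obs) = Pr(Z <= -z_obs) by symmetry
        return std_normal.cdf(-z_obs)
    if alternative == "less":
        # Lower tail: Pr(Z <= z_obs)
        return std_normal.cdf(z_obs)
    if alternative == "two-sided":
        # Both tails: twice the mass beyond |z_obs|
        return 2.0 * std_normal.cdf(-abs(z_obs))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_p_value(2.1, alt), 4))
```

The same observed statistic yields three different p-values, because each alternative defines a different notion of "extreme".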

What a p-value is NOT

Relation to α\alpha

The significance level $\alpha$ is a fixed threshold chosen before seeing data; we reject $H_0$ when $p \le \alpha$. This caps the Type I error rate at $\alpha$ (exactly $\alpha$ when the test is exact). The p-value itself is a continuous summary of evidence, not a yes/no verdict.
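As a small illustration of the decision rule, the sketch below applies the comparison $p \le \alpha$ to one p-value at several thresholds. The helper `reject_null` and the value `0.0357` are illustrative assumptions.

```python
def reject_null(p_value: float, alpha: float) -> bool:
    """Decision rule: reject H0 exactly when p <= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return p_value <= alpha


p = 0.0357  # illustrative p-value
for alpha in (0.10, 0.05, 0.01):
    # alpha is fixed in advance; the same p can lead to different verdicts
    print(f"alpha={alpha:.2f}: reject H0? {reject_null(p, alpha)}")
```

The verdict flips between $\alpha = 0.05$ and $\alpha = 0.01$ even though the evidence is unchanged, which is why reporting the p-value itself is more informative than reporting only the decision.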

Worked example

Suppose the observed standardized test statistic is $z_{\text{obs}} = 2.1$, with a standard normal reference distribution under $H_0$ and a two-sided alternative. The p-value is the mass in both tails beyond $\pm 2.1$:

$$p = 2\,\Pr(Z \ge 2.1) \approx 2 \times 0.0179 \approx 0.0357.$$

Since $0.0357 < 0.05$, we would reject $H_0$ at $\alpha = 0.05$. For a $t$ statistic we use the $t$-distribution with the appropriate degrees of freedom instead of the normal.

In code

Compute p-values from a statistic using tail probabilities.

R

z <- 2.1
2 * pnorm(z, lower.tail = FALSE)     # two-sided z p-value
t_stat <- 2.11; df <- 9
2 * pt(abs(t_stat), df, lower.tail = FALSE)   # two-sided t p-value

Python

from scipy import stats
z = 2.1
print(2 * stats.norm.sf(z))               # two-sided z
print(2 * stats.t.sf(abs(2.11), df=9))    # two-sided t
# Output:
# 0.035728841125633085
# 0.06406977491571955

Julia

using Distributions
z = 2.1
println(2 * ccdf(Normal(), z))                 # two-sided z
println(2 * ccdf(TDist(9), abs(2.11)))         # two-sided t

Simulation: under $H_0$ the p-value is Uniform(0,1)

A key fact: if $H_0$ is true (and the test is exact), the p-value is uniformly distributed on $[0,1]$. That is why rejecting when $p \le \alpha$ gives a Type I error rate of exactly $\alpha$.
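The uniformity claim can be checked with a short derivation. It is a sketch for the one-sided upper-tail case and assumes the test statistic $T$ has a continuous CDF under $H_0$, written here as $F_0$ (a symbol introduced only for this argument); $P$ denotes the p-value viewed as a random variable.

```latex
\begin{aligned}
P &= \Pr(T \ge t_{\text{obs}} \mid H_0) = 1 - F_0(T), \\
\Pr(P \le u \mid H_0)
  &= \Pr\bigl(F_0(T) \ge 1 - u \mid H_0\bigr) \\
  &= 1 - (1 - u) = u, \qquad 0 \le u \le 1,
\end{aligned}
```

The second-to-last step uses the probability integral transform: $F_0(T)$ is Uniform(0,1) when $F_0$ is continuous. Setting $u = \alpha$ gives $\Pr(P \le \alpha \mid H_0) = \alpha$.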

set.seed(7)
pvals <- replicate(10000, {
  x <- rnorm(30, mean = 0, sd = 1)     # H0: mean = 0 is TRUE
  t.test(x, mu = 0)$p.value
})
mean(pvals <= 0.05)   # ~0.05
hist(pvals)           # approximately flat
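A rough Python counterpart of this simulation, using only the standard library, might look as follows. It is not a translation of the R code: because the standard library has no $t$-distribution, this sketch swaps in a z-test that treats the standard deviation as known and equal to 1. The function name `simulate_null_p_values` and the 10-bin flatness check are illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def simulate_null_p_values(n_sims: int = 10_000, n: int = 30, seed: int = 7) -> list[float]:
    """Two-sided z-test p-values for samples drawn with H0 (mean = 0) true.

    Assumes the standard deviation is known to be 1, so the z-test is exact.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0: mean = 0 is TRUE
        z = fmean(sample) * n ** 0.5                       # (mean - 0) / (1 / sqrt(n))
        p_values.append(2.0 * std_normal.cdf(-abs(z)))     # both tails beyond |z|
    return p_values


p_values = simulate_null_p_values()
rejection_rate = sum(p <= 0.05 for p in p_values) / len(p_values)
print("share of p <= 0.05:", rejection_rate)  # should be close to 0.05

# Crude stand-in for a histogram: share of p-values in 10 equal-width bins
bin_counts = [0] * 10
for p in p_values:
    bin_counts[min(int(p * 10), 9)] += 1
bin_shares = [count / len(p_values) for count in bin_counts]
print("bin shares:", [round(s, 3) for s in bin_shares])  # each should be near 0.1
```

If the p-values are uniform under $H_0$, the rejection rate should sit near 0.05 and every bin share near 0.1, up to simulation noise.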

Why it matters for statistics

The p-value is a calibrated measure of evidence against a null model, and its uniformity under $H_0$ is what makes significance testing control error rates. Interpreting it correctly, as a probability conditional on $H_0$ and not a probability of $H_0$, prevents the overclaiming that plagues applied research.