p-Values
The p-value is the most reported, and most misread, number in statistics. It quantifies how surprising the data are under the null hypothesis, and understanding exactly what it does and does not say is essential for honest inference.
Definition
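The p-value is the probability, computed assuming the null hypothesis $H_0$ is true, of obtaining a test statistic $T$ at least as extreme as the one actually observed, $t_{\text{obs}}$:

$$p = P\big(T \text{ at least as extreme as } t_{\text{obs}} \mid H_0\big)$$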
“At least as extreme” is determined by the alternative hypothesis $H_1$: one tail for a directional alternative, both tails for a two-sided one. Small p-values mean the data would be unusual if $H_0$ held, casting doubt on $H_0$.
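To make the role of the alternative concrete, here is a minimal sketch for a statistic with a standard normal reference distribution. The function name `z_p_value`, its `alternative` argument, and the observed value 1.5 are illustrative assumptions, not part of the text above.

```python
from statistics import NormalDist


def z_p_value(z, alternative="two-sided"):
    """p-value for a statistic z that is standard normal under the null.

    `alternative` is a hypothetical argument naming the direction
    in which the alternative hypothesis points.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # directional: upper tail only
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":       # directional: lower tail only
        return std_normal.cdf(z)
    if alternative == "two-sided":  # both tails beyond |z|
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z_obs = 1.5  # made-up observed statistic
for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_p_value(z_obs, alt), 4))
```

The same statistic gives three different p-values, which is why the alternative should be fixed before looking at the data.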
What a p-value is NOT
- It is not $P(H_0 \mid \text{data})$. It conditions on $H_0$; it does not measure the probability of $H_0$.
- It is not the probability the result occurred “by chance.”
- It is not the size or importance of an effect—a tiny, irrelevant effect can yield a small p-value with a large sample.
- $1 - p$ is not the probability $H_1$ is true.
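The point about effect size can be checked with a little arithmetic. The sketch below assumes a one-sample z-test with known standard deviation and a made-up observed mean difference of 0.01 standard deviations; the helper `two_sided_z_p` is hypothetical. It uses `math.erfc` because the two-sided normal tail area equals `erfc(|z| / sqrt(2))`, which stays accurate far out in the tails.

```python
from math import erfc, sqrt


def two_sided_z_p(mean_diff, sd, n):
    """Two-sided p-value for a one-sample z-test of 'mean = 0' with known sd."""
    z = mean_diff / (sd / sqrt(n))   # standardized statistic
    return erfc(abs(z) / sqrt(2))    # equals 2 * (1 - Phi(|z|))


effect, sd = 0.01, 1.0  # tiny, practically irrelevant difference (made up)
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_z_p(effect, sd, n))
```

The observed difference never changes, yet the p-value shrinks from roughly 0.92 at n = 100 to a vanishingly small number at n = 1,000,000. The p-value tracks how incompatible the data are with the null, not how large the effect is.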
Relation to $\alpha$
The significance level $\alpha$ is a fixed threshold chosen before seeing data; we reject $H_0$ when $p \le \alpha$. This keeps the Type I error rate at $\alpha$. The p-value itself is a continuous summary of evidence, not a yes/no verdict.
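For a two-sided z-test, comparing the p-value with the threshold is the same decision as comparing the statistic with a critical value. A short sketch, assuming a standard normal reference and a 0.05 level (the statistics 1.0, 2.0 and 3.0 are arbitrary illustrations):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
# Two-sided critical value: cuts off alpha / 2 in each tail (about 1.96).
z_crit = std_normal.inv_cdf(1 - alpha / 2)


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


for z in (1.0, 2.0, 3.0):  # arbitrary illustrative statistics
    p = two_sided_p(z)
    # The last two columns always agree: same decision, two views of it.
    print(z, round(p, 4), p <= alpha, abs(z) >= z_crit)
```

The reject/retain verdict is identical either way, but the p-value also shows how far past (or short of) the threshold the data fall.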
Worked example
Suppose a standardized test statistic is $z = 2.1$ under $H_0$, with a standard normal reference and a two-sided alternative. The p-value is the mass in both tails beyond $\pm 2.1$:

$$p = 2\,P(Z \ge 2.1) = 2\,\big(1 - \Phi(2.1)\big) \approx 0.0357$$
Since $p \approx 0.036 < 0.05$, we would reject $H_0$ at $\alpha = 0.05$. For a $t$ statistic we use the $t$-distribution instead of the normal.
In code
Compute p-values from a statistic using tail probabilities.
R
z <- 2.1
2 * pnorm(z, lower.tail = FALSE) # two-sided z p-value
t_stat <- 2.11; df <- 9
2 * pt(abs(t_stat), df, lower.tail = FALSE) # two-sided t p-value
Python
from scipy import stats
z = 2.1
print(2 * stats.norm.sf(z)) # two-sided z
print(2 * stats.t.sf(abs(2.11), df=9)) # two-sided t
0.035728841125633085
0.06406977491571955
Julia
using Distributions
z = 2.1
println(2 * ccdf(Normal(), z)) # two-sided z
println(2 * ccdf(TDist(9), abs(2.11))) # two-sided t
Simulation: under $H_0$ the p-value is Uniform(0,1)
A key fact: if $H_0$ is true (and the test is exact), the p-value is uniformly distributed on $[0, 1]$. That is why rejecting when $p \le \alpha$ gives a Type I error rate of exactly $\alpha$.
set.seed(7)
pvals <- replicate(10000, {
  x <- rnorm(30, mean = 0, sd = 1) # H0: mean = 0 is TRUE
  t.test(x, mu = 0)$p.value
})
mean(pvals <= 0.05) # ~0.05
hist(pvals) # approximately flat
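A Python counterpart to the R simulation, as a rough sketch using only the standard library. Because the standard library has no t-distribution, it swaps in a z-test with the standard deviation treated as known (equal to 1), which is also an exact test under these assumptions. The seed, sample size and replication count are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(7)                      # arbitrary seed, for reproducibility only
std_normal = NormalDist()
n, reps, alpha = 30, 20_000, 0.05   # illustrative settings


def simulate_p_value():
    """One two-sided z-test p-value from data where the null (mean = 0) is true."""
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(x) * sqrt(n)          # (xbar - 0) / (1 / sqrt(n)), sd known to be 1
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


pvals = [simulate_p_value() for _ in range(reps)]
rejection_rate = sum(p <= alpha for p in pvals) / reps
print(rejection_rate)               # should land near alpha

# Crude flatness check: share of p-values falling in each tenth of [0, 1].
bins = [0] * 10
for p in pvals:
    bins[min(int(p * 10), 9)] += 1
print([round(b / reps, 3) for b in bins])  # each share should be near 0.1
```

Each tenth of the unit interval should hold about 10% of the p-values, and the rejection rate should sit close to the chosen level, up to simulation noise.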
Why it matters for statistics
The p-value is a calibrated measure of evidence against a null model, and its uniform-under-$H_0$ behavior is what makes significance testing control error rates. Interpreting it correctly, as conditional on $H_0$ and not as a probability of $H_0$, prevents the overclaiming that plagues applied research.