Permutation Tests
Permutation tests let epidemiologists test for a group difference without assuming a particular distribution, which is useful for small samples or odd-shaped data where a $t$-test’s assumptions are shaky. They build the null distribution directly from the data by shuffling the group labels.
The idea
Under the null hypothesis $H_0$ of no effect, the observations are exchangeable: the group labels carry no information, so every assignment of observations to groups is equally likely. So we can:
- Compute a test statistic on the real data (e.g., the difference in group means, $T_{\text{obs}} = \bar{x}_1 - \bar{x}_2$).
- Repeatedly shuffle the labels, recomputing the statistic each time to trace out its distribution under $H_0$.
- Compute the p-value as the fraction of permutations giving a statistic at least as extreme as the observed one. For a two-sided test:
$$p = \frac{\#\{b : |T_b| \ge |T_{\text{obs}}|\} + 1}{B + 1}$$
where $T_b$ is the statistic from the $b$-th shuffle and $B$ is the number of permutations. Adding 1 to numerator and denominator (counting the observed data itself) keeps the test valid and avoids a p-value of exactly zero.
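The effect of the +1 correction is easiest to see when no shuffled statistic reaches the observed one. The sketch below is illustrative only: the helper name `mc_p_value` and the list of permuted statistics are made up for the demonstration and do not come from real data.

```python
def mc_p_value(perm_stats, obs):
    """Two-sided Monte Carlo p-value with the +1 correction."""
    extreme = sum(abs(t) >= abs(obs) for t in perm_stats)
    return (extreme + 1) / (len(perm_stats) + 1)

# Hypothetical permuted statistics, none as extreme as the observed value 1.5.
perm_stats = [0.2, -0.5, 0.1, -0.3, 0.4, 0.0, -0.1, 0.3, -0.2]
obs = 1.5

p_corrected = mc_p_value(perm_stats, obs)
p_naive = sum(abs(t) >= abs(obs) for t in perm_stats) / len(perm_stats)

print(p_corrected, p_naive)  # 0.1 0.0
```

With 9 shuffles the corrected p-value cannot fall below 1/10, whereas the uncorrected fraction reports an impossible zero.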
Worked example
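Two groups: treated $\{5.1, 6.3, 5.8\}$ and control $\{4.2, 4.9, 5.0\}$. Observed statistic:
$$T_{\text{obs}} = \bar{x}_{\text{treated}} - \bar{x}_{\text{control}} = 5.733 - 4.700 \approx 1.033$$
There are $\binom{6}{3} = 20$ ways to split the six pooled values into two groups of three. We recompute $T$ for each split and count how many give $|T| \ge |T_{\text{obs}}|$. That count over 20 is the two-sided permutation p-value; because the enumeration is exhaustive and already includes the observed split, the count is never zero and no further +1 adjustment is needed. Here only the observed split and its mirror image qualify, so $p = 2/20 = 0.1$. With small $n$ the smallest attainable p-value is bounded below by $1/\binom{6}{3} = 0.05$ for a one-sided test, and by $2/20 = 0.1$ for a two-sided test with equal group sizes.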
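The enumeration described above is small enough to carry out exhaustively. The following is a minimal sketch, assuming the example data are the treated and control values used in this section's code (5.1, 6.3, 5.8 and 4.2, 4.9, 5.0). The variable names and the small floating-point tolerance `tol` are choices made for this illustration.

```python
from itertools import combinations

treated = [5.1, 6.3, 5.8]
control = [4.2, 4.9, 5.0]
pool = treated + control
n = len(treated)

def mean(xs):
    return sum(xs) / len(xs)

obs = mean(treated) - mean(control)
tol = 1e-12  # guards the >= comparison against floating-point rounding

# Visit every way of choosing which n pooled values are labeled "treated".
stats = []
for idx in combinations(range(len(pool)), n):
    chosen = set(idx)
    g1 = [pool[i] for i in idx]
    g2 = [pool[i] for i in range(len(pool)) if i not in chosen]
    stats.append(mean(g1) - mean(g2))

n_extreme = sum(abs(t) >= abs(obs) - tol for t in stats)
p_exact = n_extreme / len(stats)  # observed split is included, so no +1

print(len(stats), n_extreme, p_exact)  # 20 2 0.1
```

Full enumeration is only practical for very small samples, since the number of splits grows combinatorially; beyond that, random shuffling approximates the same distribution.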
In code
R
set.seed(1)
a <- c(5.1, 6.3, 5.8); b <- c(4.2, 4.9, 5.0)
obs <- mean(a) - mean(b)
pool <- c(a, b); n <- length(a)
perm <- replicate(10000, {
  idx <- sample(length(pool), n)  # shuffle labels: draw indices for group a
  mean(pool[idx]) - mean(pool[-idx])
})
(sum(abs(perm) >= abs(obs)) + 1) / (length(perm) + 1) # two-sided p
Python
import numpy as np
rng = np.random.default_rng(1)
a = np.array([5.1, 6.3, 5.8]); b = np.array([4.2, 4.9, 5.0])
obs = a.mean() - b.mean()
pool = np.concatenate([a, b]); n = len(a)
perm = np.empty(10000)
for i in range(perm.size):
    p = rng.permutation(pool)  # shuffle labels
    perm[i] = p[:n].mean() - p[n:].mean()
print((np.sum(np.abs(perm) >= abs(obs)) + 1) / (perm.size + 1))
0.0978902109789021
Julia
using Random, Statistics
Random.seed!(1)
a = [5.1, 6.3, 5.8]; b = [4.2, 4.9, 5.0]
obs = mean(a) - mean(b)
pool = vcat(a, b); n = length(a)
perm = map(1:10_000) do _
    p = shuffle(pool)  # shuffle labels
    mean(p[1:n]) - mean(p[n+1:end])
end
println((count(x -> abs(x) >= abs(obs), perm) + 1) / (length(perm) + 1))
Why it matters for statistics
Permutation tests give exact or nearly exact p-values under minimal assumptions, relying only on exchangeability rather than normality or large samples. They are a robust, transparent alternative to parametric tests and generalize to almost any statistic you can compute.
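To illustrate how the same recipe extends to other statistics, here is a minimal sketch of a generic helper that takes the statistic as an argument. The function name `permutation_test`, its parameters, and the choice of the median as a second statistic are assumptions made for this example, not part of any particular library. It uses only the Python standard library.

```python
import random
from statistics import mean, median

def permutation_test(a, b, statistic, n_perm=10_000, seed=1):
    """Two-sided Monte Carlo permutation p-value for statistic(a) - statistic(b)."""
    rng = random.Random(seed)
    pool = list(a) + list(b)
    n = len(a)
    obs = statistic(a) - statistic(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pool)  # shuffle labels in place
        t = statistic(pool[:n]) - statistic(pool[n:])
        if abs(t) >= abs(obs) - 1e-12:  # tolerance for float rounding
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

treated = [5.1, 6.3, 5.8]
control = [4.2, 4.9, 5.0]

p_mean = permutation_test(treated, control, mean)      # difference in means
p_median = permutation_test(treated, control, median)  # difference in medians
print(p_mean, p_median)
```

Only the `statistic` argument changes between the two calls; the shuffling and the p-value calculation stay the same.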