The Law of Large Numbers
The law of large numbers is the guarantee that averaging works: pile up enough independent observations and the sample mean settles on the true mean. It is the formal reason a long-run average is a trustworthy estimate. As a study enrolls more people, an estimated attack rate or seroprevalence settles onto its true value, and Monte Carlo estimates of an outbreak’s extinction probability stabilize as the number of simulated epidemics grows.
Statement
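Let $X_1, X_2, \dots, X_n$ be independent and identically distributed with mean $\mu$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. The weak law of large numbers says $\bar{X}_n$ converges to $\mu$ in probability: for any tolerance $\varepsilon > 0$, $\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0$. Informally, $\bar{X}_n \to \mu$. The stronger version (the strong law) gives convergence with probability one.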
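The statement can be checked empirically. The sketch below is illustrative only: the function name `exceed_fraction`, the choice of Uniform(0, 1) observations (true mean 0.5), and the defaults `eps=0.05` and `reps=2000` are assumptions made for this example, not part of the theorem. It estimates the probability that the sample mean misses the true mean by more than the tolerance, for several sample sizes.

```python
import random

def exceed_fraction(n, eps=0.05, reps=2000, seed=0):
    """Estimate P(|sample mean - 0.5| > eps) for n Uniform(0, 1) draws.

    Repeats the experiment `reps` times and returns the fraction of
    repetitions whose sample mean misses the true mean 0.5 by more than eps.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    misses = 0
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            misses += 1
    return misses / reps

# The estimated miss probability should shrink toward 0 as n grows.
for n in (10, 100, 1000):
    print(n, exceed_fraction(n))
```

The printed fractions are themselves Monte Carlo estimates, so they wobble a little from seed to seed; the downward trend in `n` is the point.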
Connection to limits
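This is a probabilistic limit. Recall $\operatorname{SE}(\bar{X}_n) = \sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of a single observation: as $n \to \infty$ the standard error vanishes, and the sampling distribution of $\bar{X}_n$ collapses onto the single point $\mu$. Note the LLN describes where $\bar{X}_n$ lands; the central limit theorem describes the shape of its fluctuations along the way.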
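One way to make the collapse precise is Chebyshev's inequality. This is a sketch that assumes each observation has a finite variance $\sigma^2$ (an extra assumption; the weak law itself holds with only a finite mean). For the sample mean $\bar{X}_n$ of $n$ i.i.d. observations with mean $\mu$, and any tolerance $\varepsilon > 0$:

```latex
\[
P\left(|\bar{X}_n - \mu| \ge \varepsilon\right)
  \;\le\; \frac{\operatorname{Var}(\bar{X}_n)}{\varepsilon^2}
  \;=\; \frac{\sigma^2}{n\,\varepsilon^2}
  \;\longrightarrow\; 0
  \quad \text{as } n \to \infty .
\]
```

The bound is crude, but it shows that the vanishing standard error is exactly what drives the probability of a large miss to zero.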
Worked example
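A fair coin scores $1$ for heads and $0$ for tails, so $\mu = 0.5$. After $n$ flips the sample proportion of heads is $\bar{X}_n$. With small $n$ a lopsided run of heads is unremarkable, but for large enough $n$ the proportion is overwhelmingly likely to lie within a hundredth of $0.5$, because the standard error $0.5/\sqrt{n}$ has dropped well below $0.01$.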
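To put numbers on the coin example, the sketch below computes the exact binomial probability that the proportion of heads falls within 0.01 of 0.5, alongside the standard error $0.5/\sqrt{n}$. The function name `prob_within` and the list of sample sizes are choices made for this illustration.

```python
from fractions import Fraction
from math import ceil, comb, floor, sqrt

def prob_within(n, tol=Fraction(1, 100)):
    """Exact P(|heads/n - 1/2| <= tol) for n flips of a fair coin."""
    half = Fraction(1, 2)
    # Exact rational bounds avoid floating-point trouble at the boundary.
    lo = max(0, ceil((half - tol) * n))
    hi = min(n, floor((half + tol) * n))
    term = comb(n, lo)  # C(n, lo); later terms follow by recurrence
    total = 0
    for k in range(lo, hi + 1):
        total += term
        term = term * (n - k) // (k + 1)  # C(n, k+1) from C(n, k), exact
    return total / 2**n

for n in (10, 100, 1000, 10_000, 100_000):
    print(n, 0.5 / sqrt(n), prob_within(n))
```

The probability climbs toward 1 only once the standard error is a small fraction of the 0.01 tolerance, which is why "large enough" here means tens of thousands of flips rather than hundreds.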
Simulation
Track the running mean and watch it converge to the true value.
R
set.seed(3)
flips <- rbinom(10000, size = 1, prob = 0.5)
running <- cumsum(flips) / seq_along(flips)
plot(running, type = "l", ylim = c(0, 1),
xlab = "n", ylab = "running mean")
abline(h = 0.5, col = "red")
tail(running, 1) # close to 0.5
Python
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(3)
flips = np.random.binomial(1, 0.5, size=10000)
running = np.cumsum(flips) / np.arange(1, len(flips) + 1)
plt.plot(running)
plt.axhline(0.5, color="red")
plt.xlabel("n")
plt.ylabel("running mean")
plt.show()  # display the figure when run as a script
print(running[-1]) # close to 0.5
0.4937
Julia
using Random, Statistics
Random.seed!(3)
flips = rand(0:1, 10000)
running = cumsum(flips) ./ (1:length(flips))
println(running[end]) # close to 0.5
Why it matters for statistics
The LLN justifies estimation itself: sample means, proportions, and Monte Carlo integrals are reliable because they converge to the quantities they estimate. It is also the foundation of simulation — every “run it many times and average” argument on this site rests on it, including the empirical means used to approximate an expected value.
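As a small illustration of the "run it many times and average" argument, the sketch below approximates an expected value by an empirical mean. The helper name `mc_mean` and the target $E[U^2]$ for $U \sim \text{Uniform}(0, 1)$, whose exact value is $1/3$, are assumptions chosen for this example.

```python
import random

def mc_mean(g, n, seed=0):
    """Monte Carlo estimate of E[g(U)], U ~ Uniform(0, 1), from n draws."""
    rng = random.Random(seed)  # local generator so results are reproducible
    return sum(g(rng.random()) for _ in range(n)) / n

# Estimates of E[U^2] = 1/3 should tighten as the number of draws grows.
for n in (100, 10_000, 1_000_000):
    print(n, mc_mean(lambda u: u * u, n))
```

The same pattern applies to any quantity that can be written as an expectation: simulate, evaluate, average, and let the law of large numbers do the rest.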