The Central Limit Theorem

The central limit theorem explains why the bell curve is everywhere: add up or average many independent effects and the result is approximately normal, whatever the individual pieces look like, provided their variance is finite. It is the reason normal-based inference works so broadly.

Figure: Histograms of the sample mean from a skewed exponential parent for $n = 1, 5, 30$, approaching the normal curve as $n$ grows.

Statement

Let $X_1, \dots, X_n$ be iid with mean $\mu$ and finite variance $\sigma^2$. As $n \to \infty$, the standardized sample mean converges in distribution to a standard normal:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \to \mathcal{N}(0, 1).$$

Equivalently, $\bar{X}$ is approximately $\mathcal{N}\!\big(\mu,\, \sigma^2/n\big)$ for large $n$. The remarkable part: this holds regardless of the shape of the parent distribution (skewed, discrete, bimodal) as long as the variance is finite.
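One way to see the statement numerically is to standardize simulated sample means and compare their empirical CDF with the standard normal CDF $\Phi$ at a few points. The sketch below is a minimal check under assumed settings (exponential parent with rate 1, $n = 50$, 20,000 replications, seed 11); the helper names `phi` and `standardized_mean` are illustrative and not part of the text above.

```python
import math
import random

def phi(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def standardized_mean(rng, n, mu=1.0, sigma=1.0):
    """Draw n Exp(1) values and return (xbar - mu) / (sigma / sqrt(n))."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

rng = random.Random(11)   # assumed seed, for reproducibility only
n, reps = 50, 20000       # assumed sample size and replication count
zs = [standardized_mean(rng, n) for _ in range(reps)]

# Empirical CDF of the standardized means at a few reference points
ecdf = {z: sum(v <= z for v in zs) / reps for z in (-1.0, 0.0, 1.0)}
for z, p in ecdf.items():
    print(f"z={z:+.1f}  empirical={p:.3f}  Phi(z)={phi(z):.3f}")
```

With these settings the empirical values should land within a few percentage points of $\Phi(z)$; rerunning with a smaller `n` shows the agreement loosening.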

LLN vs. CLT

The law of large numbers says $\bar{X} \to \mu$ (the mean stops moving). The CLT is the finer statement: the leftover fluctuations, magnified by $\sqrt{n}$, are Gaussian. LLN gives the location; CLT gives the shape.
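The two scalings can be compared side by side in a short simulation. This is a sketch under assumed settings (Exp(1) parent, so $\mu = \sigma = 1$; 2,000 replications per sample size; seed 11), and the `results` dictionary is just an illustrative container.

```python
import math
import random
import statistics

rng = random.Random(11)   # assumed seed
mu = 1.0                  # mean of the Exp(1) parent
reps = 2000               # assumed number of replications per n

results = {}
for n in (10, 100, 1000):
    errors = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        errors.append(xbar - mu)
    sd_raw = statistics.stdev(errors)     # spread of xbar - mu: shrinks with n (LLN)
    sd_scaled = math.sqrt(n) * sd_raw     # spread of sqrt(n) * (xbar - mu): stays near sigma
    results[n] = (sd_raw, sd_scaled)
    print(f"n={n:>4}  SD(xbar - mu)={sd_raw:.4f}  SD(sqrt(n)*(xbar - mu))={sd_scaled:.3f}")
```

The first column should shrink roughly like $1/\sqrt{n}$, while the second should hover near $\sigma = 1$: the unscaled error vanishes, and the rescaled error keeps a stable spread.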

Worked example

Suppose service times are exponential with rate $\lambda = 1$, so $\mu = 1$ and $\sigma = 1$: a strongly right-skewed parent. Average $n = 50$ of them. The CLT says

$$\bar{X} \approx \mathcal{N}\!\left(1,\ \frac{1}{50}\right), \qquad \operatorname{SD}(\bar{X}) = \frac{1}{\sqrt{50}} \approx 0.141.$$

So $P(\bar{X} > 1.2) \approx P\!\left(Z > \frac{1.2 - 1}{0.141}\right) = P(Z > 1.41) \approx 0.079$, even though a single exponential draw exceeding $1.2$ has probability $e^{-1.2} \approx 0.30$. Averaging tames the skew.
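The arithmetic above can be checked in a few lines, and for this particular parent the approximation can also be compared with an exact value. The sketch below relies on a fact not stated in the text: the sum of $n$ iid Exp(1) draws follows an Erlang (gamma) distribution with integer shape $n$, whose upper tail is $P(S > x) = \sum_{k=0}^{n-1} e^{-x} x^k / k!$. The variable names are illustrative.

```python
import math

n, threshold = 50, 1.2    # values from the worked example
mu, sigma = 1.0, 1.0      # Exp(1) parent

# CLT approximation: P(Z > z) with z = (threshold - mu) / (sigma / sqrt(n))
z = (threshold - mu) / (sigma / math.sqrt(n))
p_clt = 0.5 * math.erfc(z / math.sqrt(2))

# Exact tail via the Erlang identity:
# P(sum > x) = sum_{k=0}^{n-1} exp(-x) * x**k / k!, with x = n * threshold
x = n * threshold
term, p_exact = math.exp(-x), 0.0
for k in range(n):
    p_exact += term
    term *= x / (k + 1)   # turn the k-th term into the (k+1)-th

print(f"z = {z:.3f}")
print(f"CLT approximation: {p_clt:.4f}")
print(f"Exact (Erlang):    {p_exact:.4f}")
```

The exact tail comes out somewhat larger than the CLT value, because the right skew of the parent has not fully washed out at $n = 50$.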

Simulation

Take means of samples from a skewed parent (exponential) and watch the histogram of means become bell-shaped as $n$ grows.

R

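set.seed(11)
par(mfrow = c(1, 3))  # one panel per sample size
for (n in c(1, 5, 30)) {
  # 10,000 sample means, each from n Exp(1) draws
  means <- replicate(10000, mean(rexp(n, rate = 1)))
  hist(means, breaks = 40, main = paste("n =", n), xlab = "sample mean")
  # empirical SD of the means vs. the theoretical sigma / sqrt(n)
  cat("n =", n, " SD =", round(sd(means), 3),
      " theory =", round(1 / sqrt(n), 3), "\n")
}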

Python

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(11)

for i, n in enumerate((1, 5, 30)):
    # 10,000 sample means, each from n Exp(1) draws
    means = np.array([np.random.exponential(1.0, n).mean()
                      for _ in range(10000)])
    plt.subplot(1, 3, i + 1)
    plt.hist(means, bins=40)
    plt.title(f"n={n}")
    # empirical SD of the means vs. the theoretical sigma / sqrt(n)
    print(f"n={n:>2} SD={means.std(ddof=1):.3f} theory={1/np.sqrt(n):.3f}")
plt.tight_layout()
plt.show()

Output:

n= 1 SD=0.986 theory=1.000
n= 5 SD=0.453 theory=0.447
n=30 SD=0.181 theory=0.183

Julia

using Random, Statistics
Random.seed!(11)

for n in (1, 5, 30)
    # 10,000 sample means, each from n Exp(1) draws
    means = [mean(randexp(n)) for _ in 1:10000]
    println("n=$n SD=", round(std(means), digits=3),
            " theory=", round(1 / sqrt(n), digits=3))
end

Why it matters for statistics

The CLT is why $z$- and $t$-based confidence intervals and tests work approximately for means from almost any population with finite variance, not just normal ones. It underwrites the normal approximation for proportions and sums. It does not by itself say how large a sample is “large enough”: that depends on how skewed or heavy-tailed the population is, with more extreme shapes needing larger $n$. Nearly every classical procedure leans on it.
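Whether a given $n$ is “large enough” can be probed by simulation: build a nominal 95% interval many times and count how often it contains the true mean. The sketch below does this for a normal-based interval $\bar{X} \pm 1.96\, s/\sqrt{n}$ on an Exp(1) population. The sample sizes, replication count, seed, and the helper name `z_interval_covers` are all assumptions chosen for illustration.

```python
import math
import random

def z_interval_covers(rng, n, mu=1.0, z_crit=1.96):
    """Draw n Exp(1) values; report whether xbar +/- z_crit * s / sqrt(n) contains mu."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # sample SD
    half_width = z_crit * s / math.sqrt(n)
    return xbar - half_width <= mu <= xbar + half_width

rng = random.Random(11)   # assumed seed
reps = 5000               # assumed replication count
coverage = {}
for n in (5, 30, 200):
    hits = sum(z_interval_covers(rng, n) for _ in range(reps))
    coverage[n] = hits / reps
    print(f"n={n:>3}  empirical coverage of nominal 95% interval: {coverage[n]:.3f}")
```

For a parent this skewed, coverage at small $n$ should fall noticeably short of 95% and move toward the nominal level as $n$ grows, which is one practical reading of “large enough”.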