The Normal Distribution
The normal (Gaussian) distribution is the bell-shaped curve that appears when many small, independent influences add together: measurement error, biological variation in heights or blood pressure, and, crucially for statistics, the sampling distribution of an average. Its ubiquity is not a coincidence but a consequence of the central limit theorem.
Definition
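A random variable $X \sim N(\mu, \sigma^2)$ has probability density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
- Support: $x \in (-\infty, \infty)$.
- Parameters: mean $\mu$ (location) and variance $\sigma^2$ (spread); $\sigma$ is the standard deviation.
- Mean: $E[X] = \mu$.
- Variance: $\operatorname{Var}(X) = \sigma^2$.
The curve is symmetric about $\mu$, with inflection points at $\mu \pm \sigma$.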
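As a quick check of the definition, the density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ can be coded directly and compared with a library implementation. The sketch below is illustrative only: the helper name `normal_pdf`, the values mu = 100 and sigma = 15, and the grid settings are assumptions chosen for the example, and the reference is Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 100.0, 15.0          # illustrative parameters, not from any dataset
ref = NormalDist(mu, sigma)      # standard-library reference implementation

# The hand-written formula agrees with the library density.
for x in (70.0, 100.0, 115.0):
    assert math.isclose(normal_pdf(x, mu, sigma), ref.pdf(x), rel_tol=1e-9)

# Symmetry about mu: equal density at mu - d and mu + d.
assert math.isclose(normal_pdf(mu - 20, mu, sigma), normal_pdf(mu + 20, mu, sigma))

# Crude Riemann sums over mu +/- 8 sigma recover the area, mean, and variance.
n_steps = 24_000
step = 16 * sigma / n_steps
grid = [mu - 8 * sigma + i * step for i in range(n_steps + 1)]
area = sum(normal_pdf(x, mu, sigma) for x in grid) * step
mean = sum(x * normal_pdf(x, mu, sigma) for x in grid) * step
var = sum((x - mu) ** 2 * normal_pdf(x, mu, sigma) for x in grid) * step

print(round(area, 6), round(mean, 4), round(var, 2))
```

The three printed values should come out very close to 1, 100, and 225 (that is, total probability 1, mean mu, and variance sigma squared).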
The standard normal
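Setting $\mu = 0$ and $\sigma = 1$ gives the standard normal $Z \sim N(0, 1)$ with density $\varphi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$. Any normal variable can be standardized by the $z$-score $z = (x - \mu)/\sigma$, which is why a single table (or the pnorm function) suffices for all normal probabilities.
Its cdf is written $\Phi(z)$.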
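Standardization can be illustrated with a few lines of Python. In this sketch the standard normal cdf is computed from the error function, $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$; the helper name `std_normal_cdf` and the numbers (mean 100, standard deviation 15, x = 115) are assumptions picked for the example.

```python
import math
from statistics import NormalDist

def std_normal_cdf(z):
    """Phi(z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 15.0   # illustrative parameters
x = 115.0

z = (x - mu) / sigma                       # z-score of x
p_via_z = std_normal_cdf(z)                # P(Z <= z) from the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)    # P(X <= x) without standardizing

print(z, round(p_via_z, 4), round(p_direct, 4))
# 1.0 0.8413 0.8413
```

Both routes give the same probability, which is the sense in which one standard normal table covers every normal distribution.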
The 68–95–99.7 rule
For any normal distribution, the probability mass within a few standard deviations of the mean is fixed:
- about 68% of values fall in $\mu \pm \sigma$,
- about 95% fall in $\mu \pm 2\sigma$,
- about 99.7% fall in $\mu \pm 3\sigma$.
This “empirical rule” is a fast sanity check: a value more than $3\sigma$ from the mean is genuinely unusual.
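The percentages in the rule can be reproduced from the standard normal cdf, since for any normal variable the probability of falling within k standard deviations of the mean is $\Phi(k) - \Phi(-k) = \operatorname{erf}(k/\sqrt{2})$. A minimal sketch, with `prob_within` as a hypothetical helper name:

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for any normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the mean or standard deviation, which is why the rule holds for every normal distribution.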
When it arises
The normal distribution tends to arise when a quantity is the sum or average of many independent small effects. By the central limit theorem, the sample mean $\bar{X}$ of independent draws from almost any distribution with finite variance is approximately normal for large $n$, which is why normal-based tests and confidence intervals are so widely applicable even when the raw data are not themselves normal.
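A small simulation can make the central limit theorem concrete. The sketch below draws repeated samples from an exponential distribution with rate 1, which is strongly right-skewed with mean 1 and standard deviation 1, and looks at the distribution of the sample means. The sample size, number of replications, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)    # fixed seed so the run is repeatable
n, reps = 40, 5000        # sample size and number of replications (arbitrary)

# Each entry is the mean of n independent Exponential(1) draws.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(sample_means)     # should be near the true mean, 1
spread = statistics.stdev(sample_means)     # should be near 1 / sqrt(n)
se = 1.0 / math.sqrt(n)                     # theoretical standard error
within_1se = sum(abs(m - 1.0) <= se for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(se, 3), round(within_1se, 3))
```

If the normal approximation is working, the fraction of sample means within one standard error of the true mean should land near 0.68, even though the individual draws are far from normal.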
In code
R
# pdf, cdf, quantile, and sampling for N(mu = 100, sd = 15)
dnorm(115, mean = 100, sd = 15) # density at x = 115
pnorm(115, mean = 100, sd = 15) # P(X <= 115) ~ 0.8413
qnorm(0.975, mean = 100, sd = 15) # 97.5% quantile
set.seed(123)
x <- rnorm(10000, mean = 100, sd = 15) # random sample
hist(x, breaks = 40, freq = FALSE) # histogram of the sample
curve(dnorm(x, 100, 15), add = TRUE) # overlay the true density
Python
import numpy as np
from scipy import stats
mu, sigma = 100, 15
stats.norm.pdf(115, mu, sigma) # density at 115
stats.norm.cdf(115, mu, sigma) # P(X <= 115) ~ 0.8413
stats.norm.ppf(0.975, mu, sigma) # 97.5% quantile
rng = np.random.default_rng(123)
x = rng.normal(mu, sigma, size=10000) # random sample
# plt.hist(x, bins=40, density=True); overlay stats.norm.pdf on a grid
Julia
using Distributions, Random
d = Normal(100, 15) # Normal(mean, sd)
pdf(d, 115) # density at 115
cdf(d, 115) # P(X <= 115) ~ 0.8413
quantile(d, 0.975) # 97.5% quantile
Random.seed!(123)
x = rand(d, 10_000) # random sample
# histogram(x, normalize=:pdf); plot!(t -> pdf(d, t)) to overlay the density
Simulation
Simulating $10^6$ draws and comparing the histogram to $f(x)$ shows the match, and the sample statistics recover the parameters.
set.seed(7)
x <- rnorm(1e6, mean = 100, sd = 15)
mean(x) # ~ 100 (theoretical mean = mu)
sd(x) # ~ 15 (theoretical sd = sigma)
mean(abs(x - 100) <= 15) # ~ 0.68, matching the 68% rule
Why it matters for statistics
The normal distribution is the backbone of classical inference. Because sample means are approximately normal, $z$- and $t$-based confidence intervals and hypothesis tests rest on it. It is also the reference against which “unusual” observations are judged, and the limiting shape of the t-distribution as degrees of freedom grow.
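As one illustration of a normal-based procedure, here is a minimal sketch of a 95% z-interval for a mean, $\bar{x} \pm z_{0.975}\,\sigma/\sqrt{n}$. It assumes the population standard deviation is known (15 here) and uses simulated rather than real data; the sample size, seed, and variable names are all illustrative choices.

```python
import math
from statistics import NormalDist, fmean

sigma = 15.0                                   # assumed known population sd
population = NormalDist(100.0, sigma)          # illustrative population
sample = population.samples(200, seed=42)      # simulated data, not real measurements

n = len(sample)
xbar = fmean(sample)
z_crit = NormalDist().inv_cdf(0.975)           # about 1.96 for a 95% interval
half_width = z_crit * sigma / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)

print(round(z_crit, 3), tuple(round(v, 2) for v in ci))
```

When the standard deviation has to be estimated from the sample, the critical value would instead come from a t-distribution, which approaches the normal value of about 1.96 as the degrees of freedom grow.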