The Binomial Distribution
The binomial distribution counts how many “successes” occur in a fixed number of independent yes/no trials: how many of $n$ vaccinated people avoid infection, how many of $n$ tossed coins land heads, how many of $n$ patients respond to a treatment. It is the natural model behind testing a proportion.
Definition
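Let $X$ count the successes in $n$ independent trials, each with success probability $p$. Its probability mass function is
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n.$$
- Support: $k \in \{0, 1, \dots, n\}$.
- Parameters: number of trials $n$ and success probability $p$, with $0 \le p \le 1$.
- Mean: $E[X] = np$.
- Variance: $\operatorname{Var}(X) = np(1 - p)$.
The binomial coefficient $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$ counts the number of ways to arrange $k$ successes among $n$ trials.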
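As a quick check on the definition, the pmf can be evaluated straight from the formula and its sum, mean, and variance compared with the closed forms. This is a minimal sketch using only the Python standard library; the helper name `binom_pmf` and the values `n = 20`, `p = 0.3` are illustrative choices, not part of the definition.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed directly from the formula."""
    if not 0 <= k <= n:
        return 0.0  # outside the support
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.3  # illustrative parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                           # should be 1
mean = sum(k * q for k, q in enumerate(pmf))               # should equal n * p
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))  # should equal n * p * (1 - p)

print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```

Summing over the full support is only practical for moderate $n$; for large $n$ a library routine working on the log scale is the safer choice.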
Sum of Bernoulli trials
A single trial with outcome $1$ (success, probability $p$) or $0$ (failure) is a Bernoulli random variable with mean $p$ and variance $p(1 - p)$. A binomial variable is just the sum of $n$ independent Bernoulli trials $Y_1, \dots, Y_n$:
$$X = Y_1 + Y_2 + \dots + Y_n.$$
Because expectation and (for independent terms) variance add, this immediately gives $E[X] = np$ and $\operatorname{Var}(X) = np(1 - p)$.
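The sum-of-Bernoulli construction can be sketched directly in code: draw individual 0/1 trials, add them up, and compare the sample mean and variance of the sums with the closed forms. The helper names `bernoulli` and `binomial_draw`, the seed, and the sample size are assumptions made for illustration.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

def bernoulli(p: float) -> int:
    """One trial: 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if random.random() < p else 0

def binomial_draw(n: int, p: float) -> int:
    """One Binomial(n, p) draw, built as a sum of n independent Bernoulli trials."""
    return sum(bernoulli(p) for _ in range(n))

n, p, reps = 20, 0.3, 50_000  # illustrative parameters
draws = [binomial_draw(n, p) for _ in range(reps)]

sample_mean = sum(draws) / reps
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (reps - 1)

print(f"sample mean     = {sample_mean:.3f}  (theory: {n * p:.3f})")
print(f"sample variance = {sample_var:.3f}  (theory: {n * p * (1 - p):.3f})")
```

This is far slower than a dedicated binomial sampler; its only purpose is to make the construction explicit.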
When it arises
The binomial arises whenever you count successes among a fixed number of independent, identical trials: proportion testing (fraction cured, fraction defective), survey responses (yes/no), and quality control. The sample proportion $\hat{p} = X / n$ is the basis for inference about an unknown $p$.
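To make the role of the sample proportion concrete, the sketch below computes $\hat{p} = X/n$ and a normal-approximation (Wald) interval around it. The Wald interval is one simple choice among several and is used here only for illustration; the data (42 responders out of 120 patients) and the function names are hypothetical.

```python
from math import sqrt

def sample_proportion(successes: int, n: int) -> float:
    """Point estimate p_hat = X / n."""
    return successes / n

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = sample_proportion(successes, n)
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    # Clip to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

# Hypothetical data: 42 of 120 patients respond to a treatment.
x, n = 42, 120
p_hat = sample_proportion(x, n)
low, high = wald_interval(x, n)
print(f"p_hat = {p_hat:.3f}, approximate 95% interval = ({low:.3f}, {high:.3f})")
```

The Wald interval can behave poorly when $n$ is small or $\hat{p}$ is near $0$ or $1$, so it should be read as a rough first approximation.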
Extension: the multinomial
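When each trial has more than two possible outcomes (say $K$ categories with probabilities $p_1, \dots, p_K$ summing to $1$), the counts $(N_1, \dots, N_K)$ follow the multinomial distribution,
$$P(N_1 = x_1, \dots, N_K = x_K) = \frac{n!}{x_1! \cdots x_K!} \, p_1^{x_1} \cdots p_K^{x_K}, \qquad x_1 + \dots + x_K = n,$$
the direct generalization of the binomial to several categories.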
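A small sketch shows how the multinomial generalizes the binomial: with two categories its pmf coincides with the binomial pmf. The function name `multinomial_pmf` and the three-category example values are assumptions chosen for illustration.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """Probability of observing exactly `counts` in n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # n! / (c_1! * ... * c_K!), an integer at every step
    return coef * prod(q**c for q, c in zip(probs, counts))

# With K = 2 categories the multinomial reduces to the binomial.
n, k, p = 20, 6, 0.3
two_cat = multinomial_pmf([k, n - k], [p, 1 - p])
binom = comb(n, k) * p**k * (1 - p) ** (n - k)
print(two_cat, binom)

# Hypothetical three-category example: 10 trials with probabilities 0.5, 0.3, 0.2.
three_cat = multinomial_pmf([5, 3, 2], [0.5, 0.3, 0.2])
print(three_cat)
```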
In code
R
# pmf, cdf, quantile, and sampling for Binomial(n = 20, p = 0.3)
dbinom(6, size = 20, prob = 0.3) # P(X = 6)
pbinom(6, size = 20, prob = 0.3) # P(X <= 6)
qbinom(0.95, size = 20, prob = 0.3) # 95% quantile
set.seed(123)
x <- rbinom(10000, size = 20, prob = 0.3) # random sample
hist(x, breaks = seq(-0.5, 20.5, 1), freq = FALSE) # histogram
points(0:20, dbinom(0:20, 20, 0.3)) # overlay the pmf
Python
import numpy as np
from scipy import stats
n, p = 20, 0.3
stats.binom.pmf(6, n, p) # P(X = 6)
stats.binom.cdf(6, n, p) # P(X <= 6)
stats.binom.ppf(0.95, n, p) # 95% quantile
rng = np.random.default_rng(123)
x = rng.binomial(n, p, size=10000) # random sample
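import matplotlib.pyplot as plt
plt.hist(x, bins=np.arange(-0.5, n + 1.5), density=True) # histogram, one bin per integer
plt.plot(np.arange(n + 1), stats.binom.pmf(np.arange(n + 1), n, p), "o") # overlay the pmf
plt.show()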
Julia
using Distributions, Random
d = Binomial(20, 0.3) # Binomial(n, p)
pdf(d, 6) # P(X = 6) (pdf = pmf for discrete)
cdf(d, 6) # P(X <= 6)
quantile(d, 0.95) # 95% quantile
Random.seed!(123)
x = rand(d, 10_000) # random sample
# histogram(x, normalize=:pdf); scatter!(0:20, pdf.(d, 0:20)) to overlay the pmf
Simulation
The empirical mean of many binomial draws converges to $np$ and the variance to $np(1 - p)$.
set.seed(7)
x <- rbinom(1e6, size = 20, prob = 0.3)
mean(x) # ~ 6.0 (theoretical mean = np = 20 * 0.3)
var(x) # ~ 4.2 (theoretical variance = np(1-p) = 20 * 0.3 * 0.7)
Why it matters for statistics
The binomial underpins inference for proportions: confidence intervals for $p$, tests comparing two proportions, and the reasoning behind p-values in exact tests. For large $n$ (with $p$ not too close to $0$ or $1$) it is well approximated by the normal distribution with mean $np$ and variance $np(1 - p)$, and for large $n$ with small $p$ it approaches the Poisson distribution with rate $\lambda = np$.
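Both approximations can be checked numerically by comparing the exact binomial pmf with the normal density and the Poisson pmf at a single point. This is a rough sketch using only the standard library; the parameter values are illustrative assumptions, and a single-point comparison says nothing about accuracy in the tails.

```python
from math import comb, exp, factorial, pi, sqrt

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability with rate lam."""
    return exp(-lam) * lam**k / factorial(k)

# Normal approximation: large n, p away from 0 and 1 (illustrative values).
n, p, k = 400, 0.4, 160
exact_mid = binom_pmf(k, n, p)
approx_normal = normal_pdf(k, n * p, sqrt(n * p * (1 - p)))
print(f"binomial = {exact_mid:.5f}, normal approximation = {approx_normal:.5f}")

# Poisson approximation: large n, small p, rate lam = n * p (illustrative values).
n2, p2, k2 = 1000, 0.003, 3
exact_rare = binom_pmf(k2, n2, p2)
approx_poisson = poisson_pmf(k2, n2 * p2)
print(f"binomial = {exact_rare:.5f}, Poisson approximation = {approx_poisson:.5f}")
```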