Bayesian Inference
Bayesian inference treats an unknown parameter as a random quantity and updates our beliefs about it as data arrive. The engine is Bayes’ theorem, which combines what we knew before with what the data tell us.
Bayes’ theorem for parameters
Let $\theta$ be a parameter and $y$ the observed data. Bayesian inference computes the posterior distribution of $\theta$ given $y$:
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}.$$
The pieces each have a name.
- The prior $p(\theta)$ encodes beliefs about $\theta$ before seeing the data.
- The likelihood $p(y \mid \theta)$ is the probability of the data as a function of $\theta$ (the same object maximized in maximum likelihood estimation).
- The posterior $p(\theta \mid y)$ is the updated belief after combining prior and data.
- The marginal likelihood (or evidence) $p(y)$ normalizes the posterior so it integrates to one:
$$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta.$$
Because $p(y)$ does not depend on $\theta$, we often work with the unnormalized form $p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$ and restore the constant at the end.
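A minimal sketch of this update on a discrete set of candidate values, where the integral for the evidence becomes a sum. The three candidate values, the equal prior weights, and the data (12 successes in 20 trials) are illustrative assumptions, and the helper name `binomial_likelihood` is hypothetical.

```python
from math import comb

# Hypothetical setup: three candidate values for a proportion theta,
# equal prior weight on each, and k successes observed in n trials.
thetas = [0.3, 0.5, 0.7]
prior = [1 / 3, 1 / 3, 1 / 3]
k, n = 12, 20


def binomial_likelihood(theta, k, n):
    """Probability of k successes in n trials for a given theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Unnormalized posterior: likelihood times prior, one entry per candidate.
unnormalized = [binomial_likelihood(t, k, n) * p for t, p in zip(thetas, prior)]

# Evidence: the sum of those products (the discrete analogue of the integral).
evidence = sum(unnormalized)

# Restore the constant at the end so the posterior sums to one.
posterior = [u / evidence for u in unnormalized]

for t, p in zip(thetas, posterior):
    print(f"theta = {t}: posterior probability {p:.3f}")
```

Dividing by the evidence changes only the scale: the ranking of the candidates is already fixed by the unnormalized products.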
Conjugate priors
For some prior–likelihood pairs the posterior belongs to the same family as the prior, giving a closed-form update. Such priors are called conjugate.
- Beta–Binomial. With a $\mathrm{Beta}(a, b)$ prior on a proportion $\theta$ and $k$ successes in $n$ Binomial trials, the posterior is $\mathrm{Beta}(a + k,\; b + n - k)$.
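- Gamma–Poisson. With a $\mathrm{Gamma}(\alpha, \beta)$ prior (shape $\alpha$, rate $\beta$) on a rate $\lambda$ and Poisson counts, observing total $\sum_i y_i$ over $n$ observations gives a $\mathrm{Gamma}(\alpha + \sum_i y_i,\; \beta + n)$ posterior.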
- Normal–Normal. With a Normal prior on a mean and Normal data of known variance, the posterior is again Normal, its mean a precision-weighted average of prior mean and sample mean.
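The three updates in this list can be sketched as small functions. The function and argument names are hypothetical, the Gamma prior is assumed to use the shape and rate parameterization, and the numbers in the calls are made up for illustration.

```python
def beta_binomial_update(a, b, k, n):
    """Beta(a, b) prior with k successes in n trials: posterior Beta parameters."""
    return a + k, b + n - k


def gamma_poisson_update(shape, rate, total_count, n_obs):
    """Gamma(shape, rate) prior with Poisson counts summing to total_count
    over n_obs observations: posterior (shape, rate)."""
    return shape + total_count, rate + n_obs


def normal_normal_update(prior_mean, prior_var, sample_mean, data_var, n_obs):
    """Normal prior on a mean, Normal data with known variance data_var:
    posterior (mean, variance)."""
    prior_precision = 1.0 / prior_var
    data_precision = n_obs / data_var
    post_precision = prior_precision + data_precision
    # Posterior mean is the precision-weighted average of the two means.
    post_mean = (
        prior_precision * prior_mean + data_precision * sample_mean
    ) / post_precision
    return post_mean, 1.0 / post_precision


print(beta_binomial_update(1, 1, 12, 20))
print(gamma_poisson_update(2.0, 1.0, 15, 5))
print(normal_normal_update(0.0, 4.0, 1.0, 1.0, 4))
```

In the Normal case the data precision grows with the number of observations, so the posterior mean moves toward the sample mean as more data arrive.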
Conjugacy is convenient but not required; when it fails we sample from the posterior with Markov chain Monte Carlo. The distributions overview collects the families used above.
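To illustrate the sampling route, here is a minimal random-walk Metropolis sketch, one simple MCMC algorithm. It is applied to a Beta(1, 1) prior with 12 successes in 20 trials, a case where the closed-form posterior Beta(13, 9) is available as a check. The step size, chain length, burn-in, seed, and function names are arbitrary assumptions, not recommendations.

```python
import math
import random


def log_unnormalized_posterior(theta, k, n, a=1.0, b=1.0):
    """Log of Beta(a, b) prior times Binomial likelihood, up to a constant."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (a - 1 + k) * math.log(theta) + (b - 1 + n - k) * math.log(1 - theta)


def metropolis(k, n, n_steps=20000, step_sd=0.1, start=0.5, seed=0):
    """Random-walk Metropolis draws from the posterior of theta."""
    rng = random.Random(seed)
    theta = start
    log_p = log_unnormalized_posterior(theta, k, n)
    draws = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_sd)  # symmetric proposal
        log_p_new = log_unnormalized_posterior(proposal, k, n)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            theta, log_p = proposal, log_p_new
        draws.append(theta)
    return draws


draws = metropolis(k=12, n=20)
kept = draws[2000:]  # discard an arbitrary burn-in period
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)  # should land near the exact value 13/22
```

Only the unnormalized posterior is needed, because the evidence cancels in the acceptance ratio.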
Summarizing the posterior
The posterior is a full distribution, but we usually report a few summaries.
- Posterior mean $E[\theta \mid y]$, a common point estimate.
- MAP (maximum a posteriori), the mode $\arg\max_\theta\, p(\theta \mid y)$.
- Credible interval, an interval containing a stated posterior probability, e.g. a central 95% interval between the 2.5% and 97.5% posterior quantiles.
Under a flat (constant) prior the posterior is proportional to the likelihood, so the MAP coincides with the maximum likelihood estimate.
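A small sketch of the point summaries for a Beta posterior, including a check that the MAP equals the maximum likelihood estimate under a flat Beta(1, 1) prior. The function name is hypothetical, and the data (12 successes in 20 trials) and the Beta(5, 5) comparison prior are illustrative assumptions.

```python
def beta_posterior_summaries(a, b, k, n):
    """Posterior mean and mode (MAP) for a Beta(a, b) prior and k successes in n trials."""
    a_post, b_post = a + k, b + n - k
    mean = a_post / (a_post + b_post)
    # The interior-mode formula below needs both posterior parameters above 1.
    if a_post <= 1 or b_post <= 1:
        raise ValueError("mode formula requires both posterior parameters > 1")
    mode = (a_post - 1) / (a_post + b_post - 2)
    return mean, mode


k, n = 12, 20
mle = k / n  # maximum likelihood estimate of the proportion

# Flat prior: the MAP coincides with the MLE.
flat_mean, flat_map = beta_posterior_summaries(1, 1, k, n)
print(flat_mean, flat_map, mle)

# A prior concentrated around 0.5 pulls the MAP away from the MLE.
informative_mean, informative_map = beta_posterior_summaries(5, 5, k, n)
print(informative_mean, informative_map)
```

The posterior mean differs from the MAP even under the flat prior, because the mean reflects the whole distribution rather than only its peak.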
Credible vs. confidence intervals
A 95% credible interval admits the direct statement that, given the data and prior, $\theta$ lies in the interval with probability $0.95$. A frequentist confidence interval makes no such probability claim about $\theta$: instead, 95% of intervals built by the same procedure across repeated samples would cover the fixed true value. The Bayesian statement is about the parameter; the frequentist one is about the procedure.
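The repeated-sampling claim can be checked by simulation. The sketch below assumes one particular frequentist procedure, the Wald (normal-approximation) interval for a proportion, together with a made-up true value, sample size, number of repetitions, and seed; none of these come from the text above.

```python
import math
import random


def wald_interval(k, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_theta, n, n_repeats=2000, seed=0):
    """Fraction of repeated samples whose interval covers the fixed true value."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_repeats):
        # Draw one Binomial(n, true_theta) sample as a sum of Bernoulli trials.
        k = sum(rng.random() < true_theta for _ in range(n))
        lo, hi = wald_interval(k, n)
        covered += lo <= true_theta <= hi
    return covered / n_repeats


estimated_coverage = coverage(true_theta=0.4, n=200)
print(estimated_coverage)  # expected to be close to the nominal 0.95
```

The true value stays fixed throughout; only the interval endpoints vary from sample to sample, which is why the 95% figure describes the procedure rather than any single interval.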
Worked example: estimating a proportion
Suppose we test a diagnostic assay on $n = 20$ samples and observe $k = 12$ positives, and we want the positive fraction $\theta$. Start from a uniform prior $\mathrm{Beta}(1, 1)$, which expresses no preference among proportions. The Beta–Binomial rule gives the posterior
$$\theta \mid y \sim \mathrm{Beta}(1 + 12,\; 1 + 20 - 12) = \mathrm{Beta}(13, 9).$$
The posterior mean is $13/22 \approx 0.591$, slightly shrunk from the raw fraction $12/20 = 0.6$ toward the prior mean $0.5$. A 95% central credible interval runs between the 2.5% and 97.5% quantiles of $\mathrm{Beta}(13, 9)$, approximately $(0.38, 0.78)$. We conclude the positive fraction is most plausibly near 0.59, with the data leaving substantial uncertainty.
In code
Each snippet below computes the analytic Beta–Binomial posterior and then a grid approximation that works even without conjugacy.
R
set.seed(1)
a <- 1; b <- 1 # Beta(1,1) prior
k <- 12; n <- 20 # data
# Analytic posterior: Beta(a+k, b+n-k)
ap <- a + k; bp <- b + n - k
post_mean <- ap / (ap + bp)
ci <- qbeta(c(0.025, 0.975), ap, bp)
c(mean = post_mean, lo = ci[1], hi = ci[2])
# mean 0.5909, lo 0.3844, hi 0.7818
# Grid approximation (no conjugacy needed)
grid <- seq(0, 1, length.out = 2001)
post <- dbeta(grid, a, b) * dbinom(k, n, grid)
post <- post / sum(post) # normalize on the grid
cdf <- cumsum(post)
grid[c(which.min(abs(cdf - 0.025)), which.min(abs(cdf - 0.975)))]
# approximately 0.384 0.7815 (close to the analytic interval)
Python
import numpy as np
from scipy.stats import beta, binom
np.random.seed(1)
a, b = 1, 1 # prior
k, n = 12, 20 # data
ap, bp = a + k, b + n - k
print(ap / (ap + bp)) # 0.5909 posterior mean
print(beta.ppf([0.025, 0.975], ap, bp)) # [0.3844 0.7818]
# Grid approximation
grid = np.linspace(0, 1, 2001)
post = beta.pdf(grid, a, b) * binom.pmf(k, n, grid)
post /= post.sum()
cdf = np.cumsum(post)
lo = grid[np.argmin(np.abs(cdf - 0.025))]
hi = grid[np.argmin(np.abs(cdf - 0.975))]
print(lo, hi) # 0.384 0.7815
0.5909090909090909
[0.38435439 0.78180314]
0.384 0.7815
Julia
using Distributions, Random
Random.seed!(1)
a, b = 1, 1
k, n = 12, 20
post = Beta(a + k, b + n - k)
println(mean(post)) # 0.5909
println(quantile.(post, [0.025, 0.975])) # [0.3844, 0.7818]
# Grid approximation
grid = range(0, 1, length = 2001)
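# Broadcast the constructor: Binomial takes a scalar success probability
w = pdf.(Beta(a, b), grid) .* pdf.(Binomial.(n, grid), k)
w ./= sum(w)
cum = cumsum(w) # named cum to avoid shadowing Distributions.cdf
lo = grid[argmin(abs.(cum .- 0.025))]
hi = grid[argmin(abs.(cum .- 0.975))]
println((lo, hi)) # approximately (0.384, 0.7815)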
For models without conjugate priors, probabilistic-programming tools such as Stan, PyMC, and Turing.jl draw posterior samples automatically.
Why it matters
Bayesian inference gives a coherent way to combine prior knowledge with data and to express conclusions as probabilities about the quantities we actually care about. It handles small samples, hierarchical structure, and prior information gracefully, and its credible intervals answer the question practitioners usually mean to ask. These ideas underpin much of modern epidemiological modeling, from estimating prevalence to fitting transmission dynamics.