Markov Chain Monte Carlo

Many posteriors have no closed form, so we cannot write down their mean or quantiles directly. Markov chain Monte Carlo (MCMC) sidesteps this by drawing samples from the target distribution and summarizing them.

Metropolis–Hastings: the chain’s trace (left) and the sampled histogram matching the target density (right).

The idea

The goal of Bayesian inference is to learn a posterior p(θy)p(\theta\mid y), but for most models the normalizing constant is an intractable integral. MCMC builds a Markov chain — a sequence θ(1),θ(2),\theta^{(1)},\theta^{(2)},\dots where each state depends only on the previous one — whose stationary distribution is exactly the target p(θy)p(\theta\mid y). Run the chain long enough and its states behave like (correlated) draws from the posterior, so sample averages estimate posterior expectations. Crucially, the chain only needs the target up to a constant, so the unknown normalizer never has to be computed.

Metropolis–Hastings

The Metropolis–Hastings algorithm turns any proposal mechanism into a valid sampler. From the current state θ\theta, propose θ\theta' from a proposal density q(θθ)q(\theta'\mid\theta) and accept it with probability

α=min ⁣(1, p(θ)q(θθ)p(θ)q(θθ)),\alpha=\min\!\left(1,\ \frac{p(\theta')\,q(\theta\mid\theta')}{p(\theta)\,q(\theta'\mid\theta)}\right),

where p()p(\cdot) is the (unnormalized) target. If the proposal is rejected, the chain stays at θ\theta. With a symmetric proposal such as a Normal random walk, q(θθ)=q(θθ)q(\theta\mid\theta')=q(\theta'\mid\theta) and the ratio reduces to min(1,p(θ)/p(θ))\min(1,\,p(\theta')/p(\theta)): always move to higher density, and sometimes move downhill.

Gibbs sampling is a special case for multi-parameter models: cycle through the parameters, drawing each from its full conditional distribution given the current values of the others, accepting every draw.

Diagnostics

Because samples are correlated and the chain starts away from the target, we check convergence and mixing before trusting the output.

Modern samplers reduce random-walk inefficiency by using gradient information: Hamiltonian Monte Carlo and its adaptive variant NUTS power Stan, PyMC, and Turing.jl.

Worked example: sampling a Poisson rate

Suppose we observe counts y=(2,3,1,4,2)y=(2,3,1,4,2) and model them as Poisson with rate λ\lambda, using a Gamma(2,1)\text{Gamma}(2,1) prior. This case is actually conjugate — the posterior is Gamma(2+yi, 1+n)=Gamma(14,6)\text{Gamma}\big(2+\sum y_i,\ 1+n\big)=\text{Gamma}(14,\,6) with mean 14/62.3314/6\approx 2.33 — which lets us check the sampler against the truth. We run a random-walk Metropolis sampler on logλ\log\lambda (so proposals stay positive) targeting the unnormalized posterior p(λ)λyienλλ1eλp(\lambda)\propto \lambda^{\sum y_i}e^{-n\lambda}\cdot\lambda^{1}e^{-\lambda}. After discarding burn-in, the sample mean should land near 2.332.33 and a 95% interval near the analytic Gamma(14,6)\text{Gamma}(14,6) quantiles [1.28,3.72][1.28,\,3.72].

In code

A Metropolis sampler from scratch, with the analytic answer for comparison.

R

set.seed(1)
y <- c(2, 3, 1, 4, 2); n <- length(y)
# Unnormalized log-posterior for lambda > 0 (Gamma(2,1) prior, Poisson likelihood)
log_post <- function(lam) {
  if (lam <= 0) return(-Inf)
  sum(dpois(y, lam, log = TRUE)) + dgamma(lam, 2, 1, log = TRUE)
}

N <- 20000; lam <- 1; draws <- numeric(N)
for (i in 1:N) {
  prop <- lam * exp(rnorm(1, 0, 0.3))          # random walk on log scale
  logA <- log_post(prop) - log_post(lam) + log(prop) - log(lam)  # + Jacobian
  if (log(runif(1)) < logA) lam <- prop
  draws[i] <- lam
}
draws <- draws[-(1:2000)]                       # burn-in
c(mean = mean(draws), quantile(draws, c(0.025, 0.975)))
# mean ~2.33, ~[1.28, 3.72]; analytic: Gamma(14, 6)
qgamma(c(0.025, 0.975), 14, 6)                  # 1.28 3.72
# plot(draws, type = "l"); hist(draws)          # trace + histogram

Python

import numpy as np
from scipy.stats import poisson, gamma

np.random.seed(1)
y = np.array([2, 3, 1, 4, 2]); n = len(y)

def log_post(lam):
    if lam <= 0:
        return -np.inf
    return poisson.logpmf(y, lam).sum() + gamma.logpdf(lam, 2, scale=1)

N, lam = 20000, 1.0
draws = np.empty(N)
for i in range(N):
    prop = lam * np.exp(np.random.normal(0, 0.3))       # random walk on log scale
    logA = log_post(prop) - log_post(lam) + np.log(prop) - np.log(lam)
    if np.log(np.random.rand()) < logA:
        lam = prop
    draws[i] = lam
draws = draws[2000:]                                    # burn-in
print(draws.mean(), np.quantile(draws, [0.025, 0.975]))
# ~2.33  [~1.28, ~3.72]
print(gamma.ppf([0.025, 0.975], 14, scale=1/6))         # analytic: 1.28 3.72
2.338860902947516 [1.27653574 3.74541504]
[1.27565505 3.70506599]

Julia

using Distributions, Random, Statistics
Random.seed!(1)
y = [2, 3, 1, 4, 2]; n = length(y)
log_post(lam) = lam <= 0 ? -Inf :
    sum(logpdf.(Poisson(lam), y)) + logpdf(Gamma(2, 1), lam)

N, lam = 20000, 1.0
draws = zeros(N)
for i in 1:N
    prop = lam * exp(0.3 * randn())                     # random walk on log scale
    logA = log_post(prop) - log_post(lam) + log(prop) - log(lam)
    if log(rand()) < logA
        lam = prop
    end
    draws[i] = lam
end
draws = draws[2001:end]                                 # burn-in
println(mean(draws), " ", quantile(draws, [0.025, 0.975]))
# ~2.33  [~1.28, ~3.72]
println(quantile(Gamma(14, 1/6), [0.025, 0.975]))       # 1.28 3.72

In practice you would reach for a probabilistic-programming system — Stan, PyMC, or Turing.jl — which implements NUTS, adapts the proposal, and reports R^\hat R and effective sample size automatically. See the simulation toolkit for sampler tooling and reproducibility for seeding and archiving runs.

Why it matters

MCMC made Bayesian inference practical for realistic models by replacing intractable integrals with samples. It is the computational backbone behind fitting hierarchical models, transmission models, and phylogenies, wherever the posterior cannot be written in closed form. Understanding its diagnostics is what separates a trustworthy fit from a chain that merely looks busy.