Areal Models: CAR, ICAR, and BYM
Much epidemiological data arrives already aggregated to areal units — counts of cases per county, deaths per district, prevalence per health zone — rather than as points with coordinates. For such data there is no continuous distance to work with, so spatial correlation is encoded instead through a neighborhood graph: which regions share a border. The conditional autoregressive (CAR) family and its relatives build a spatial prior from that graph, letting each region borrow strength from its neighbors so that noisy small-area estimates are smoothed toward a coherent surface.
The neighborhood graph
Label the regions and record adjacency in a symmetric matrix with when regions and share a border and otherwise (and ). This is exactly an adjacency matrix, and the areal model lives on the graph it defines. The row sums are the numbers of neighbors, and collecting them on a diagonal gives the degree matrix . The modeling premise is simple: neighbors are similar. A spatial effect attached to each region should vary smoothly across the map, so adjacent regions ought to have similar values, while regions far apart in the graph are only indirectly linked.
The conditional autoregressive (CAR) model
Rather than write down a joint distribution directly, the CAR model specifies each region’s full conditional — the distribution of given every other region. The standard proper CAR sets so each region is pulled toward a weighted average of its neighbors, scaled by a spatial dependence parameter , with a conditional variance that shrinks as a region gains more neighbors. When the regions are independent; as the pull toward the neighbor mean becomes total.
By Brook’s lemma these conditionals are compatible with a single multivariate normal joint distribution, and its precision (inverse-covariance) matrix is The precision is what makes CAR practical: is sparse, with a nonzero off-diagonal entry only where two regions are neighbors, so even for thousands of regions the model stays cheap to store and factorize. For the matrix is positive definite, so is a valid precision and the joint distribution is proper.
The intrinsic CAR (ICAR)
Taking the limit gives the intrinsic CAR, whose precision is exactly the graph Laplacian scaled by . The Laplacian has the constant vector in its null space — every row of sums to zero — so is singular and the distribution is improper: it specifies the differences between regions but leaves the overall level undetermined. Concretely, the ICAR density is proportional to a penalty on squared differences across every edge of the graph, which is precisely a statement that neighbors should be similar and says nothing about where the whole surface sits. Because the mean level floats, an ICAR term must be fitted under a sum-to-zero constraint , letting a separate intercept carry the overall level. Despite being improper as a prior, the ICAR yields a proper posterior once data anchor the level, and its smoothness penalty makes it the workhorse spatial prior for disease mapping.
The BYM model
A spatial prior alone is too rigid: real region effects mix smooth spatial structure with unstructured, region-specific noise. The Besag–York–Mollié (BYM) model captures both by giving each region a sum of two random effects, where is an ICAR spatial term that borrows strength from neighbors and is an independent heterogeneity term absorbing local overdispersion. Fitted inside a hierarchical model — typically a Poisson GLM for counts — BYM lets the data decide, region by region, how much of an effect is smooth spatial signal and how much is idiosyncratic noise.
The identifiability problem and BYM2
BYM has a nagging flaw: only the sum is identified by the data, so the two variance parameters (spatial) and (iid) trade off against each other and are hard to prior sensibly. Worse, the ICAR precision is not scaled to a standard reference, so the same value of implies wildly different amounts of smoothing on different graphs, making priors non-transferable. The BYM2 parameterization of Riebler et al. (2016) fixes both problems by writing the combined effect with a single marginal precision and a mixing proportion , where is a scaled ICAR term whose generalized variance is standardized to so that has the same meaning on every graph. Now controls the total marginal variance while is an interpretable dial: is pure iid heterogeneity, is pure spatial smoothing, and values between mix them. This clean separation lets you place principled, transferable priors — penalized-complexity priors on both and — which is why BYM2 is the recommended default in INLA.
A worked example
Take four regions arranged in a chain, , so regions and have one neighbor each and regions and have two. The adjacency and degree matrices are The proper CAR precision is then which is tridiagonal — nonzero only on the diagonal and where regions are adjacent — exactly the sparsity that makes CAR scale. Sending gives the ICAR precision , whose rows each sum to zero; it is singular, confirming that the intrinsic model pins down only the differences and needs the constraint to be fitted.
In code
R
Fit a BYM2 disease-mapping model with INLA, the most common route in practice; CARBayes and nimble offer MCMC alternatives with the same .
library(INLA)
# g: an adjacency graph built from a neighbour list (e.g. spdep::poly2nb -> INLA graph)
# df: one row per region with observed count y, expected count E, and region id
formula <- y ~ f(region, model = "bym2", graph = g,
scale.model = TRUE, # scaled ICAR -> comparable priors
hyper = list(
prec = list(prior = "pc.prec", param = c(1, 0.01)),
phi = list(prior = "pc", param = c(0.5, 0.5))))
fit <- inla(formula, family = "poisson", E = E, data = df,
control.predictor = list(compute = TRUE))
summary(fit) # posterior for precision tau and mixing phi
exp(fitmean) # smoothed relative risks per region
Python
Build and for the four-region chain, form the CAR precision , check the ICAR eigenvalues, and take one conditional-expectation smoothing step.
import numpy as np
np.random.seed(0)
# 4-region chain 1-2-3-4
W = np.array([[0, 1, 0, 0],
[1, 0, 1, 0],
[0, 1, 0, 1],
[0, 0, 1, 0]], float)
D = np.diag(W.sum(1))
alpha = 0.95
Q = D - alpha * W # CAR precision, tau^2 = 1
print("degrees:", W.sum(1).astype(int))
print("ICAR eigenvalues:", np.round(np.linalg.eigvalsh(D - W), 3)) # a 0 -> improper
# One CAR smoothing step: posterior mean of x given noisy y,
# with observation precision kappa and prior precision Q / tau^2.
y = np.array([2.0, 0.0, 0.0, -2.0]) # noisy region values
kappa, tau2 = 1.0, 0.3
xhat = np.linalg.solve(kappa * np.eye(4) + Q / tau2, kappa * y)
print("smoothed:", np.round(xhat, 3)) # pulled toward neighbours
degrees: [1 2 2 1]
ICAR eigenvalues: [0. 0.586 2. 3.414]
smoothed: [ 0.587 0.172 -0.172 -0.587]
Julia
using LinearAlgebra, SparseArrays
# Adjacency of the 4-region chain and its degree matrix.
W = sparse([1, 2, 2, 3, 3, 4], [2, 1, 3, 2, 4, 3], 1.0, 4, 4)
D = spdiagm(0 => vec(sum(W, dims = 2)))
α, τ² = 0.95, 0.3
Q = (D - α * W) ./ τ² # sparse CAR precision
L = D - W # ICAR limit: the graph Laplacian
println(round.(eigvals(Matrix(L)), digits = 3)) # a zero -> singular, improper
Why it matters
Areal models are the backbone of disease mapping: given observed and expected counts per region, a BYM or BYM2 model turns jumpy standardized incidence or mortality ratios into smoothed relative-risk surfaces that reveal genuine spatial pattern instead of small-sample noise. The same machinery drives small-area estimation across public health and official statistics, wherever an estimate is needed for every district but each district has too few observations to stand alone. Because the CAR prior is defined by a sparse precision, it slots naturally into fast Bayesian fitting via INLA and into large hierarchical Poisson models. When the underlying process is genuinely continuous and observed at points rather than regions, the analogous tool is kriging with a distance-based covariance instead of a graph-based precision.