Areal Models: CAR, ICAR, and BYM

Much epidemiological data arrives already aggregated to areal units — counts of cases per county, deaths per district, prevalence per health zone — rather than as points with coordinates. For such data there is no continuous distance to work with, so spatial correlation is encoded instead through a neighborhood graph: which regions share a border. The conditional autoregressive (CAR) family and its relatives build a spatial prior from that graph, letting each region borrow strength from its neighbors so that noisy small-area estimates are smoothed toward a coherent surface.

Two copies of a small region graph: raw noisy region values on the left, and the same regions after CAR smoothing on the right, where neighboring regions are pulled toward one another.

The neighborhood graph

Label the regions 1,,n1,\dots,n and record adjacency in a symmetric n×nn\times n matrix WW with wij=1w_{ij}=1 when regions ii and jj share a border and wij=0w_{ij}=0 otherwise (and wii=0w_{ii}=0). This is exactly an adjacency matrix, and the areal model lives on the graph it defines. The row sums di=jwijd_i=\sum_j w_{ij} are the numbers of neighbors, and collecting them on a diagonal gives the degree matrix D=diag(d1,,dn)D=\operatorname{diag}(d_1,\dots,d_n). The modeling premise is simple: neighbors are similar. A spatial effect xix_i attached to each region should vary smoothly across the map, so adjacent regions ought to have similar values, while regions far apart in the graph are only indirectly linked.

The conditional autoregressive (CAR) model

Rather than write down a joint distribution directly, the CAR model specifies each region’s full conditional — the distribution of xix_i given every other region. The standard proper CAR sets xixi    Normal ⁣(αjwijxjjwij, τ2jwij),x_i \mid x_{-i} \;\sim\; \mathrm{Normal}\!\left(\frac{\alpha \sum_j w_{ij}\,x_j}{\sum_j w_{ij}},\ \frac{\tau^2}{\sum_j w_{ij}}\right), so each region is pulled toward a weighted average of its neighbors, scaled by a spatial dependence parameter α[0,1)\alpha\in[0,1), with a conditional variance that shrinks as a region gains more neighbors. When α=0\alpha=0 the regions are independent; as α1\alpha\to 1 the pull toward the neighbor mean becomes total.

By Brook’s lemma these conditionals are compatible with a single multivariate normal joint distribution, and its precision (inverse-covariance) matrix is Q  =  τ2(DαW).Q \;=\; \tau^{-2}\,(D - \alpha W). The precision is what makes CAR practical: QQ is sparse, with a nonzero off-diagonal entry only where two regions are neighbors, so even for thousands of regions the model stays cheap to store and factorize. For α<1\alpha<1 the matrix DαWD-\alpha W is positive definite, so QQ is a valid precision and the joint distribution is proper.

The intrinsic CAR (ICAR)

Taking the limit α1\alpha\to 1 gives the intrinsic CAR, whose precision Q=τ2(DW)Q=\tau^{-2}(D-W) is exactly the graph Laplacian scaled by τ2\tau^{-2}. The Laplacian has the constant vector in its null space — every row of DWD-W sums to zero — so QQ is singular and the distribution is improper: it specifies the differences between regions but leaves the overall level undetermined. Concretely, the ICAR density is proportional to exp ⁣(12τ2ij(xixj)2),\exp\!\left(-\frac{1}{2\tau^2}\sum_{i\sim j}(x_i - x_j)^2\right), a penalty on squared differences across every edge iji\sim j of the graph, which is precisely a statement that neighbors should be similar and says nothing about where the whole surface sits. Because the mean level floats, an ICAR term must be fitted under a sum-to-zero constraint ixi=0\sum_i x_i = 0, letting a separate intercept carry the overall level. Despite being improper as a prior, the ICAR yields a proper posterior once data anchor the level, and its smoothness penalty makes it the workhorse spatial prior for disease mapping.

The BYM model

A spatial prior alone is too rigid: real region effects mix smooth spatial structure with unstructured, region-specific noise. The Besag–York–Mollié (BYM) model captures both by giving each region a sum of two random effects, ηi  =  ui+vi,\eta_i \;=\; u_i + v_i, where uu is an ICAR spatial term that borrows strength from neighbors and viNormal(0,σv2)v_i \sim \mathrm{Normal}(0,\sigma_v^2) is an independent heterogeneity term absorbing local overdispersion. Fitted inside a hierarchical model — typically a Poisson GLM for counts — BYM lets the data decide, region by region, how much of an effect is smooth spatial signal and how much is idiosyncratic noise.

The identifiability problem and BYM2

BYM has a nagging flaw: only the sum ui+viu_i+v_i is identified by the data, so the two variance parameters τ2\tau^2 (spatial) and σv2\sigma_v^2 (iid) trade off against each other and are hard to prior sensibly. Worse, the ICAR precision is not scaled to a standard reference, so the same value of τ2\tau^2 implies wildly different amounts of smoothing on different graphs, making priors non-transferable. The BYM2 parameterization of Riebler et al. (2016) fixes both problems by writing the combined effect with a single marginal precision τ\tau and a mixing proportion ϕ[0,1]\phi\in[0,1], η  =  1τ(ϕu+1ϕv),\eta \;=\; \frac{1}{\sqrt{\tau}}\left(\sqrt{\phi}\,u_\star + \sqrt{1-\phi}\,v\right), where uu_\star is a scaled ICAR term whose generalized variance is standardized to 11 so that τ\tau has the same meaning on every graph. Now τ\tau controls the total marginal variance while ϕ\phi is an interpretable dial: ϕ=0\phi=0 is pure iid heterogeneity, ϕ=1\phi=1 is pure spatial smoothing, and values between mix them. This clean separation lets you place principled, transferable priors — penalized-complexity priors on both τ\tau and ϕ\phi — which is why BYM2 is the recommended default in INLA.

A worked example

Take four regions arranged in a chain, 12341 - 2 - 3 - 4, so regions 11 and 44 have one neighbor each and regions 22 and 33 have two. The adjacency and degree matrices are W=(0100101001010010),D=(1000020000200001).W = \begin{pmatrix} 0&1&0&0\\ 1&0&1&0\\ 0&1&0&1\\ 0&0&1&0 \end{pmatrix}, \qquad D = \begin{pmatrix} 1&0&0&0\\ 0&2&0&0\\ 0&0&2&0\\ 0&0&0&1 \end{pmatrix}. The proper CAR precision is then Q=τ2(DαW)=τ2(1α00α2α00α2α00α1),Q = \tau^{-2}(D-\alpha W) = \tau^{-2}\begin{pmatrix} 1 & -\alpha & 0 & 0\\ -\alpha & 2 & -\alpha & 0\\ 0 & -\alpha & 2 & -\alpha\\ 0 & 0 & -\alpha & 1 \end{pmatrix}, which is tridiagonal — nonzero only on the diagonal and where regions are adjacent — exactly the sparsity that makes CAR scale. Sending α1\alpha\to1 gives the ICAR precision τ2(DW)\tau^{-2}(D-W), whose rows each sum to zero; it is singular, confirming that the intrinsic model pins down only the differences xixjx_i-x_j and needs the constraint ixi=0\sum_i x_i=0 to be fitted.

In code

R

Fit a BYM2 disease-mapping model with INLA, the most common route in practice; CARBayes and nimble offer MCMC alternatives with the same WW.

library(INLA)
# g: an adjacency graph built from a neighbour list (e.g. spdep::poly2nb -> INLA graph)
# df: one row per region with observed count y, expected count E, and region id
formula <- y ~ f(region, model = "bym2", graph = g,
                 scale.model = TRUE,             # scaled ICAR -> comparable priors
                 hyper = list(
                   prec = list(prior = "pc.prec", param = c(1, 0.01)),
                   phi  = list(prior = "pc",      param = c(0.5, 0.5))))

fit <- inla(formula, family = "poisson", E = E, data = df,
            control.predictor = list(compute = TRUE))
summary(fit)                    # posterior for precision tau and mixing phi
exp(fitsummary.fitted.valuessummary.fitted.valuesmean)   # smoothed relative risks per region

Python

Build WW and DD for the four-region chain, form the CAR precision Q=DαWQ=D-\alpha W, check the ICAR eigenvalues, and take one conditional-expectation smoothing step.

import numpy as np
np.random.seed(0)

# 4-region chain 1-2-3-4
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
D = np.diag(W.sum(1))
alpha = 0.95
Q = D - alpha * W                       # CAR precision, tau^2 = 1

print("degrees:", W.sum(1).astype(int))
print("ICAR eigenvalues:", np.round(np.linalg.eigvalsh(D - W), 3))  # a 0 -> improper

# One CAR smoothing step: posterior mean of x given noisy y,
# with observation precision kappa and prior precision Q / tau^2.
y = np.array([2.0, 0.0, 0.0, -2.0])     # noisy region values
kappa, tau2 = 1.0, 0.3
xhat = np.linalg.solve(kappa * np.eye(4) + Q / tau2, kappa * y)
print("smoothed:", np.round(xhat, 3))   # pulled toward neighbours
degrees: [1 2 2 1]
ICAR eigenvalues: [0.    0.586 2.    3.414]
smoothed: [ 0.587  0.172 -0.172 -0.587]

Julia

using LinearAlgebra, SparseArrays
# Adjacency of the 4-region chain and its degree matrix.
W = sparse([1, 2, 2, 3, 3, 4], [2, 1, 3, 2, 4, 3], 1.0, 4, 4)
D = spdiagm(0 => vec(sum(W, dims = 2)))

α, τ² = 0.95, 0.3
Q = (D - α * W) ./ τ²                    # sparse CAR precision
L = D - W                               # ICAR limit: the graph Laplacian
println(round.(eigvals(Matrix(L)), digits = 3))   # a zero -> singular, improper

Why it matters

Areal models are the backbone of disease mapping: given observed and expected counts per region, a BYM or BYM2 model turns jumpy standardized incidence or mortality ratios into smoothed relative-risk surfaces that reveal genuine spatial pattern instead of small-sample noise. The same machinery drives small-area estimation across public health and official statistics, wherever an estimate is needed for every district but each district has too few observations to stand alone. Because the CAR prior is defined by a sparse precision, it slots naturally into fast Bayesian fitting via INLA and into large hierarchical Poisson models. When the underlying process is genuinely continuous and observed at points rather than regions, the analogous tool is kriging with a distance-based covariance instead of a graph-based precision.