Covariance Functions and the Matérn Family
A Gaussian process is only as concrete as its covariance function: the kernel is what turns the abstract idea of “a distribution over functions” into something you can compute with. It answers a single question (how strongly do the function values at two locations $x$ and $x'$ covary?), and its answer, as a function of the separation between the points, fixes how wiggly or how smooth the resulting functions are. Choosing a kernel is therefore choosing your assumptions about the world, and the Matérn family gives you a single dial, the smoothness parameter $\nu$, to set them deliberately.
[Figure: Matérn correlation functions, including the $\nu \to \infty$ (squared-exponential) limit, all at a common lengthscale.]
What makes a covariance function valid
A covariance function $k(x, x')$ assigns to every pair of inputs the covariance between the (random) function values there, $k(x, x') = \operatorname{Cov}\big(f(x), f(x')\big)$. For this to be a legitimate covariance, $k$ must be positive semi-definite: for any finite set of points $x_1, \dots, x_n$ the Gram matrix $K$ with entries $K_{ij} = k(x_i, x_j)$ must satisfy

$$a^\top K a \;\ge\; 0 \quad \text{for all } a \in \mathbb{R}^n.$$

This is exactly the condition that the variance of any linear combination $\sum_i a_i f(x_i)$ is non-negative, which any real covariance must obey. Not every symmetric function of two points qualifies, and this constraint is what rules out many “reasonable-looking” kernels; the families below are valid by construction.
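As a quick numerical illustration of this condition, the sketch below builds Gram matrices for three one-dimensional points and inspects their smallest eigenvalues. The `gram` helper and both kernels are illustrative choices rather than anything defined in the text: the exponential kernel is a valid covariance, while the "boxcar" similarity (1 if two points are within unit distance, 0 otherwise) is symmetric and looks reasonable but is not positive semi-definite.

```python
import numpy as np

def gram(kernel, xs):
    """Gram matrix K[i, j] = kernel(xs[i], xs[j]) for 1-D inputs."""
    xs = np.asarray(xs, dtype=float)
    return kernel(xs[:, None], xs[None, :])

def exponential(x, y, rho=1.0):
    # A valid kernel (the Matern family member with nu = 1/2)
    return np.exp(-np.abs(x - y) / rho)

def boxcar(x, y, width=1.0):
    # Symmetric "similar if close" function that is NOT positive semi-definite
    return (np.abs(x - y) <= width).astype(float)

xs = [0.0, 1.0, 2.0]
min_eig_exp = np.linalg.eigvalsh(gram(exponential, xs)).min()
min_eig_box = np.linalg.eigvalsh(gram(boxcar, xs)).min()
print(f"exponential: smallest eigenvalue = {min_eig_exp:.4f}")
print(f"boxcar:      smallest eigenvalue = {min_eig_box:.4f}")
```

For these three points the boxcar Gram matrix has smallest eigenvalue $1 - \sqrt{2} \approx -0.414$, so some linear combination of function values would have negative "variance", which is impossible for a real covariance.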
Stationarity and isotropy
A kernel is stationary if it depends only on the separation $x - x'$ rather than on the absolute locations, so the statistical behavior of the process looks the same everywhere. It is isotropic if it depends only on the distance $r = \lVert x - x' \rVert$, so direction does not matter either: the process has no preferred orientation. Both the squared-exponential and the Matérn kernels are stationary and isotropic, which lets us write them as a one-variable correlation function of $r$ that decays from its peak at $r = 0$ toward zero as points move apart. This is the natural setting for spatial and temporal models, where “nearby points are similar” is the whole modeling premise.
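The two properties can be checked directly on a pair of points in the plane. The sketch below uses a Matérn 3/2 correlation as the example kernel; the function name `matern32`, the lengthscale argument `rho`, and the specific points are assumptions made for illustration.

```python
import math

def matern32(x, y, rho=1.0):
    """Matern 3/2 correlation between two points given as coordinate tuples."""
    r = math.dist(x, y)                  # isotropic: only the distance enters
    s = math.sqrt(3.0) * r / rho
    return (1.0 + s) * math.exp(-s)

a, b = (0.0, 0.0), (3.0, 4.0)            # distance 5

# Stationarity: shifting both points by the same offset changes nothing.
shift = (10.0, -7.0)
a_s = (a[0] + shift[0], a[1] + shift[1])
b_s = (b[0] + shift[0], b[1] + shift[1])

# Isotropy: rotating the separation vector (here by 90 degrees) changes nothing.
b_rot = (-4.0, 3.0)

k_orig = matern32(a, b, rho=5.0)
k_shift = matern32(a_s, b_s, rho=5.0)
k_rot = matern32(a, b_rot, rho=5.0)
print(k_orig, k_shift, k_rot)            # three equal values
```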
The squared-exponential (RBF) kernel
The most common starting point is the squared-exponential kernel, also called the radial-basis-function (RBF) or Gaussian kernel,

$$k_{\mathrm{SE}}(r) = \sigma^2 \exp\!\left(-\frac{r^2}{2\rho^2}\right),$$

where $\rho$ is the lengthscale (the distance over which the function changes appreciably) and $\sigma^2$ is the marginal variance. Its correlation falls off very fast, like $-r^2$ in the exponent, so distant points decorrelate quickly while nearby points are almost perfectly correlated. The catch is that this kernel produces functions that are infinitely differentiable: sample paths are unrealistically smooth, analytic curves with no roughness at any scale. For many physical and biological processes that is too strong an assumption, and it is precisely the assumption the Matérn family relaxes.
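To get a feel for how quickly the squared-exponential correlation decays, here is a minimal sketch that evaluates it at a few multiples of the lengthscale. The function name `rbf` and the parameter names `rho` and `sigma2` are illustrative, and the parameterization assumed is $\sigma^2 \exp(-r^2 / 2\rho^2)$.

```python
import math

def rbf(r, rho=1.0, sigma2=1.0):
    """Squared-exponential covariance at separation r."""
    return sigma2 * math.exp(-r**2 / (2.0 * rho**2))

# Correlation (sigma2 = 1) at several separations, measured in lengthscales
for multiple in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(f"r = {multiple} * rho  ->  correlation {rbf(multiple):.4f}")
```

At a tenth of a lengthscale the correlation is still above 0.99, while at three lengthscales it has dropped to roughly 0.01.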
The Matérn family
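The Matérn kernel introduces a smoothness parameter $\nu > 0$ that controls exactly how rough the sample paths are, alongside the lengthscale $\rho$. Its general isotropic form uses the modified Bessel function of the second kind $K_\nu$ and the gamma function $\Gamma$:

$$k_\nu(r) = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{\rho}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, r}{\rho}\right).$$

As written it looks forbidding, but for the half-integer values $\nu = 1/2, 3/2, 5/2, \dots$ the Bessel function collapses to an exponential times a polynomial, giving the clean closed forms that are used in practice. Writing the correlation $k_\nu(r)/\sigma^2$ (that is, with $\sigma^2 = 1$) for the common choices:

$$
\begin{aligned}
\nu = \tfrac{1}{2}: &\quad \exp\!\left(-\frac{r}{\rho}\right) \\
\nu = \tfrac{3}{2}: &\quad \left(1 + \frac{\sqrt{3}\, r}{\rho}\right) \exp\!\left(-\frac{\sqrt{3}\, r}{\rho}\right) \\
\nu = \tfrac{5}{2}: &\quad \left(1 + \frac{\sqrt{5}\, r}{\rho} + \frac{5 r^2}{3 \rho^2}\right) \exp\!\left(-\frac{\sqrt{5}\, r}{\rho}\right) \\
\nu \to \infty: &\quad \exp\!\left(-\frac{r^2}{2\rho^2}\right)
\end{aligned}
$$

The bottom line is the payoff: as $\nu \to \infty$ the Matérn kernel converges to the squared-exponential, so the RBF is simply the infinitely-smooth end of the same family.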
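The convergence to the squared-exponential can be seen numerically without any Bessel functions. The sketch below relies on a standard finite-sum expression for half-integer smoothness $\nu = p + 1/2$, which reduces to the familiar closed forms for $p = 0, 1, 2$. The function names and the choice of $p$ values are illustrative assumptions, and very large $p$ would overflow floating point in this naive form.

```python
import math

def matern_half_integer(r, p, rho=1.0):
    """Matern correlation for nu = p + 1/2 (p = 0, 1, 2, ...), as a finite sum."""
    nu = p + 0.5
    s = math.sqrt(2.0 * nu) * r / rho
    poly = sum(
        math.factorial(p + i) / (math.factorial(i) * math.factorial(p - i))
        * (2.0 * s) ** (p - i)
        for i in range(p + 1)
    )
    return math.exp(-s) * math.factorial(p) / math.factorial(2 * p) * poly

def rbf(r, rho=1.0):
    """Squared-exponential correlation, the nu -> infinity limit."""
    return math.exp(-r**2 / (2.0 * rho**2))

r = 1.0
for p in (0, 1, 2, 10, 50):
    print(f"nu = {p + 0.5}: {matern_half_integer(r, p):.4f}")
print(f"RBF limit: {rbf(r):.4f}")
```

At $r = \rho$ the values climb from $e^{-1} \approx 0.368$ at $\nu = 1/2$ toward the limit $e^{-1/2} \approx 0.607$, with the gap shrinking roughly in proportion to $1/\nu$.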
How ν controls differentiability
The smoothness parameter has a precise meaning: a Matérn process is $\lceil \nu \rceil - 1$ times mean-square differentiable. So $\nu = 1/2$ gives continuous-but-nowhere-differentiable paths (very rough), $\nu = 3/2$ gives once-differentiable paths, $\nu = 5/2$ gives twice-differentiable paths, and only in the limit $\nu \to \infty$ do the paths become infinitely differentiable. Small $\nu$ therefore means jagged realizations that can change direction abruptly, while large $\nu$ means gently rolling curves; the parameter interpolates continuously between these regimes. This is the sense in which the kernel is the smoothness assumption: you are choosing the differentiability of the functions you believe generated the data.
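One way to see this numerically is through small increments. For a stationary process with unit variance, $\operatorname{Var}[f(x+h) - f(x)] = 2\,(1 - c(h))$, where $c$ is the correlation function. Mean-square differentiability needs this variance to shrink like $h^2$, whereas a rough, Brownian-like process only manages $h^1$. The sketch below estimates that exponent from two small step sizes; the helper `increment_exponent` and the step sizes are assumptions chosen for illustration.

```python
import math

def exponential(h, rho=1.0):          # Matern nu = 1/2
    return math.exp(-h / rho)

def matern32(h, rho=1.0):             # Matern nu = 3/2
    s = math.sqrt(3.0) * h / rho
    return (1.0 + s) * math.exp(-s)

def rbf(h, rho=1.0):                  # nu -> infinity limit
    return math.exp(-h**2 / (2.0 * rho**2))

def increment_exponent(corr, h1=1e-2, h2=1e-3):
    """Estimate a in Var[f(x+h) - f(x)] ~ h**a from two small step sizes."""
    v1 = 2.0 * (1.0 - corr(h1))
    v2 = 2.0 * (1.0 - corr(h2))
    return math.log(v1 / v2) / math.log(h1 / h2)

for name, corr in [("nu=1/2", exponential), ("nu=3/2", matern32), ("RBF", rbf)]:
    print(f"{name:>7}: increment variance ~ h^{increment_exponent(corr):.2f}")
```

The exponential kernel comes out near 1 and the other two near 2, matching the split between non-differentiable and differentiable cases described above.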
The exponential and Ornstein–Uhlenbeck link
The $\nu = 1/2$ case, $k_{1/2}(r) = \sigma^2 e^{-r/\rho}$, is the exponential kernel, and it is worth singling out. In one dimension the resulting stationary Gaussian process is exactly the Ornstein–Uhlenbeck process, the mean-reverting continuous-time analogue of a first-order autoregressive process. Its paths are continuous but nowhere differentiable (the mathematical fingerprint of a process driven by white noise), which is why the exponential kernel is the right default when you expect genuinely rough, memoryless-in-slope behavior.
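The autoregressive connection can be made concrete: sampled on a regular grid with spacing `dt`, a stationary Ornstein–Uhlenbeck process is an AR(1) sequence with coefficient $\phi = e^{-\mathrm{dt}/\rho}$ and innovation variance $\sigma^2 (1 - \phi^2)$. The sketch below simulates a path that way; the function name `simulate_ou`, its parameters, and the grid spacing are illustrative assumptions.

```python
import math
import random

def simulate_ou(n, dt, rho=1.0, sigma2=1.0, seed=0):
    """Sample a stationary OU path on a regular grid via its exact AR(1) recursion."""
    rng = random.Random(seed)
    phi = math.exp(-dt / rho)                    # lag-one correlation
    innov_sd = math.sqrt(sigma2 * (1.0 - phi**2))
    path = [rng.gauss(0.0, math.sqrt(sigma2))]   # start in the stationary distribution
    for _ in range(n - 1):
        path.append(phi * path[-1] + rng.gauss(0.0, innov_sd))
    return path

dt, rho = 0.1, 1.0
phi = math.exp(-dt / rho)

# The AR(1) correlation at lag m, phi**m, equals the exponential kernel at r = m * dt.
for m in (1, 5, 20):
    print(m, phi**m, math.exp(-m * dt / rho))

path = simulate_ou(n=1000, dt=dt, rho=rho)
```

Because the recursion is exact rather than a discretization, the grid spacing affects only where the path is observed, not its distribution.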
Marginal variance, combinations, and noise
The prefactor $\sigma^2$ is the marginal variance: it sets the overall vertical scale of the function, i.e. how far $f(x)$ typically strays from its mean, and it factors out of the correlation entirely. Kernels are also closed under addition and multiplication: if $k_1$ and $k_2$ are valid covariance functions then so are $k_1 + k_2$ (superimposing two behaviors, e.g. a slow trend plus fast wiggles) and $k_1 k_2$ (an AND-like interaction), which lets you compose structured kernels from simple parts. Finally, real measurements carry noise, modeled by adding a nugget term to the diagonal:

$$k_y(x_i, x_j) = k(x_i, x_j) + \sigma_n^2 \, \delta_{ij},$$

where $\delta_{ij}$ is $1$ only when $i = j$ and $0$ otherwise, and $\sigma_n^2$ is the noise variance. The nugget represents observation error (or micro-scale variability), keeps the Gram matrix well-conditioned for inversion, and, because it is a valid kernel added to another, preserves positive semi-definiteness.
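These closure properties and the effect of the nugget are easy to check on a small grid. In the sketch below the lengthscales, weights, grid, and the nugget value `noise_var` are all assumed values chosen for illustration; the point is only that the sum and product Gram matrices stay positive definite and that the nugget lowers the condition number.

```python
import numpy as np

def matern32(r, rho):
    s = np.sqrt(3.0) * r / rho
    return (1.0 + s) * np.exp(-s)

def rbf(r, rho):
    return np.exp(-r**2 / (2.0 * rho**2))

x = np.linspace(0.0, 10.0, 30)
r = np.abs(x[:, None] - x[None, :])        # pairwise distances

# Slow trend (variance 1.0) plus fast wiggles (variance 0.25)
K_sum = 1.0 * rbf(r, rho=5.0) + 0.25 * matern32(r, rho=0.5)
# Elementwise product of two valid kernels
K_prod = rbf(r, rho=5.0) * matern32(r, rho=0.5)
# Nugget added to the diagonal (assumed noise variance)
noise_var = 0.1
K_noisy = K_sum + noise_var * np.eye(len(x))

for name, K in [("sum", K_sum), ("product", K_prod), ("sum + nugget", K_noisy)]:
    eigs = np.linalg.eigvalsh(K)
    cond = eigs.max() / eigs.min()
    print(f"{name:>12}: min eigenvalue {eigs.min():.3e}, condition number {cond:.3e}")
```

Adding `noise_var` to the diagonal shifts every eigenvalue up by exactly that amount, which is why the smallest eigenvalue of the noisy matrix cannot fall below the nugget.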
A worked example
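Consider a spatial model with a Matérn $\nu = 3/2$ kernel and lengthscale $\rho = 10$ km, and ask how correlated two sites $r = 15$ km apart are. The scaled distance is $\sqrt{3}\, r/\rho = \sqrt{3} \times 1.5 \approx 2.598$, so

$$\frac{k_{3/2}(15)}{\sigma^2} = (1 + 2.598)\, e^{-2.598} \approx 3.598 \times 0.0744 \approx 0.268.$$

So the two sites share a correlation of about $0.27$: still meaningfully linked, but well down from the perfect correlation at $r = 0$. For comparison, an RBF kernel with the same $\rho$ would give $e^{-1.125} \approx 0.325$, and an exponential ($\nu = 1/2$) kernel would give $e^{-1.5} \approx 0.223$: the smoother the kernel, the more slowly correlation bleeds away at these separations.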
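The comparison across the three kernels can be reproduced with the closed forms alone. This is a minimal check, assuming the separation of 15 km and lengthscale of 10 km from the example and the parameterization $\exp(-r^2 / 2\rho^2)$ for the RBF.

```python
import math

r, rho = 15.0, 10.0   # km, as in the worked example

s = math.sqrt(3.0) * r / rho                  # scaled distance for nu = 3/2
matern32 = (1.0 + s) * math.exp(-s)
rbf = math.exp(-r**2 / (2.0 * rho**2))
exponential = math.exp(-r / rho)

print(f"exponential (nu=1/2): {exponential:.3f}")
print(f"Matern 3/2:           {matern32:.3f}")
print(f"RBF (nu -> inf):      {rbf:.3f}")
```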
In code
We implement the Matérn correlation directly from the general Bessel-function formula and check it against the closed forms.
R
matern <- function(r, nu, rho = 1) {
  s <- sqrt(2 * nu) * r / rho
  k <- (2^(1 - nu) / gamma(nu)) * s^nu * besselK(s, nu)
  k[r == 0] <- 1  # fill the r = 0 limit (the formula gives 0 * Inf there)
  k
}

matern(15, nu = 1.5, rho = 10)             # ~0.268, the worked example
matern(1, nu = c(0.5, 1.5, 2.5), rho = 1)
Python
import numpy as np
from scipy.special import kv, gamma


def matern(r, nu, rho=1.0):
    """Matern correlation at separation r (array-safe, with the r=0 limit)."""
    r = np.asarray(r, dtype=float)
    out = np.ones_like(r)  # k(0) = 1
    nz = r > 0
    s = np.sqrt(2 * nu) * r[nz] / rho
    out[nz] = 2 ** (1 - nu) / gamma(nu) * s ** nu * kv(nu, s)
    return out


# Worked example: Matern 3/2 at r = 15 km, rho = 10 km
print("Matern 3/2, r=15, rho=10:", round(float(matern(15, 1.5, 10)), 4))

# General Bessel formula agrees with the closed forms at r = 1, rho = 1
r = 1.0
closed = {
    0.5: np.exp(-r),
    1.5: (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r),
    2.5: (1 + np.sqrt(5) * r + 5 * r ** 2 / 3) * np.exp(-np.sqrt(5) * r),
}
for nu, ck in closed.items():
    print(f"nu={nu}: general={float(matern(r, nu)):.6f} closed={ck:.6f}")

Output:
Matern 3/2, r=15, rho=10: 0.2678
nu=0.5: general=0.367879 closed=0.367879
nu=1.5: general=0.483358 closed=0.483358
nu=2.5: general=0.523994 closed=0.523994
Julia
using SpecialFunctions  # besselk, gamma

function matern(r, nu; rho = 1.0)
    r == 0 && return 1.0  # k(0) = 1
    s = sqrt(2nu) * r / rho
    2.0^(1 - nu) / gamma(nu) * s^nu * besselk(nu, s)
end

matern(15, 1.5; rho = 10)  # ~0.268, the worked example
[matern(1.0, nu) for nu in (0.5, 1.5, 2.5)]
Why it matters
In kriging and other spatial models, the covariance function is where your belief about how smoothly a field varies over space actually lives, so picking it well is not a technicality: it changes both the predictions and their uncertainty. The squared-exponential kernel is often too smooth to be credible for real spatial or environmental fields, which is why practitioners reach for the Matérn family and, overwhelmingly, for $\nu = 3/2$ or $\nu = 5/2$: these give once- or twice-differentiable fields that look like real data while remaining cheap to evaluate through their closed forms. The same kernel choice underlies Gaussian process regression and its scalable Hilbert-space approximations, where the Matérn spectral density has an especially clean form. Because the marginal at any single location is normally distributed with variance $\sigma^2$, the kernel simultaneously sets the pointwise uncertainty and the correlation structure that ties neighboring predictions together.
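To make the link to prediction concrete, here is a minimal sketch of simple kriging (zero-mean Gaussian process regression) in one dimension with a Matérn 3/2 kernel. The site positions, observed values, and all parameter settings are hypothetical, and the helper `simple_krige` is an illustration rather than a reference implementation. It returns the predictive mean and the variance of the latent field at new locations.

```python
import numpy as np

def matern32(r, rho=1.0, sigma2=1.0):
    s = np.sqrt(3.0) * r / rho
    return sigma2 * (1.0 + s) * np.exp(-s)

def simple_krige(x_obs, y_obs, x_new, rho, sigma2, noise_var):
    """Zero-mean GP (simple kriging) predictive mean and variance at x_new."""
    x_obs, y_obs, x_new = (np.asarray(a, dtype=float) for a in (x_obs, y_obs, x_new))
    K = matern32(np.abs(x_obs[:, None] - x_obs[None, :]), rho, sigma2)
    K = K + noise_var * np.eye(len(x_obs))                   # nugget on the diagonal
    k_star = matern32(np.abs(x_new[:, None] - x_obs[None, :]), rho, sigma2)
    mean = k_star @ np.linalg.solve(K, y_obs)
    var = sigma2 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean, var

# Hypothetical data: three sites along a transect (positions in km)
x_obs, y_obs = [0.0, 10.0, 25.0], [1.2, 0.4, -0.8]
mean, var = simple_krige(x_obs, y_obs, [5.0, 100.0], rho=10.0, sigma2=1.0, noise_var=0.01)
print("mean:", mean)
print("variance:", var)
```

Between observed sites the variance drops well below the marginal variance, while far from all data the prediction reverts to the zero mean and the variance returns to essentially $\sigma^2$. Swapping in a different kernel changes both outputs, which is the practical sense in which the kernel choice matters.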