Identifiability
A parameter is identifiable when the data can, in principle, pin it down. When two parameters only ever enter the model through their product or ratio, the data constrain that combination but not the parts, and the likelihood develops a flat ridge along which many parameter pairs fit equally well. Recognizing this before you report an estimate saves you from trusting a number the data never determined.
Structural vs practical identifiability
Two failures look similar but have different causes.
- Structural non-identifiability is a property of the model, independent of the data. If the likelihood satisfies $L(\theta; y) = L(\theta'; y)$ for all $y$ at distinct $\theta \neq \theta'$, no amount of data can separate them. The classic case is a mean that depends only on a product $ab$ or a ratio $a/b$, so the individual factors are invisible.
- Practical non-identifiability is a property of the data at hand. The model is identifiable in principle, but the observations are too few, too noisy, or too weakly informative to constrain a direction, leaving a nearly flat ridge rather than an exactly flat one.
More data can cure the practical case but never the structural one.
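A minimal sketch of that contrast, assuming a Poisson count model whose mean is a product of two parameters, and summary statistics chosen purely for illustration (a sample mean of exactly 20 at every sample size). The helper names `interval_width` and `ridge_drop` are hypothetical. The first measures the width of an approximate 95% likelihood interval for the identified mean; the second measures how much the log-likelihood changes when one factor varies and the other compensates.

```python
import math

def loglik(mu, s, n):
    # Poisson log-likelihood in the mean mu, dropping the constant term
    return s * math.log(mu) - n * mu

def interval_width(n, ybar=20.0, drop=1.92):
    # Practical case: width of the set of mu values whose log-likelihood lies
    # within `drop` of the maximum (1.92 gives roughly a 95% likelihood interval)
    s = n * ybar
    best = loglik(ybar, s, n)
    grid = [5.0 + 0.01 * k for k in range(3001)]  # mu from 5 to 35
    inside = [mu for mu in grid if best - loglik(mu, s, n) <= drop]
    return max(inside) - min(inside)

def ridge_drop(n, ybar=20.0):
    # Structural case: largest change in log-likelihood over beta in [0.1, 1.0]
    # when N = ybar / beta compensates so that the product stays fixed
    s = n * ybar
    best = loglik(ybar, s, n)
    betas = [0.1 + 0.01 * k for k in range(91)]
    return max(abs(best - loglik(b * (ybar / b), s, n)) for b in betas)

for n in (5, 50, 500):
    print(f"n={n:4d}  interval width for mu = {interval_width(n):6.2f}  "
          f"max change along ridge = {ridge_drop(n):.1e}")
```

Under these assumptions the interval for the mean narrows as `n` grows, while the change along the ridge stays at floating-point noise for every `n`.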
The signature: a ridge in the likelihood
Non-identifiability shows up as geometry. Along the offending direction the log-likelihood is flat (structural) or nearly flat (practical), so the surface looks like a valley with a level floor rather than a single peak. Any maximizer on that floor fits as well as any other, which is why an optimizer may return wildly different estimates from different starts while reporting the same fit.
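The start-dependence can be sketched with a toy optimizer. This assumes a Poisson model with mean `beta * N` and illustrative summary statistics (30 counts with sample mean 20); plain gradient ascent on the log parameters stands in for whatever optimizer you actually use, and the function name `climb` is hypothetical.

```python
import math

s, n = 600.0, 30.0   # assumed summary statistics: 30 counts with sample mean 20

def loglik(beta, N):
    # Poisson log-likelihood with mean beta * N, dropping the constant term
    mu = beta * N
    return s * math.log(mu) - n * mu

def climb(beta0, N0, step=5e-4, iters=500):
    # Gradient ascent in a = log(beta), b = log(N); both partial derivatives
    # are equal because only the sum a + b enters the likelihood
    a, b = math.log(beta0), math.log(N0)
    for _ in range(iters):
        g = s - n * math.exp(a + b)
        a += step * g
        b += step * g
    return math.exp(a), math.exp(b)

starts = [(0.1, 10.0), (0.5, 80.0), (2.0, 5.0)]
fits = [climb(b0, N0) for b0, N0 in starts]
for (b0, N0), (bh, Nh) in zip(starts, fits):
    print(f"start=({b0}, {N0})  beta={bh:.3f}  N={Nh:.2f}  "
          f"product={bh * Nh:.4f}  loglik={loglik(bh, Nh):.4f}")
```

Each start lands on a different point of the valley floor: the estimates of `beta` and `N` disagree, but the product and the reported log-likelihood agree.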
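The Bayesian view makes the same point through the posterior. In a non-identified direction the likelihood contributes nothing, so the posterior in that direction simply returns the prior. Writing $\phi$ for the non-identified direction and $\eta$ for the remaining parameters, a likelihood with $L(\eta, \phi; y) = L(\eta; y)$ gives:
$$p(\phi \mid \eta, y) = p(\phi \mid \eta)$$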
If a marginal posterior looks identical to its prior, the data said nothing about that parameter. A tight-looking joint posterior can still hide a long ridge, so inspect the correlations, not just the marginals.
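A grid-based sketch of both checks, assuming a Poisson model with mean `beta * N`, illustrative summary statistics (sample mean 20 from 30 counts), and flat priors on `log(beta)` and `log(N)` over ranges picked for this example. None of these choices come from a particular analysis; they only make the ridge visible.

```python
import math

s, n = 600.0, 30.0                 # assumed summary statistics (sample mean 20)
ll_max = s * math.log(s / n) - s   # log-likelihood on the ridge, subtracted to avoid overflow

def rel_lik(a, b):
    # Likelihood relative to its maximum, with a = log(beta) and b = log(N)
    u = a + b
    return math.exp(s * u - n * math.exp(u) - ll_max)

# Flat priors on a log grid (an assumption): beta in [0.1, 1], N in [5, 2000]
na, nb = 46, 600
A = [math.log(0.1) + i * (math.log(1.0) - math.log(0.1)) / (na - 1) for i in range(na)]
B = [math.log(5.0) + j * (math.log(2000.0) - math.log(5.0)) / (nb - 1) for j in range(nb)]

w = [[rel_lik(a, b) for b in B] for a in A]   # unnormalized posterior weights
total = sum(map(sum, w))

# Marginal posterior of log(beta), compared with its flat prior 1/na
marg_a = [sum(row) / total for row in w]
max_dev = max(abs(m - 1.0 / na) for m in marg_a)

# Posterior correlation between log(beta) and log(N)
Ea = sum(A[i] * sum(w[i]) for i in range(na)) / total
Eb = sum(B[j] * w[i][j] for i in range(na) for j in range(nb)) / total
cov = sum((A[i] - Ea) * (B[j] - Eb) * w[i][j] for i in range(na) for j in range(nb)) / total
va = sum((A[i] - Ea) ** 2 * sum(w[i]) for i in range(na)) / total
vb = sum((B[j] - Eb) ** 2 * w[i][j] for i in range(na) for j in range(nb)) / total
corr = cov / math.sqrt(va * vb)

print(f"max |posterior - prior| for log(beta): {max_dev:.1e}")
print(f"posterior correlation of log(beta) and log(N): {corr:.4f}")
```

In this setup the marginal posterior of `log(beta)` is indistinguishable from its prior, and the strong negative correlation is the ridge seen from the joint posterior.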
Profile likelihood as a diagnostic
The profile likelihood is the practical tool for finding a ridge. For a parameter $\psi$ with the rest collected in $\lambda$, profile out $\lambda$ by maximizing over it at each fixed $\psi$:
$$L_p(\psi) = \max_{\lambda} L(\psi, \lambda; y)$$
A well-identified $\psi$ gives a profile with a clear peak that falls away on both sides; a non-identified $\psi$ gives a flat profile that never rises above its neighbors. A profile that is flat over a whole range is the fingerprint of a ridge, and the width of a near-flat profile measures how weakly the data constrain the parameter.
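A small sketch of the profiling step, assuming a Poisson model with mean `beta * N` and illustrative summary statistics (sample mean 20 from 30 counts). The nuisance parameter `N` is maximized out numerically with a ternary search, which is a stand-in for any one-dimensional optimizer; the helper names `maximize` and `profile` are hypothetical. For contrast, the last lines treat `N` as known (an assumed value of 50), which gives `beta` a clear peak.

```python
import math

s, n = 600.0, 30.0   # assumed summary statistics (sample mean 20)

def loglik(beta, N):
    # Poisson log-likelihood with mean beta * N, dropping the constant term
    mu = beta * N
    return s * math.log(mu) - n * mu

def maximize(f, lo, hi, iters=200):
    # Ternary search for the maximum value of a unimodal function on [lo, hi]
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

def profile(beta):
    # Profile log-likelihood: maximize over log(N) at fixed beta
    return maximize(lambda b: loglik(beta, math.exp(b)), math.log(1.0), math.log(1e4))

betas = [0.1 + 0.01 * k for k in range(91)]
prof = [profile(b) for b in betas]
flat_range = max(prof) - min(prof)

# Contrast: with N assumed known (N = 50), the log-likelihood in beta is peaked
fixed = [loglik(b, 50.0) for b in betas]
peak_range = max(fixed) - min(fixed)

print(f"profile range with N free:  {flat_range:.1e}")
print(f"log-lik range with N fixed: {peak_range:.1f}")
```

A profile whose range stays far below the usual 1.92 cutoff across the whole grid yields a likelihood interval that covers the entire grid, which is the numerical form of "flat".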
A worked example
Suppose case counts $y_i$ have mean $\mu = \beta N$, where $\beta$ is a per-contact transmission rate and $N$ is population size, modeled as $y_i \sim \mathrm{Poisson}(\beta N)$. Only the product $\beta N$ enters the likelihood, so the maximum-likelihood surface is flat along every $(\beta, N)$ with $\beta N = \bar{y}$, the sample mean. Pick $(\beta, N)$ or $(2\beta, N/2)$ and the fit is identical, because both give the same product $\beta N$. Evaluating the log-likelihood along the ridge returns an essentially constant value, so the data identify the product but neither factor alone. To recover $\beta$ and $N$ separately you need extra information: an independent measurement of $N$, a prior, or an experiment that breaks the product.
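One way to see extra information break the ridge is to add a second data source. The sketch below assumes a hypothetical independent measurement of the population size (`N_obs = 50` with standard error `N_sd = 5`, both invented for illustration) entering as a Gaussian term, on top of illustrative count summaries (sample mean 20 from 30 counts). The profile over the transmission rate then has a peak and a finite likelihood interval.

```python
import math

s, n = 600.0, 30.0         # assumed summary statistics (sample mean 20)
N_obs, N_sd = 50.0, 5.0    # hypothetical independent measurement of N

def joint_loglik(beta, N):
    # Poisson counts with mean beta * N, plus a Gaussian term for the measurement of N
    mu = beta * N
    return s * math.log(mu) - n * mu - 0.5 * ((N - N_obs) / N_sd) ** 2

def maximize(f, lo, hi, iters=200):
    # Ternary search for the maximum value of a unimodal function on [lo, hi]
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

def profile(beta):
    # Profile out N at fixed beta (the joint log-likelihood is concave in N)
    return maximize(lambda N: joint_loglik(beta, N), 1.0, 500.0)

betas = [0.1 + 0.01 * k for k in range(91)]
prof = [profile(b) for b in betas]
best = max(prof)
best_beta = betas[prof.index(best)]
inside = [b for b, p in zip(betas, prof) if best - p <= 1.92]

print(f"profile peak at beta = {best_beta:.2f}")
print(f"approx. 95% likelihood interval: [{min(inside):.2f}, {max(inside):.2f}]")
```

How tight the interval is depends entirely on the assumed precision of the measurement of `N`; a vaguer measurement widens it back toward the ridge.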
In code
Evaluate the log-likelihood on a grid and show that it is constant along the ridge $\beta N = \bar{y}$.
R
set.seed(1834)
beta_true <- 0.4; N_true <- 50
y <- rpois(30, beta_true * N_true) # mean depends only on beta*N
s <- sum(y); n <- length(y); ybar <- mean(y)
loglik <- function(beta, N) {        # Poisson log-lik, dropping constant
  mu <- beta * N
  s * log(mu) - n * mu
}
betas <- seq(0.1, 1.0, length.out = 60)
ridge_ll <- sapply(betas, function(b) loglik(b, ybar / b))
cat("identified product ybar =", round(ybar, 2), "\n")
cat("log-lik range along ridge =",
    formatC(max(ridge_ll) - min(ridge_ll), format = "e", digits = 2), "\n")
Python
import numpy as np
rng = np.random.default_rng(1834)
beta_true, N_true = 0.4, 50.0
y = rng.poisson(beta_true * N_true, size=30) # mean depends only on beta*N
s, n, ybar = y.sum(), len(y), y.mean()
def loglik(beta, N):                 # Poisson log-lik, dropping constant
    mu = beta * N
    return s * np.log(mu) - n * mu
betas = np.linspace(0.1, 1.0, 60)
ridge_ll = np.array([loglik(b, ybar / b) for b in betas])
print(f"identified product ybar = {ybar:.2f}")
print(f"log-lik along ridge: min={ridge_ll.min():.4f}, "
      f"max={ridge_ll.max():.4f}, range={np.ptp(ridge_ll):.2e}")
Output:
identified product ybar = 18.97
log-lik along ridge: min=1105.3867, max=1105.3867, range=2.27e-13
Julia
using Random, Distributions, Statistics
Random.seed!(1834)
beta_true, N_true = 0.4, 50.0
y = rand(Poisson(beta_true * N_true), 30) # mean depends only on beta*N
s, n, ybar = sum(y), length(y), mean(y)
loglik(beta, N) = (mu = beta * N; s * log(mu) - n * mu)
betas = range(0.1, 1.0, length = 60)
ridge_ll = [loglik(b, ybar / b) for b in betas]
println("identified product ybar = ", round(ybar, digits=2))
println("log-lik range along ridge = ",
        maximum(ridge_ll) - minimum(ridge_ll))
Why it matters
Transmission models are easy to over-parameterize, and identifiability is what separates a parameter you can report from one the data never touched. A susceptibility and a contact rate that only appear as a product, or a rate and a reporting fraction that trade off, will look estimated when they are merely constrained as a combination. Checking profiles and posteriors keeps you honest about which quantities the data determine, and it points to the design or prior information needed to break a ridge; the same weak directions are what a global sensitivity analysis surfaces as parameters the output barely responds to.