Identifiability

A parameter is identifiable when the data can, in principle, pin it down. When two parameters only ever enter the model through their product or ratio, the data constrain that combination but not the parts, and the likelihood develops a flat ridge along which many parameter pairs fit equally well. Recognizing this before you report an estimate saves you from trusting a number the data never determined.

A Poisson model whose mean depends only on the product $\beta N$ leaves the two parameters unidentified: the log-likelihood is flat along the hyperbola $\beta N=\bar y$, where $\bar y$ is the sample mean.

Structural vs practical identifiability

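Two failures look similar but have different causes.

Structural non-identifiability comes from the model itself: the parameters enter only through a combination, so the likelihood is exactly flat along the ridge however much data you collect. Practical non-identifiability comes from the data: the model could separate the parameters in principle, but the observations at hand are too few or too uninformative, so the likelihood is only nearly flat.

More data can cure the practical case but never the structural one.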

The signature: a ridge in the likelihood

Non-identifiability shows up as geometry. Along the offending direction the log-likelihood is flat (structural) or nearly flat (practical), so the surface looks like a valley with a level floor rather than a single peak. Any maximizer on that floor fits as well as any other, which is why an optimizer may return wildly different estimates from different starts while reporting the same fit.
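The optimizer behavior described above can be reproduced with a very small sketch. It assumes the sufficient statistics implied by the Python run reported in this section (30 counts with mean 18.97, hence a total of 569), and it uses plain gradient ascent on the log-parameters; `climb` is a hypothetical helper written for this illustration, not the routine any particular package uses.

```python
import math

# Assumed sufficient statistics: n = 30 counts with mean 18.97,
# which implies a total count of s = 569.
s, n = 569, 30
ybar = s / n


def loglik(beta, N):
    """Poisson log-likelihood up to an additive constant."""
    mu = beta * N
    return s * math.log(mu) - n * mu


def climb(beta0, N0, steps=500):
    """Gradient ascent in (log beta, log N) from one starting point."""
    a, b = math.log(beta0), math.log(N0)
    lr = 0.25 / s                      # small fixed step, chosen for stability
    for _ in range(steps):
        g = s - n * math.exp(a + b)    # identical partial derivative for a and b
        a += lr * g
        b += lr * g
    return math.exp(a), math.exp(b)


# Three starts with the same product but very different factors.
starts = [(0.1, 100.0), (0.5, 20.0), (1.0, 10.0)]
fits = [climb(b0, N0) for b0, N0 in starts]
for (b0, N0), (b, N) in zip(starts, fits):
    print(f"start=({b0}, {N0}) -> beta={b:.3f}, N={N:.2f}, "
          f"beta*N={b * N:.4f}, loglik={loglik(b, N):.4f}")
```

Because both partial derivatives are equal, each step moves straight toward the ridge and never along it, so the end point on the floor is decided entirely by the start. Every run should report the same product and the same log-likelihood with different values of beta and N.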

The Bayesian view makes the same point through the posterior. In a non-identified direction the likelihood contributes nothing, so the posterior in that direction simply returns the prior:

$$p(\theta\mid y)\ \propto\ p(y\mid\theta)\,p(\theta)\ \xrightarrow[\text{flat likelihood}]{}\ p(\theta).$$

If a marginal posterior looks identical to its prior, the data said nothing about that parameter. A tight-looking joint posterior can still hide a long ridge, so inspect the correlations, not just the marginals.
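A grid posterior makes both warnings concrete. The sketch below is illustrative only: it assumes the Poisson product model used in this section, the sufficient statistics implied by the reported Python run (30 counts, total 569), and independent log-uniform priors on a box that I chose for convenience. The prior ranges are assumptions, not part of the original example.

```python
import numpy as np

# Assumed sufficient statistics: 30 counts with mean 18.97, so s = 569.
s, n = 569, 30

# An evenly spaced grid in log space with equal weights acts as a
# log-uniform prior on each parameter (ranges are illustrative assumptions).
log_beta = np.linspace(np.log(0.05), np.log(2.0), 400)
log_N = np.linspace(np.log(5.0), np.log(500.0), 400)
A, B = np.meshgrid(log_beta, log_N, indexing="ij")

# Log-posterior on the grid: Poisson log-likelihood plus a constant log-prior.
log_mu = A + B
log_post = s * log_mu - n * np.exp(log_mu)
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Marginal posterior of log(beta) compared with its uniform grid prior.
marg_beta = post.sum(axis=1)
prior_beta = np.full_like(marg_beta, 1.0 / marg_beta.size)
max_rel_gap = np.max(np.abs(marg_beta / prior_beta - 1.0))

# Posterior correlation between log(beta) and log(N).
mean_a, mean_b = (post * A).sum(), (post * B).sum()
cov_ab = (post * (A - mean_a) * (B - mean_b)).sum()
var_a = (post * (A - mean_a) ** 2).sum()
var_b = (post * (B - mean_b) ** 2).sum()
corr = cov_ab / np.sqrt(var_a * var_b)

print(f"largest relative gap between marginal and prior: {max_rel_gap:.2e}")
print(f"posterior correlation of log(beta), log(N): {corr:.4f}")
```

The prior box for N is deliberately wide enough that the ridge crosses the whole range of beta. Under that assumption the marginal for beta should be indistinguishable from its prior, while the correlation close to -1 reveals the ridge that the marginal hides.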

Profile likelihood as a diagnostic

The profile likelihood is the practical tool for finding a ridge. For a parameter $\theta_1$ with the rest collected in $\theta_2$, profile out $\theta_2$ by maximizing over it at each fixed $\theta_1$:

$$\ell_p(\theta_1)=\max_{\theta_2}\ \ell(\theta_1,\theta_2).$$

A well-identified $\theta_1$ gives a profile with a clear peak that falls away on both sides; a non-identified $\theta_1$ gives a flat profile that never rises above its neighbors. A profile that is flat over a whole range is the fingerprint of a ridge, and the width of a near-flat profile measures how weakly the data constrain the parameter.
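As a rough sketch of the two profile shapes, the code below profiles the Poisson product model used in this section in two ways: over beta with N maximized out on a dense grid, and over the product mu = beta * N, which leaves no nuisance parameter. The sufficient statistics (30 counts, total 569) are inferred from the reported Python run, and the grid limits are assumptions chosen so the maximizing N always lies inside the grid.

```python
import numpy as np

# Assumed sufficient statistics: 30 counts with mean 18.97, so s = 569.
s, n = 569, 30
ybar = s / n


def loglik(beta, N):
    """Poisson log-likelihood up to an additive constant."""
    mu = beta * N
    return s * np.log(mu) - n * mu


# Dense grid over the nuisance parameter N (limits are an assumption).
N_grid = np.linspace(1.0, 400.0, 40_000)


def profile_beta(beta):
    """Profile log-likelihood of beta: maximize over N on the grid."""
    return loglik(beta, N_grid).max()


betas = np.linspace(0.1, 1.0, 60)
prof_beta = np.array([profile_beta(b) for b in betas])

# Profile of the product mu = beta * N: the likelihood depends on mu alone,
# so there is nothing left to maximize over.
mus = np.linspace(10.0, 30.0, 60)
prof_mu = s * np.log(mus) - n * mus

print(f"range of profile over beta: {np.ptp(prof_beta):.2e}")
print(f"range of profile over mu:   {np.ptp(prof_mu):.2f}")
```

The profile over beta should vary only by the small error of the grid search, while the profile over mu drops by many log-likelihood units away from the sample mean. A grid search keeps the sketch dependency-free; a one-dimensional optimizer would be the usual choice when the nuisance parameters are not a single scalar.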

A worked example

Suppose case counts have mean $\beta N$, where $\beta$ is a per-contact transmission rate and $N$ is population size, modeled as $y_i\sim\text{Poisson}(\beta N)$. Only the product enters the likelihood, so the log-likelihood attains its maximum at every $(\beta, N)$ with $\beta N=\bar y$, the sample mean, and is constant along that curve. Pick $\beta=0.4,\ N=50$ or $\beta=0.1,\ N=200$ and the fit is identical, because both give $\beta N=20$. Evaluating the log-likelihood along the ridge $\beta N=\bar y$ returns the same value up to floating-point rounding, so the data identify the product $\beta N$ but neither factor alone. To recover $\beta$ and $N$ separately you need extra information: an independent measurement of $N$, a prior, or an experiment that breaks the product.
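One way to see why extra information helps is through the expected (Fisher) information matrix for $(\beta, N)$. The sketch below is an illustration under stated assumptions, not part of the original example: it evaluates the matrix at $\beta=0.4$, $N=50$, and it models the "independent measurement of $N$" as `m_aux` hypothetical auxiliary counts $z_j\sim\text{Poisson}(N)$. That auxiliary model and the helper `fisher_info` are assumptions made only for this demonstration.

```python
import numpy as np

beta, N = 0.4, 50.0   # parameter values from the worked example


def fisher_info(n_counts, m_aux):
    """Expected information for (beta, N).

    n_counts case counts y_i ~ Poisson(beta * N), plus m_aux hypothetical
    auxiliary counts z_j ~ Poisson(N) (an assumption for this sketch).
    """
    mu = beta * N
    grad = np.array([N, beta])                    # d mu / d(beta, N)
    info = n_counts * np.outer(grad, grad) / mu   # rank one: product only
    info[1, 1] += m_aux / N                       # auxiliary counts inform N only
    return info


def smallest_eig(n_counts, m_aux):
    """Smallest eigenvalue: curvature along the weakest direction."""
    return np.linalg.eigvalsh(fisher_info(n_counts, m_aux))[0]


print("30 counts, no auxiliary data:   ", smallest_eig(30, 0))
print("3000 counts, no auxiliary data: ", smallest_eig(3000, 0))
print("30 counts, 5 auxiliary counts:  ", smallest_eig(30, 5))
```

Without auxiliary data the matrix has rank one, so its smallest eigenvalue is zero up to rounding for any number of case counts, which is the structural ridge. Adding even a few auxiliary counts makes the smallest eigenvalue strictly positive, meaning the ridge now has curvature and both parameters are constrained.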

In code

Evaluate the log-likelihood on a $(\beta, N)$ grid and show that it is constant along the ridge $\beta N=\bar y$.

R

set.seed(1834)
beta_true <- 0.4; N_true <- 50
y <- rpois(30, beta_true * N_true)     # mean depends only on beta*N
s <- sum(y); n <- length(y); ybar <- mean(y)

loglik <- function(beta, N) {          # Poisson log-lik, dropping constant
  mu <- beta * N
  s * log(mu) - n * mu
}

betas <- seq(0.1, 1.0, length.out = 60)
ridge_ll <- sapply(betas, function(b) loglik(b, ybar / b))
cat("identified product ybar =", round(ybar, 2), "\n")
cat("log-lik range along ridge =",
    formatC(max(ridge_ll) - min(ridge_ll), format = "e", digits = 2), "\n")

Python

import numpy as np

rng = np.random.default_rng(1834)
beta_true, N_true = 0.4, 50.0
y = rng.poisson(beta_true * N_true, size=30)   # mean depends only on beta*N
s, n, ybar = y.sum(), len(y), y.mean()


def loglik(beta, N):                # Poisson log-lik, dropping constant
    mu = beta * N
    return s * np.log(mu) - n * mu


betas = np.linspace(0.1, 1.0, 60)
ridge_ll = np.array([loglik(b, ybar / b) for b in betas])
print(f"identified product ybar = {ybar:.2f}")
print(f"log-lik along ridge: min={ridge_ll.min():.4f}, "
      f"max={ridge_ll.max():.4f}, range={np.ptp(ridge_ll):.2e}")

Output

identified product ybar = 18.97
log-lik along ridge: min=1105.3867, max=1105.3867, range=2.27e-13

Julia

using Random, Distributions, Statistics
Random.seed!(1834)
beta_true, N_true = 0.4, 50.0
y = rand(Poisson(beta_true * N_true), 30)      # mean depends only on beta*N
s, n, ybar = sum(y), length(y), mean(y)

loglik(beta, N) = (mu = beta * N; s * log(mu) - n * mu)

betas = range(0.1, 1.0, length = 60)
ridge_ll = [loglik(b, ybar / b) for b in betas]
println("identified product ybar = ", round(ybar, digits=2))
println("log-lik range along ridge = ",
        maximum(ridge_ll) - minimum(ridge_ll))

Why it matters

Transmission models are easy to over-parameterize, and identifiability is what separates a parameter you can report from one the data never touched. A susceptibility and a contact rate that only appear as a product, or a rate and a reporting fraction that trade off, will look estimated when they are merely constrained as a combination. Checking profiles and posteriors keeps you honest about which quantities the data determine, and it points to the design or prior information needed to break a ridge; the same weak directions are what a global sensitivity analysis surfaces as parameters the output barely responds to.