Jensen’s Inequality and Nonlinear Averaging
The average of a nonlinear function is not the function of the average. This single fact explains why plugging mean parameters into a nonlinear model gives biased predictions — a recurring trap in statistics and epidemiology.
The inequality
Let $g$ be a convex function and $X$ a random variable with finite mean. Then $$E[g(X)] \ge g(E[X]).$$ For a concave function the inequality reverses: $E[g(X)] \le g(E[X])$. Equality holds if and only if $g$ is linear on the support of $X$, or $X$ is (almost surely) constant.
The intuition is geometric. Convexity means the chord lies above the curve; equivalently, every point of the curve has a supporting line beneath it. At $\mu = E[X]$ this reads $g(x) \ge g(\mu) + c\,(x - \mu)$ for some slope $c$. Taking expectations of that supporting line at $\mu$ gives the bound, because $E[X - \mu] = 0$.
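The supporting-line argument can be checked numerically. The sketch below assumes a hypothetical three-point distribution and uses $e^x$ as the convex function and $\log x$ as the concave one; the values, probabilities, and the helper names `expect` and `line` are illustrative choices, not part of the argument above.

```python
import math

# Hypothetical discrete distribution (illustrative values only).
values = [0.5, 1.0, 3.0]
probs = [0.2, 0.5, 0.3]

def expect(f):
    """Expectation of f(X) under the discrete distribution above."""
    return sum(p * f(x) for x, p in zip(values, probs))

mu = expect(lambda x: x)  # E[X]

# Convex case: E[g(X)] - g(E[X]) should be non-negative.
convex_gap = expect(math.exp) - math.exp(mu)

# Concave case: E[g(X)] - g(E[X]) should be non-positive.
concave_gap = expect(math.log) - math.log(mu)

def line(x):
    """Supporting (tangent) line of exp at mu."""
    return math.exp(mu) + math.exp(mu) * (x - mu)

# The line sits below the curve, and its expectation is exactly exp(mu).
line_below_curve = all(line(x) <= math.exp(x) for x in values)

print("convex gap  (>= 0):", convex_gap)
print("concave gap (<= 0):", concave_gap)
print("supporting line below curve:", line_below_curve)
print("E[line(X)] vs exp(mu):", expect(line), math.exp(mu))
```

Because the line is below the curve pointwise and its expectation equals $g(\mu)$, the expectation of the curve cannot fall below $g(\mu)$.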
Second-order intuition: the variance gap
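A second-order Taylor expansion of $g$ about $\mu = E[X]$ (the delta method) makes the size of the gap explicit: $$g(X) \approx g(\mu) + g'(\mu)(X - \mu) + \tfrac{1}{2} g''(\mu)(X - \mu)^2.$$ Taking expectations, the linear term vanishes ($E[X - \mu] = 0$), leaving $$E[g(X)] - g(\mu) \approx \tfrac{1}{2} g''(\mu)\,\mathrm{Var}(X).$$ When $g$ is convex, $g''(\mu) \ge 0$, so the gap is non-negative: Jensen again. The gap grows with the curvature $g''(\mu)$ and with the spread $\mathrm{Var}(X)$.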
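To see how good the second-order approximation is, it helps to pick a case where the gap is known in closed form. The sketch below assumes $g(x) = e^x$ and $X \sim \mathrm{Normal}(\mu, \sigma^2)$, for which $E[e^X] = e^{\mu + \sigma^2/2}$ (the lognormal mean). The function names and the chosen values of $\mu$ and $\sigma$ are illustrative.

```python
import math

def exact_gap(mu, sigma):
    """Exact Jensen gap E[exp(X)] - exp(E[X]) for X ~ Normal(mu, sigma^2)."""
    return math.exp(mu + sigma**2 / 2) - math.exp(mu)

def second_order_gap(mu, sigma):
    """Variance-gap approximation (1/2) g''(mu) Var(X), with g'' = exp."""
    return 0.5 * math.exp(mu) * sigma**2

mu = 1.0  # illustrative mean
for sigma in (0.1, 0.5, 1.0):
    exact = exact_gap(mu, sigma)
    approx = second_order_gap(mu, sigma)
    rel_err = (exact - approx) / exact
    print(f"sigma={sigma}: exact={exact:.5f} approx={approx:.5f} rel_err={rel_err:.3f}")
```

The approximation is close for small spread and degrades as $\sigma$ grows, since the neglected higher-order terms are no longer small.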
Examples
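- AM–GM inequality. With $g(x) = -\log x$ (convex) and $X$ uniform on positive values $\{x_1, \dots, x_n\}$, Jensen gives $-\log\!\big(\tfrac{1}{n}\sum_i x_i\big) \le -\tfrac{1}{n}\sum_i \log x_i$, i.e. $\big(\prod_i x_i\big)^{1/n} \le \tfrac{1}{n}\sum_i x_i$: the geometric mean never exceeds the arithmetic mean.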
- Log of a mean. Since $\log$ is concave, $E[\log X] \le \log E[X]$. Averaging on the log scale and exponentiating underestimates the mean.
- Reciprocals. Since $1/x$ is convex on $(0, \infty)$, $E[1/X] \ge 1/E[X]$: the mean of reciprocals exceeds the reciprocal of the mean. Equivalently, harmonic mean $\le$ arithmetic mean.
- Heterogeneous transmission. If a transmission rate $\beta$ varies across subgroups, a quantity like $E[\beta^2]$ (relevant to reproduction-number and final-size calculations) exceeds $(E[\beta])^2$ by exactly $\mathrm{Var}(\beta)$. Using a single average $\bar\beta$ systematically understates the impact of heterogeneity.
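The orderings in these examples can be verified on a small data set. The sketch below assumes four hypothetical subgroup rates, treated as equally weighted; the numbers and variable names are illustrative only.

```python
import math
from statistics import fmean, pvariance

# Hypothetical positive rates for four equally weighted subgroups.
beta = [0.2, 0.4, 0.5, 0.9]

arithmetic = fmean(beta)
geometric = math.exp(fmean([math.log(b) for b in beta]))  # exp of the mean log
harmonic = 1 / fmean([1 / b for b in beta])               # reciprocal of the mean reciprocal

mean_of_squares = fmean([b * b for b in beta])  # E[beta^2]
square_of_mean = arithmetic ** 2                # (E[beta])^2
heterogeneity_gap = mean_of_squares - square_of_mean

print("harmonic <= geometric <= arithmetic:", harmonic, geometric, arithmetic)
print("E[beta^2] - (E[beta])^2:", heterogeneity_gap)
print("population variance:    ", pvariance(beta))
```

The last two printed values agree up to floating-point rounding, which is the identity $E[\beta^2] - (E[\beta])^2 = \mathrm{Var}(\beta)$.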
Worked example
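Let $X$ take the values $a$ and $b$, each with probability $1/2$, and take the convex function $g(x) = x^2$.
- Mean: $E[X] = (a + b)/2$, so $g(E[X]) = (a + b)^2/4$.
- Function average: $E[g(X)] = (a^2 + b^2)/2$.
So $E[g(X)] \ge g(E[X])$, with a gap of $(a^2 + b^2)/2 - (a + b)^2/4 = (a - b)^2/4$.
Check the variance-gap formula: $g''(x) = 2$ and $\mathrm{Var}(X) = (a - b)^2/4$, so $\tfrac{1}{2} g''(\mu)\,\mathrm{Var}(X) = (a - b)^2/4$, which matches the gap exactly. The approximation is exact here because $g$ is quadratic (higher derivatives vanish).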
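A two-point example of this kind takes only a few lines to verify. The sketch below assumes the values 1 and 3 and the function $g(x) = x^2$; these are illustrative choices made for the sketch, not values taken from the text.

```python
# Illustrative two-point distribution: each value has probability 1/2.
a, b = 1.0, 3.0

def g(x):
    """Convex quadratic, with g''(x) = 2 everywhere."""
    return x ** 2

mu = 0.5 * (a + b)                 # E[X]
g_of_mean = g(mu)                  # g(E[X])
mean_of_g = 0.5 * (g(a) + g(b))    # E[g(X)]
gap = mean_of_g - g_of_mean        # Jensen gap

var = 0.5 * ((a - mu) ** 2 + (b - mu) ** 2)  # Var(X)
approx = 0.5 * 2 * var                       # (1/2) g''(mu) Var(X)

print("g(E[X]) =", g_of_mean, " E[g(X)] =", mean_of_g)
print("gap =", gap, " variance-gap formula =", approx)
```

For a quadratic the two numbers coincide, because the Taylor expansion has no terms beyond second order.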
A biological example: performance in a fluctuating environment
Most biological rates (development, metabolism, photosynthesis, even pathogen transmission) depend on temperature through a curved thermal performance curve that peaks at an optimum and falls off on either side. Near that optimum the curve is concave, so Jensen’s inequality bites: an organism experiencing a fluctuating temperature performs worse, on average, than one held at the same mean temperature. Ecologists call plugging the mean temperature into a nonlinear rate the “fallacy of the averages” (Ruel & Ayres, 1999).
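Take a Gaussian-shaped performance curve $P(T) = \exp\!\big(-((T - 28)/8)^2\big)$ peaking at 28 °C and a habitat that spends half its time at 20 °C and half at 36 °C, so the mean temperature is exactly 28 °C. Performance at the mean temperature is the maximum, $P(28) = 1$, but the mean performance is only $\tfrac{1}{2}[P(20) + P(36)] = e^{-1} \approx 0.368$: variability alone costs 63% of performance, with no change in the average temperature. (These swings reach past the concave zone, which ends about 5.7 °C either side of the optimum, but because the mean sits at the peak every excursion still lowers performance.) This is why climate variability, not just mean warming, reshapes development rates, vector activity, and transmission, and why a model fed the mean temperature over-predicts. The sign can flip: in the accelerating, convex low-temperature tail of the same curve, added variability would raise mean performance, exactly as $E[P(T)] \ge P(E[T])$ for convex $P$ predicts.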
import numpy as np
Topt, width = 28.0, 8.0
P = lambda T: np.exp(-((T - Topt) / width) ** 2) # concave near the optimum
T = np.array([20.0, 36.0]) # two equally likely temperatures, mean 28
print("P(mean T) =", round(float(P(T.mean())), 3)) # 1.0 performance at the mean
print("E[P(T)] =", round(float(P(T).mean()), 3)) # 0.368 mean performance
print("Jensen gap =", round(float(P(T.mean()) - P(T).mean()), 3)) # 0.632
P(mean T) = 1.0
E[P(T)] = 0.368
Jensen gap = 0.632
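The sign flip in the cold tail can be illustrated with the same curve. The sketch below reuses `Topt = 28` and `width = 8` from the example above; the two cold temperatures (12 and 16) are hypothetical values chosen to lie below the inflection point at `Topt - width / sqrt(2)`, where the curve is convex.

```python
import math

Topt, width = 28.0, 8.0  # same curve parameters as the example above

def P(T):
    """Gaussian-shaped thermal performance curve."""
    return math.exp(-((T - Topt) / width) ** 2)

# Below this temperature the curve is convex (second derivative > 0).
inflection = Topt - width / math.sqrt(2)

cold = [12.0, 16.0]  # hypothetical, equally likely, both in the convex tail
mean_T = sum(cold) / len(cold)

perf_at_mean = P(mean_T)                          # P(E[T])
mean_perf = sum(P(T) for T in cold) / len(cold)   # E[P(T)]

print("inflection point:", round(inflection, 2))
print("P(mean T) =", round(perf_at_mean, 4))
print("E[P(T)]   =", round(mean_perf, 4))
print("variability helps:", mean_perf > perf_at_mean)
```

Here the fluctuating habitat outperforms the constant one, the opposite of what happens around the optimum.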
Simulation
R
set.seed(42)
g <- function(x) x^2
X <- runif(1e6, 0, 1) # Uniform(0,1): mu = 1/2, Var = 1/12
lhs <- mean(g(X)); rhs <- g(mean(X))
c(E_gX = lhs, g_EX = rhs, gap = lhs - rhs)
# E_gX ~ 0.3333 g_EX ~ 0.25 gap ~ 0.0833 (>= 0, confirms Jensen)
gpp <- 2 # g''(x) = 2
0.5 * gpp * var(X) # ~ 0.0833: variance-gap approximation
Python
import numpy as np
np.random.seed(42)
g = lambda x: x**2
X = np.random.uniform(0, 1, 1_000_000)
lhs, rhs = g(X).mean(), g(X.mean())
print(lhs, rhs, lhs - rhs) # ~0.3333 ~0.25 ~0.0833 (gap >= 0)
print(0.5 * 2 * X.var()) # ~0.0833: (1/2) g''(mu) Var(X)
0.3336193530309505 0.2503345980567165 0.08328475497423404
0.08328475497423413
Julia
using Random, Statistics
Random.seed!(42)
g(x) = x^2
X = rand(1_000_000) # Uniform(0,1)
lhs, rhs = mean(g.(X)), g(mean(X))
println((lhs, rhs, lhs - rhs)) # ~(0.3333, 0.25, 0.0833)
println(0.5 * 2 * var(X)) # ~0.0833
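For $X \sim \mathrm{Uniform}(0, 1)$ and $g(x) = x^2$ the exact gap is $E[X^2] - (E[X])^2 = 1/3 - 1/4 = 1/12 \approx 0.0833$, matching both the simulation and the second-order formula.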
Why it matters for statistics
Jensen’s inequality is the reason “average the inputs, then apply the model” disagrees with “apply the model, then average” whenever the model is nonlinear. It underlies the bias of plug-in estimators, the direction of the delta-method correction, the lower bounds of the form $\log E[Z] \ge E[\log Z]$ that motivate the EM algorithm and variational inference, and warnings against using mean parameters in nonlinear epidemic models. Knowing the sign (from convexity) and size (from the variance gap) of the discrepancy lets you correct for it.
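One way to use the sign and size together is to add the variance-gap term to a plug-in value. The sketch below assumes a hypothetical model $g(x) = 1/x$ and five made-up, equally weighted inputs; it is an illustration of the second-order correction under those assumptions, not a general-purpose bias correction.

```python
from statistics import fmean, pvariance

# Hypothetical positive inputs (illustrative only), equally weighted.
x = [8.0, 9.0, 10.0, 11.0, 12.0]

def g(v):
    """Nonlinear model, convex on (0, inf)."""
    return 1.0 / v

def g2(v):
    """Second derivative of g."""
    return 2.0 / v ** 3

mu, var = fmean(x), pvariance(x)

apply_then_average = fmean([g(v) for v in x])        # E[g(X)], the target
average_then_apply = g(mu)                           # g(E[X]), the plug-in value
corrected = average_then_apply + 0.5 * g2(mu) * var  # plug-in plus variance gap

print("apply, then average:", apply_then_average)
print("average, then apply:", average_then_apply)
print("second-order corrected plug-in:", corrected)
```

Convexity fixes the direction (the plug-in value is too low here), and the variance-gap term recovers most of the shortfall when the spread is modest.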