The Legendre Transform
The Legendre transform re-expresses a convex function in terms of its slopes instead of its inputs. It is the mathematical engine behind exponential families, cumulant generating functions, and large-deviations rate functions — places where “the dual variable is a slope.”
Definition
The Legendre–Fenchel transform (convex conjugate) of a function $f$ is $$f^*(p) = \sup_x \{\, p x - f(x) \,\}.$$ For each slope $p$, we tilt $f$ by the line $x \mapsto p x$ and record the largest vertical gap between that line and the curve.
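As a rough illustration of the definition $f^*(p) = \sup_x \{p x - f(x)\}$, the sketch below replaces the supremum by a maximum over a finite grid. The helper name `conjugate_on_grid` and the grid on $[-10, 10]$ are assumptions made for this example; the approximation is only meaningful when the true maximizer lies inside the grid.

```python
import math

def conjugate_on_grid(f, p, xs):
    """Approximate f*(p) = sup_x (p*x - f(x)) by a maximum over the grid xs."""
    return max(p * x - f(x) for x in xs)

# Hypothetical grid on [-10, 10] with step 0.001 (an assumption of this sketch).
xs = [i / 1000.0 for i in range(-10000, 10001)]

def quad(x):
    return 0.5 * x * x

fstar_quad = conjugate_on_grid(quad, 1.5, xs)      # closed form: 1.5**2 / 2
fstar_exp = conjugate_on_grid(math.exp, 2.0, xs)   # closed form: 2*log(2) - 2

print(fstar_quad, fstar_exp)
```

Because a grid maximum can never exceed the true supremum, this approach approximates $f^*$ from below, and the error shrinks as the grid is refined.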
Geometric intuition
Fix a slope $p$. The line $y = p x + b$ supports $f$ from below when $b$ is as large as possible while still $p x + b \le f(x)$ for all $x$, i.e. $b = \inf_x \{ f(x) - p x \} = -f^*(p)$. So $f^*(p)$ is minus the intercept of the supporting line of slope $p$. The transform stores a convex curve as the family of its tangent lines, indexed by slope.
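The supporting-line picture can be checked numerically. The sketch below takes $f(x) = e^x$ and slope $p = 2$ as an assumed example, uses the closed form $f^*(p) = p \log p - p$, and confirms that the line with intercept $-f^*(p)$ stays below the curve and touches it at $x = \log p$.

```python
import math

f = math.exp
p = 2.0
fstar_p = p * math.log(p) - p   # closed form of f*(p) for f = exp and p > 0
intercept = -fstar_p            # supporting line: y = p*x + intercept

# Vertical gap between the curve and the line on a sample grid over [-5, 5].
xs = [i / 100.0 for i in range(-500, 501)]
gaps = [f(x) - (p * x + intercept) for x in xs]
min_gap = min(gaps)             # nonnegative: the line never rises above the curve

# At the tangency point x = log(p) the gap closes.
x_touch = math.log(p)
touch_gap = f(x_touch) - (p * x_touch + intercept)

print(min_gap, touch_gap)
```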
Key facts
- $f^*$ is always convex, being a supremum of the affine functions $p \mapsto p x - f(x)$, one for each $x$.
- Stationarity. If $f$ is convex and differentiable, the sup is attained where $\frac{d}{dx}\,[\,p x - f(x)\,] = 0$, i.e. at the $x$ with $$p = f'(x).$$ The conjugate variable is the slope of $f$.
- Involution. For a (closed) convex $f$, applying the transform twice returns the original: $f^{**} = f$. For non-convex $f$, $f^{**}$ is its convex hull (the largest convex function below it).
- Fenchel–Young inequality. Directly from the definition, $f(x) + f^*(p) \ge p x$ for all $x, p$, with equality iff $p = f'(x)$. This is the general form of Young’s inequality.
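The involution and Fenchel–Young facts can be seen on a small grid experiment. The sketch below uses the double-well function $f(x) = (x^2 - 1)^2$ as an assumed example of a non-convex $f$; the grids on $[-2, 2]$ for $x$ and $[-30, 30]$ for $p$, and the helper name `biconjugate`, are choices made for this illustration only. Restricting $x$ to a finite interval means the result is the convex hull of $f$ on that interval.

```python
def f(x):
    return (x * x - 1.0) ** 2   # double well: not convex between the two minima

xs = [i / 100.0 for i in range(-200, 201)]   # x-grid on [-2, 2]
ps = [j / 10.0 for j in range(-300, 301)]    # slope grid on [-30, 30]

# First transform: f*(p) = max_x (p*x - f(x)), tabulated on the slope grid.
fstar = {p: max(p * x - f(x) for x in xs) for p in ps}

def biconjugate(x):
    """Second transform: f**(x) = max_p (x*p - f*(p)) over the slope grid."""
    return max(x * p - fstar[p] for p in ps)

# Fenchel-Young: f(x) + f*(p) >= p*x on a subsample of grid pairs.
fy_holds = all(
    f(x) + fstar[p] >= p * x - 1e-12 for x in xs[::20] for p in ps[::20]
)

hull_at_0 = biconjugate(0.0)     # f(0) = 1, but the hull is flat between the wells
hull_at_1p5 = biconjugate(1.5)   # outside the wells f agrees with its hull

print(fy_holds, hull_at_0, hull_at_1p5, f(1.5))
```

At $x = 0$ the double transform fills in the dip between the wells, while at $x = 1.5$ it reproduces $f$, which is the behavior the involution statement predicts.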
Worked examples
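Quadratic. Let $f(x) = \tfrac{1}{2} x^2$. Then $p = f'(x) = x$, so $x = p$ and $$f^*(p) = p \cdot p - \tfrac{1}{2} p^2 = \tfrac{1}{2} p^2.$$ The Gaussian’s energy is self-dual. More generally, for $f(x) = \tfrac{a}{2} x^2$ with $a > 0$, stationarity gives $x = p/a$ and $f^*(p) = \tfrac{p^2}{2a}$: larger curvature gives a flatter conjugate.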
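Exponential. Let $f(x) = e^x$. Then $p = f'(x) = e^x$, so $x = \log p$ (valid for $p > 0$) and $$f^*(p) = p \log p - p,$$ with $f^*(p) = +\infty$ for $p < 0$ and $f^*(0) = 0$. This is the entropy-like function appearing in Poisson large deviations.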
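Power law (Young’s inequality). For $f(x) = \tfrac{1}{r} |x|^r$ with $r > 1$, the conjugate is $f^*(p) = \tfrac{1}{s} |p|^s$ where $\tfrac{1}{r} + \tfrac{1}{s} = 1$. Fenchel–Young then reads $x p \le \tfrac{1}{r} |x|^r + \tfrac{1}{s} |p|^s$, the classical Young’s inequality underlying Hölder’s inequality.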
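Young's inequality for the power-law pair can be spot-checked on random inputs. In the sketch below the exponent names `r` and `s` (with $1/r + 1/s = 1$), the choice $r = 3$, and the helper name `young_gap` are assumptions made for this example. The gap $|x|^r / r + |p|^s / s - x p$ should never be negative, and it should vanish when $p = f'(x)$.

```python
import random

def young_gap(x, p, r):
    """Return |x|^r/r + |p|^s/s - x*p, where s is the conjugate exponent of r."""
    s = r / (r - 1.0)   # solves 1/r + 1/s = 1
    return abs(x) ** r / r + abs(p) ** s / s - x * p

random.seed(0)
r = 3.0
gaps = [
    young_gap(random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0), r)
    for _ in range(1000)
]
min_gap = min(gaps)     # Fenchel-Young says this is never negative

# Equality case: for x > 0 the slope is p = f'(x) = x**(r - 1).
x0 = 2.0
tight_gap = young_gap(x0, x0 ** (r - 1.0), r)

print(min_gap, tight_gap)
```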
Role in statistics
- Exponential families & CGFs. The cumulant generating function $K(t) = \log \mathbb{E}[e^{tX}]$ is convex, and its Legendre transform is the large-deviations rate function (Cramér’s theorem): $I(x) = \sup_t \{\, t x - K(t) \,\}$.
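- Duality with Jensen. Because $f$ is convex, $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$, which is exactly Jensen’s inequality applied to the convex $f$. The conjugate gives a one-line proof: for closed convex $f$, $f(x) = \sup_p \{\, p x - f^*(p) \,\}$, so $\mathbb{E}[f(X)] \ge \sup_p \{\, p\,\mathbb{E}[X] - f^*(p) \,\} = f(\mathbb{E}[X])$. The Fenchel–Young gap is the same convexity fact seen through the conjugate.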
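To make the CGF-to-rate-function step concrete, the sketch below computes $I(x) = \sup_t \{t x - K(t)\}$ for a Poisson variable, whose cumulant generating function is $K(t) = \lambda (e^t - 1)$, and compares it with the closed form $x \log(x/\lambda) - x + \lambda$. The ternary search, the bracket $[-20, 20]$, the value $\lambda = 3$, and the helper name `rate_function` are assumptions made for this illustration, not a prescribed method.

```python
import math

def rate_function(K, x, lo=-20.0, hi=20.0, iters=200):
    """I(x) = sup_t (t*x - K(t)), via ternary search on the concave map t -> t*x - K(t)."""
    def g(t):
        return t * x - K(t)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1   # the maximizer lies to the right of m1
        else:
            hi = m2   # the maximizer lies to the left of m2
    return g(0.5 * (lo + hi))

lam = 3.0

def K_poisson(t):
    return lam * (math.exp(t) - 1.0)   # CGF of a Poisson(lam) variable

x = 5.0
numeric = rate_function(K_poisson, x)
exact = x * math.log(x / lam) - x + lam
at_mean = rate_function(K_poisson, lam)   # the rate function vanishes at the mean

print(numeric, exact, at_mean)
```

The closed form is the exponential example's $p \log p - p$ rescaled by $\lambda$, which is why that conjugate is described as the entropy-like function of Poisson large deviations.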
In code
Python (symbolic conjugate)
import sympy as sp
x, p = sp.symbols("x p", real=True)
def conjugate(f_expr):
    fp = sp.diff(f_expr, x)                # f'(x)
    x_star = sp.solve(sp.Eq(fp, p), x)[0]  # solve p = f'(x) for x (assumes a solution exists)
    return sp.simplify((p * x - f_expr).subs(x, x_star))
print(conjugate(x**2 / 2)) # p**2/2
print(conjugate(sp.exp(x))) # p*(log(p) - 1), i.e. p*log(p) - p
p**2/2
p*(log(p) - 1)
R (numeric conjugate via optimize)
f <- function(x) exp(x)
fstar <- function(p) {
  # maximize p*x - f(x); optimize minimizes, so negate
  opt <- optimize(function(x) f(x) - p * x, interval = c(-20, 20))
  -opt$objective
}
p <- 2
c(numeric = fstar(p), exact = p * log(p) - p) # both ~ -0.6137
# 2*log(2) - 2 = -0.6137, matching the numeric sup
Julia (numeric conjugate via Optim)
using Optim
f(x) = exp(x)
function fstar(p)
    res = optimize(x -> f(x) - p * x, -20.0, 20.0)  # minimizes f(x) - p*x
    -Optim.minimum(res)                             # negate to get the sup
end
p = 2.0
println((fstar(p), p * log(p) - p)) # (~-0.6137, -0.6137)
Why it matters for statistics
The Legendre transform is the bridge between a convex objective and its dual description by slopes. In statistics it converts a cumulant generating function into a rate function (large deviations, concentration inequalities), links the natural and mean parameters of exponential families, and provides the convex duality behind many estimation and optimization problems, including regularized maximum likelihood. Recognizing a $\sup_x \{\, p x - f(x) \,\}$ pattern tells you a convex-conjugate structure is at play.