The Legendre Transform

The Legendre transform re-expresses a convex function in terms of its slopes instead of its inputs. It is the mathematical engine behind exponential families, cumulant generating functions, and large-deviations rate functions — places where “the dual variable is a slope.”

Definition

The Legendre–Fenchel transform (convex conjugate) of a function ff is f(p)=supx(pxf(x)).f^*(p) = \sup_{x}\, \big(px - f(x)\big). For each slope pp, we tilt ff by the line pxpx and record the largest vertical gap between that line and the curve.

Geometric intuition

Fix a slope pp. The line y=pxcy = px - c supports ff from below when cc is as large as possible while still pxcf(x)px - c \le f(x) for all xx, i.e. c=supx(pxf(x))=f(p)c = \sup_x(px - f(x)) = f^*(p). So f(p)f^*(p) is minus the intercept of the supporting line of slope pp. The transform stores a convex curve as the family of its tangent lines, indexed by slope.

Key facts

Worked examples

Quadratic. Let f(x)=12x2f(x) = \tfrac12 x^2. Then f(x)=xf'(x) = x, so p=xp = x and f(p)=pp12p2=12p2.f^*(p) = p\cdot p - \tfrac12 p^2 = \tfrac12 p^2 . The Gaussian’s energy is self-dual. More generally, for f(x)=12ax2f(x)=\tfrac12 a x^2 with a>0a>0, stationarity gives x=p/ax = p/a and f(p)=p22af^*(p) = \tfrac{p^2}{2a} — larger curvature aa gives a flatter conjugate.

Exponential. Let f(x)=exf(x) = e^x. Then p=f(x)=exp = f'(x) = e^x, so x=lnpx = \ln p (valid for p>0p>0) and f(p)=plnpelnp=plnpp,p>0,f^*(p) = p\ln p - e^{\ln p} = p\ln p - p, \qquad p > 0, with f(p)=+f^*(p) = +\infty for p<0p<0 and f(0)=0f^*(0)=0. This plnppp\ln p - p is the entropy-like function appearing in Poisson large deviations.

Power law (Young’s inequality). For f(x)=1rxrf(x) = \tfrac{1}{r}|x|^{r} with r>1r>1, the conjugate is f(p)=1spsf^*(p) = \tfrac{1}{s}|p|^{s} where 1r+1s=1\tfrac1r + \tfrac1s = 1. Fenchel–Young then reads xpxrr+pssxp \le \tfrac{|x|^r}{r} + \tfrac{|p|^s}{s}, the classical Young’s inequality underlying Hölder’s inequality.

Role in statistics

In code

Python (symbolic conjugate)

import sympy as sp
x, p = sp.symbols("x p", real=True)

def conjugate(f_expr):
    fp = sp.diff(f_expr, x)          # f'(x)
    x_star = sp.solve(sp.Eq(fp, p), x)[0]   # solve p = f'(x) for x
    return sp.simplify((p * x - f_expr).subs(x, x_star))

print(conjugate(x**2 / 2))           # p**2/2
print(conjugate(sp.exp(x)))          # p*log(p) - p
p**2/2
p*(log(p) - 1)

R (numeric conjugate via optimize)

f <- function(x) exp(x)
fstar <- function(p) {
  # maximize p*x - f(x); optimize minimizes, so negate
  opt <- optimize(function(x) f(x) - p * x, interval = c(-20, 20))
  -opt$objective
$}
p <- 2
c(numeric = fstar(p), exact = p * log(p) - p)   # both ~ -0.6137
# 2*log(2) - 2 = -0.6137, matching the numeric sup

Julia (numeric conjugate via Optim)

using Optim
f(x) = exp(x)
function fstar(p)
    res = optimize(x -> f(x) - p * x, -20.0, 20.0)   # minimizes f(x)-p*x
    -Optim.minimum(res)                               # negate to get the sup
end
p = 2.0
println((fstar(p), p * log(p) - p))   # (~-0.6137, -0.6137)

Why it matters for statistics

The Legendre transform is the bridge between a convex objective and its dual description by slopes. In statistics it converts a cumulant generating function into a rate function (large deviations, concentration inequalities), links the natural and mean parameters of exponential families, and provides the convex duality behind many estimation and optimization problems, including regularized maximum likelihood. Recognizing a supx(pxf(x))\sup_x(px - f(x)) pattern tells you a convex-conjugate structure is at play.