u-Substitution

u-substitution is the reverse of the chain rule: it undoes a composition of functions inside an integral. It is the workhorse for integrals built on the kernels of many probability densities, especially the bell-curve kernel $e^{-x^2/2}$ used to model biological measurements such as birth weights or blood pressure. The same trick handles rate integrals with an exponential inside, such as a drug’s exponential elimination.

The idea

The chain rule says $\frac{d}{dx}F(g(x)) = F'(g(x))\,g'(x)$. Reading that backwards gives the substitution rule: if an integrand looks like $f(g(x))$ multiplied by its inner derivative $g'(x)$, then

$$\int f\big(g(x)\big)\,g'(x)\,dx = \int f(u)\,du, \qquad u = g(x),\ du = g'(x)\,dx .$$

For a definite integral, change the limits too:

$$\int_a^b f\big(g(x)\big)\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du .$$
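
The change-of-limits rule can be checked numerically. The sketch below is only an illustration under assumed choices that do not come from the text: $f = \cos$, $g(x) = x^2$, and $[a, b] = [0, 2]$. It uses a small hand-written Simpson's rule (the helper `simpson` is hypothetical, not a library function) to compare the integral in $x$ with the integral in $u$.

```python
import math


def simpson(func, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


# Illustrative assumptions: f = cos, g(x) = x^2, [a, b] = [0, 2].
f = math.cos
g = lambda x: x * x
g_prime = lambda x: 2 * x
a, b = 0.0, 2.0

lhs = simpson(lambda x: f(g(x)) * g_prime(x), a, b)  # integral in x over [a, b]
rhs = simpson(f, g(a), g(b))                         # integral in u over [g(a), g(b)]
exact = math.sin(g(b)) - math.sin(g(a))              # sin is an antiderivative of cos

print(lhs, rhs, exact)
```

All three printed values should agree to many decimal places, since both sides of the rule equal $\sin(4) - \sin(0)$ for these choices.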

Steps

  1. Choose $u = g(x)$, an inner function whose derivative also appears (up to a constant).
  2. Compute $du = g'(x)\,dx$ and solve for $dx$ or $g'(x)\,dx$.
  3. Rewrite the whole integral in terms of $u$.
  4. Integrate in $u$, then substitute $x$ back (indefinite) or change the limits (definite).
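
As a quick sketch of the four steps, consider an integrand chosen purely for illustration (it does not appear elsewhere in this section), $\int \cos(3x)\,dx$, where the inner derivative is present only up to a constant:

$$
\begin{aligned}
&\text{1. } u = 3x \\
&\text{2. } du = 3\,dx \;\Rightarrow\; dx = \tfrac13\,du \\
&\text{3. } \int \cos(3x)\,dx = \tfrac13 \int \cos u\,du \\
&\text{4. } \tfrac13 \sin u + C = \tfrac13 \sin(3x) + C
\end{aligned}
$$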

Worked example

Compute $\displaystyle\int 2x\,e^{x^2}\,dx$.

Let $u = x^2$, so $du = 2x\,dx$, and $2x\,dx$ is exactly what sits in front of the exponential. The integral becomes

$$\int e^{u}\,du = e^{u} + C = e^{x^2} + C .$$

A closely related normal-density kernel: $\displaystyle\int x\,e^{-x^2}\,dx$. Take $u = -x^2$, $du = -2x\,dx$, so $x\,dx = -\tfrac12\,du$:

$$\int x\,e^{-x^2}\,dx = -\frac12\int e^{u}\,du = -\frac12 e^{-x^2} + C .$$

As a definite check, $\displaystyle\int_0^\infty x\,e^{-x^2}\,dx = \Big[-\tfrac12 e^{-x^2}\Big]_0^\infty = 0 - \big(-\tfrac12\big) = \tfrac12$.
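
The same value follows from changing the limits instead of substituting back, as in the definite-integral rule above. With $u = -x^2$, the limits $x = 0$ and $x \to \infty$ become $u = 0$ and $u \to -\infty$, and the minus sign flips the orientation of the interval:

$$
\int_0^\infty x\,e^{-x^2}\,dx
= -\frac12 \int_{0}^{-\infty} e^{u}\,du
= \frac12 \int_{-\infty}^{0} e^{u}\,du
= \frac12 \Big[e^{u}\Big]_{-\infty}^{0}
= \frac12\,(1 - 0)
= \frac12 .
$$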

Computing it

### R

```r
# Numeric check: integral of x*exp(-x^2) from 0 to Inf should be 0.5
integrate(function(x) x * exp(-x^2), 0, Inf)$value   # 0.5
```

### Python

```python
import numpy as np
import sympy as sp
from scipy.integrate import quad

x = sp.symbols("x")
# Symbolic results: the antiderivative and the definite integral
print(sp.integrate(2*x*sp.exp(x**2), x))             # exp(x**2)
print(sp.integrate(x*sp.exp(-x**2), (x, 0, sp.oo)))  # 1/2

# Numeric check with adaptive quadrature
val, _ = quad(lambda t: t*np.exp(-t**2), 0, np.inf)
print(val)                                           # approximately 0.5
```

```
exp(x**2)
1/2
0.5000000000000001
```

### Julia

```julia
using Symbolics, QuadGK

@variables x
# Numeric verification of the substitution result
println(quadgk(t -> t*exp(-t^2), 0, Inf)[1])   # approximately 0.5

# Symbolic antiderivative check: d/dx(-1/2 e^{-x^2}) = x e^{-x^2}
D = Differential(x)
println(expand_derivatives(D(-0.5*exp(-x^2))))  # equivalent to x*exp(-x^2)
```

Why it matters for statistics

Normalizing a density and computing moments constantly produces integrands of the form $g'(x)\,f(g(x))$. For the normal density the substitution $u = (x-\mu)/\sigma$ reduces any Gaussian integral to the standard $\int e^{-u^2/2}\,du$; for the exponential and gamma families, $u = -\lambda x$ handles the exponential factor.
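
The standardizing substitution can be sanity-checked numerically. The sketch below assumes hypothetical parameters $\mu = 3400$ and $\sigma = 500$ (illustrative values on a birth-weight-like scale, not taken from any data) and a hand-written Simpson's rule helper. It compares the normal density integrated in $x$ with the standard kernel integrated in $u$; since $dx = \sigma\,du$, the factor $\sigma$ cancels.

```python
import math


def simpson(func, lo, hi, n=4000):
    """Composite Simpson's rule on [lo, hi] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


# Hypothetical parameters, chosen only for illustration.
mu, sigma = 3400.0, 500.0


def normal_pdf(x):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def std_kernel(u):
    """Standard Gaussian kernel exp(-u^2 / 2), without the normalizing constant."""
    return math.exp(-0.5 * u * u)


# Integrate over mu +/- k*sigma; the tails beyond 8 standard deviations are negligible.
k = 8.0
in_x = simpson(normal_pdf, mu - k * sigma, mu + k * sigma)
# After u = (x - mu)/sigma the limits become -k and k, and sigma drops out.
in_u = simpson(std_kernel, -k, k) / math.sqrt(2 * math.pi)

print(in_x, in_u)
```

Both values should be very close to 1, which is the statement that every normal density integrates to the same total as the standard one.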