Chain Rule
The chain rule differentiates a composition of functions, that is, a function of a function. It is arguably the single most important differentiation rule: it powers backpropagation in neural networks and the delta method in statistics. First-order drug decay and composed dose–response functions are differentiated with it, and it is also what carries the gradient of a log-likelihood through nested link functions during model fitting.
The rule
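If $h(x) = f(g(x))$, then
$$h'(x) = f'(g(x))\,g'(x).$$
In Leibniz notation, with $y = f(u)$ and $u = g(x)$,
$$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}.$$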
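The rule translates directly into code. This is a minimal sketch: `chain_rule_derivative` and `central_difference` are hypothetical helper names, and the composition $h(x) = \sin(x^2)$ is an illustrative choice, not one from the text. It evaluates $f'(g(x))\,g'(x)$ and checks it against a finite-difference estimate of the derivative of the composed function.

```python
import math

def chain_rule_derivative(f_prime, g, g_prime, x):
    """Return h'(x) for h(x) = f(g(x)) via h'(x) = f'(g(x)) * g'(x)."""
    return f_prime(g(x)) * g_prime(x)

def central_difference(h, x, step=1e-6):
    """Approximate h'(x) with a symmetric finite difference."""
    return (h(x + step) - h(x - step)) / (2 * step)

# Illustrative composition h(x) = sin(x^2): outer f = sin, inner g(x) = x^2.
def g(x):
    return x**2

def g_prime(x):
    return 2 * x

def h(x):
    return math.sin(g(x))

x0 = 1.3
analytic = chain_rule_derivative(math.cos, g, g_prime, x0)  # f' = cos
numeric = central_difference(h, x0)
print(f"chain rule: {analytic:.6f}  finite difference: {numeric:.6f}")
```

The two numbers agree to roughly the accuracy of the finite difference; the analytic version needs only the derivatives of the two pieces.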
Intuition
Differentiate the outer function (leaving the inside alone), then multiply by the derivative of the inside. The rates multiply: if $y$ changes twice as fast as $u$ and $u$ changes three times as fast as $x$, then $y$ changes six times as fast as $x$.
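The "rates multiply" picture can be checked with the linear functions implied by the wording, assumed here to be $u = 3x$ and $y = 2u$. For a linear function every secant slope equals the derivative, so integer inputs give exact results. The function names are hypothetical.

```python
def u_of_x(x):
    return 3 * x  # u changes three times as fast as x

def y_of_u(u):
    return 2 * u  # y changes twice as fast as u

def y_of_x(x):
    return y_of_u(u_of_x(x))  # the composition

def slope(func, a, b):
    """Secant slope; equals the derivative everywhere for a linear function."""
    return (func(b) - func(a)) / (b - a)

x_a, x_b = 1, 5
du_dx = slope(u_of_x, x_a, x_b)                  # 3.0
dy_du = slope(y_of_u, u_of_x(x_a), u_of_x(x_b))  # 2.0
dy_dx = slope(y_of_x, x_a, x_b)                  # 6.0
print(du_dx, dy_du, dy_dx, dy_du * du_dx)        # 3.0 2.0 6.0 6.0
```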
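Worked example 1: $\frac{d}{dx} e^{-\lambda x}$
The exponential survival/decay term $e^{-\lambda x}$ is $f(u) = e^{u}$ composed with $u = g(x) = -\lambda x$, where $g'(x) = -\lambda$:
$$\frac{d}{dx} e^{-\lambda x} = e^{-\lambda x}\cdot(-\lambda) = -\lambda e^{-\lambda x}.$$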
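A quick numerical check of the decay derivative, assuming a hypothetical rate λ = 0.3 chosen only for illustration. The chain-rule result should match a central finite difference, and it should equal -λ times the current value of the term, which is exactly the "rate proportional to amount" property of first-order decay.

```python
import math

RATE = 0.3  # hypothetical rate constant lambda, chosen only for illustration
STEP = 1e-6

def decay(x, lam=RATE):
    """Exponential decay / survival term exp(-lam * x)."""
    return math.exp(-lam * x)

def decay_derivative(x, lam=RATE):
    """Chain rule: outer exp(u) contributes exp(u); inner u = -lam * x contributes -lam."""
    return math.exp(-lam * x) * (-lam)

for x in (0.0, 1.0, 5.0):
    numeric = (decay(x + STEP) - decay(x - STEP)) / (2 * STEP)
    print(f"x={x}: chain rule {decay_derivative(x):.8f}, finite difference {numeric:.8f}")
```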
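Worked example 2: $\frac{d}{dx}(3x^2 + 1)^5$
Here $f(u) = u^5$ and $g(x) = 3x^2 + 1$, so $f'(u) = 5u^4$ and $g'(x) = 6x$:
$$\frac{d}{dx}(3x^2 + 1)^5 = 5(3x^2 + 1)^4 \cdot 6x = 30x(3x^2 + 1)^4.$$
At $x = 1$: $30 \cdot 1 \cdot (3 + 1)^4 = 30 \cdot 256 = 7680$.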
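The same calculation can be broken into its chain-rule pieces. In this sketch `g`, `g_prime`, and `f_prime` are hypothetical names for the inner function and the two derivatives; integer arithmetic keeps every step exact, and the last lines confirm that $f'(g(x))\,g'(x)$ matches the closed form $30x(3x^2 + 1)^4$ at a range of integer points.

```python
def g(x):
    """Inner function g(x) = 3x^2 + 1."""
    return 3 * x**2 + 1

def g_prime(x):
    """Derivative of the inner function: 6x."""
    return 6 * x

def f_prime(u):
    """Derivative of the outer function f(u) = u^5: 5u^4."""
    return 5 * u**4

x0 = 1
inner_value = g(x0)                  # g(1) = 4
outer_slope = f_prime(inner_value)   # f'(4) = 5 * 4**4 = 1280
inner_slope = g_prime(x0)            # g'(1) = 6
print(outer_slope * inner_slope)     # 7680

# The chain-rule product agrees with the closed form 30x(3x^2 + 1)^4 at integer points.
matches = all(f_prime(g(x)) * g_prime(x) == 30 * x * (3 * x**2 + 1) ** 4 for x in range(-5, 6))
print(matches)  # True
```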
Computing it
R
# Symbolic
D(expression((3*x^2 + 1)^5), "x")
# Returns an unsimplified expression equal to 5 * (3*x^2 + 1)^4 * 6*x, i.e. 30x(3x^2 + 1)^4
# Numeric check at x = 1
library(numDeriv)
grad(function(x) (3*x^2 + 1)^5, 1) # 7680
Python
import sympy as sp
x, lam = sp.symbols("x lambda")
sp.diff(sp.exp(-lam * x), x) # -lambda*exp(-lambda*x)
sp.diff((3*x**2 + 1)**5, x) # 30*x*(3*x**2 + 1)**4
# Numeric check at x = 1
h = 1e-6
f = lambda x: (3*x**2 + 1)**5
(f(1 + h) - f(1 - h)) / (2 * h) # ~7680.0
Julia
using Symbolics
@variables x λ
Symbolics.derivative(exp(-λ * x), x) # -λ*exp(-λ*x)
Symbolics.derivative((3x^2 + 1)^5, x) # 30x*(1 + 3(x^2))^4
using ForwardDiff
ForwardDiff.derivative(x -> (3x^2 + 1)^5, 1.0) # 7680.0
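Forward-mode automatic differentiation, the approach ForwardDiff takes, is built on dual numbers: each value carries its derivative along, and every primitive operation applies the chain rule locally. The toy Python class below is a sketch of that idea, not ForwardDiff's actual implementation. `Dual`, `lift`, `exp`, and `derivative` are hypothetical names defined here, and only addition, multiplication, integer powers, and `exp` are supported.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to the input variable."""
    value: float
    deriv: float

    def __add__(self, other):
        other = lift(other)
        # Sum rule: derivatives add.
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __pow__(self, n):
        # Chain rule for u**n (integer n): outer derivative n*u**(n-1) times inner du/dx.
        return Dual(self.value ** n, n * self.value ** (n - 1) * self.deriv)

def lift(x):
    """Treat a plain number as a constant (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)

def exp(d):
    # Chain rule: d/dx exp(u) = exp(u) * du/dx.
    e = math.exp(d.value)
    return Dual(e, e * d.deriv)

def derivative(func, x):
    """Seed dx/dx = 1, run func on the dual number, read off the derivative."""
    return func(Dual(float(x), 1.0)).deriv

print(derivative(lambda x: (3 * x**2 + 1) ** 5, 1.0))  # 7680.0
print(derivative(lambda x: exp(-0.3 * x), 2.0))        # -0.3 * exp(-0.6), about -0.1646
```

Nothing symbolic happens here: the derivative of the whole expression emerges from applying the chain rule once per operation as the function is evaluated.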
Why it matters for statistics
The chain rule underlies the delta method, which approximates the variance of a transformed estimator using $\operatorname{Var}[g(\hat\theta)] \approx [g'(\theta)]^2 \operatorname{Var}(\hat\theta)$. It is also how gradients propagate through the layers of a model during backpropagation, making automatic differentiation and modern machine learning possible.
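A small simulation can make the delta method concrete. The sketch below assumes exponentially distributed survival times with a hypothetical true rate λ = 0.5, estimates λ by maximum likelihood (one over the sample mean), and reports survival at a hypothetical time t = 2 as $S(t) = e^{-\hat\lambda t}$, the decay term from worked example 1. The chain rule supplies $g'(\lambda) = -t e^{-\lambda t}$, and $\operatorname{Var}(\hat\lambda) \approx \lambda^2 / n$ is the large-sample (inverse Fisher information) variance for this model. Sample size, replication count, and seed are arbitrary illustration choices.

```python
import math
import random
import statistics

TRUE_RATE = 0.5  # hypothetical true rate lambda
T = 2.0          # hypothetical time at which survival S(t) = exp(-lambda * t) is reported
N = 200          # sample size per simulated study
REPS = 2000      # number of simulated studies

def survival(rate, t=T):
    return math.exp(-rate * t)

def survival_derivative(rate, t=T):
    # Chain rule: d/d(rate) exp(-rate * t) = exp(-rate * t) * (-t)
    return -t * math.exp(-rate * t)

# Delta method: Var[g(rate_hat)] ~ g'(rate)^2 * Var(rate_hat),
# with the large-sample approximation Var(rate_hat) ~ rate^2 / N for the MLE 1 / mean.
var_rate_hat = TRUE_RATE**2 / N
delta_var = survival_derivative(TRUE_RATE) ** 2 * var_rate_hat

# Monte Carlo comparison: re-estimate the rate on many simulated samples.
rng = random.Random(2024)
estimates = []
for _ in range(REPS):
    sample = [rng.expovariate(TRUE_RATE) for _ in range(N)]
    rate_hat = 1 / statistics.fmean(sample)
    estimates.append(survival(rate_hat))

simulated_var = statistics.variance(estimates)
print(f"delta method: {delta_var:.3e}  simulation: {simulated_var:.3e}")
```

The two variances should agree to within Monte Carlo noise; the delta method gets there with one derivative instead of thousands of refits.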