Taylor and Maclaurin Series
A Taylor series approximates a smooth function by a polynomial built from its derivatives at a point. This local approximation is the workhorse behind the delta method, Newton-type optimization, and many large-sample expansions.
Taylor expansion
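If $f$ is infinitely differentiable near $a$,

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^n = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots$$

A Maclaurin series is the special case $a = 0$.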
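The definition can be turned into a few lines of code. The sketch below is illustrative only: the helper name `taylor_eval` and the idea of passing the derivative values $f^{(n)}(a)$ as a plain list are assumptions made for this example, not part of the text above.

```python
import math

def taylor_eval(derivs_at_a, a, x):
    """Evaluate sum_n f^(n)(a) / n! * (x - a)^n for the supplied derivative values.

    derivs_at_a[n] must hold the n-th derivative of f at a (n = 0 is f(a) itself).
    """
    return sum(d / math.factorial(n) * (x - a) ** n
               for n, d in enumerate(derivs_at_a))

# Maclaurin case (a = 0): every derivative of exp at 0 equals 1.
approx_e = taylor_eval([1.0] * 11, 0.0, 1.0)          # degree-10 polynomial at x = 1

# Expansion about a = 1: log(1) = 0 and the n-th derivative of log at 1
# is (-1)^(n-1) * (n-1)! for n >= 1.
log_derivs = [0.0] + [(-1) ** (n - 1) * math.factorial(n - 1) for n in range(1, 11)]
approx_log = taylor_eval(log_derivs, 1.0, 1.1)        # degree-10 polynomial at x = 1.1

print(approx_e, math.e)
print(approx_log, math.log(1.1))
```

Both approximations should agree with the library values to many digits, because the evaluation points sit close to the expansion point.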
Key expansions
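The expansions below are the standard Maclaurin series most often needed in practice, each with its region of validity. The selection is a conventional one, assumed here for reference.

$$
\begin{aligned}
e^x &= \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots, && \text{all } x \\
\sin x &= \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots, && \text{all } x \\
\cos x &= \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots, && \text{all } x \\
\ln(1+x) &= \sum_{n=1}^{\infty} \frac{(-1)^{n+1} x^n}{n} = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots, && -1 < x \le 1 \\
\frac{1}{1-x} &= \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + \cdots, && |x| < 1
\end{aligned}
$$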
Worked example: successive polynomials for $\sin x$
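Watching the approximation improve term by term is the best way to build intuition. The Maclaurin polynomials of $\sin x$ use only odd powers; write $T_N(x)$ for the degree-$N$ truncation:

$$T_N(x) = \sum_{n = 1, 3, \dots, N} (-1)^{(n-1)/2}\, \frac{x^n}{n!}, \qquad \text{e.g. } T_7(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!}.$$

Each higher-degree polynomial tracks $\sin x$ over a wider window before peeling off. Evaluate them at $x = \pi/2$, where the true value is $\sin(\pi/2) = 1$:

$$T_1 = 1.5708, \quad T_3 = 0.9248, \quad T_5 = 1.0045, \quad T_7 = 0.99984,$$

with errors $\sin(\pi/2) - T_N(\pi/2)$ of $-0.5708$, $+0.0752$, $-0.0045$, and $+0.00016$.
At this point each additional term cuts the absolute error by a factor of roughly 8 to 29, and the gain grows with the degree.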
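The widening window can also be measured directly. The sketch below scans outward from $0$ and records how far each polynomial stays within a tolerance of $\sin x$. The tolerance `tol = 0.01`, the grid `step`, and the helper names are arbitrary choices made for illustration.

```python
import math

def sin_maclaurin(x, N):
    """Degree-N Maclaurin polynomial of sin, using odd powers 1, 3, ..., N."""
    return sum((-1) ** ((n - 1) // 2) * x ** n / math.factorial(n)
               for n in range(1, N + 1, 2))

def tracking_window(N, tol=0.01, step=0.001, x_max=2 * math.pi):
    """Largest grid point x >= 0 such that |sin - T_N| < tol on the grid up to x."""
    x = 0.0
    while x + step <= x_max and abs(math.sin(x + step) - sin_maclaurin(x + step, N)) < tol:
        x += step
    return x

# Width of the window on which each polynomial stays within tol of sin.
windows = {N: tracking_window(N) for N in (1, 3, 5, 7)}
for N, width in windows.items():
    print(f"N={N}: within 0.01 of sin(x) for 0 <= x <= {width:.3f}")
```

The reported widths should grow with $N$, which is the numerical counterpart of each curve peeling off later than the one before it.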
How good is the approximation? The remainder
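Truncating after the degree-$N$ term leaves the Lagrange remainder

$$R_N(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!}\,(x-a)^{N+1}$$

for some $\xi$ between $a$ and $x$. For $\sin x$ every derivative is bounded by $1$, so with $a = 0$ we get $|R_N(x)| \le |x|^{N+1}/(N+1)!$. At $x = \pi/2$ with $N = 7$ this bounds the error by $(\pi/2)^8/8! \approx 9.2 \times 10^{-4}$, comfortably above the $1.6 \times 10^{-4}$ we actually saw. The factorial in the denominator is why Taylor series converge so fast close to $a$, and the power $|x-a|^{N+1}$ is why the error of a fixed-degree polynomial explodes once $|x-a|$ grows large, which is exactly the peeling-off you see in the figure.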
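The bound is easy to check numerically. The following sketch compares the actual error of each $\sin$ polynomial at $x = \pi/2$ with the Lagrange bound $|x|^{N+1}/(N+1)!$. The function names are made up for this example.

```python
import math

def sin_maclaurin(x, N):
    """Degree-N Maclaurin polynomial of sin, using odd powers 1, 3, ..., N."""
    return sum((-1) ** ((n - 1) // 2) * x ** n / math.factorial(n)
               for n in range(1, N + 1, 2))

def lagrange_bound(x, N):
    """Upper bound |x|^(N+1) / (N+1)! on |R_N(x)|, valid because |sin^(k)| <= 1."""
    return abs(x) ** (N + 1) / math.factorial(N + 1)

x = math.pi / 2
# One row per degree: (N, actual absolute error, Lagrange bound).
rows = [(N, abs(math.sin(x) - sin_maclaurin(x, N)), lagrange_bound(x, N))
        for N in (1, 3, 5, 7)]
for N, err, bound in rows:
    print(f"N={N}  |error|={err:.2e}  bound={bound:.2e}")
```

In every row the actual error should sit below the bound, since the bound replaces the unknown $|f^{(N+1)}(\xi)|$ by its worst case of $1$.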
Computing it
R
x <- pi / 2
# Degree-N Maclaurin polynomial of sin at x (odd powers only); \(n) needs R >= 4.1
Tn <- function(N) sum(sapply(seq(1, N, by = 2),
  \(n) (-1)^((n - 1) / 2) * x^n / factorial(n)))
approx <- sapply(c(1, 3, 5, 7), Tn)
rbind(approx, error = sin(x) - approx)
#            [,1]    [,2]     [,3]     [,4]
# approx  1.57080 0.92483  1.00452 0.999843
# error  -0.57080 0.07517 -0.00452 0.000157
Python
import math

import sympy as sp

# Symbolic check: Maclaurin series of sin, terms below order 8
x = sp.symbols("x")
print(sp.series(sp.sin(x), x, 0, 8))

# Numeric check at x = pi/2: degree-N polynomial built from odd powers only
xv = math.pi / 2
Tn = lambda N: sum((-1)**((n - 1)//2) * xv**n / math.factorial(n)
                   for n in range(1, N + 1, 2))
for N in (1, 3, 5, 7):
    print(N, Tn(N), math.sin(xv) - Tn(N))  # error shrinks roughly 8x to 29x per step

# Output:
# x - x**3/6 + x**5/120 - x**7/5040 + O(x**8)
# 1 1.5707963267948966 -0.5707963267948966
# 3 0.9248322292886504 0.07516777071134961
# 5 1.0045248555348174 -0.004524855534817407
# 7 0.9998431013994987 0.00015689860050127624
Julia
using Symbolics
xv = pi / 2
Tn(N) = sum((-1)^((n - 1) ÷ 2) * xv^n / factorial(n) for n in 1:2:N)
[(N, Tn(N), sin(xv) - Tn(N)) for N in (1, 3, 5, 7)]
# error: -0.571, +0.075, -0.0045, +0.00016
Why it matters for statistics
The delta method approximates the variance of a transformed estimator $g(\hat\theta)$ via a first-order Taylor expansion, $g(\hat\theta) \approx g(\theta) + g'(\theta)(\hat\theta - \theta)$, so that $\operatorname{Var}(g(\hat\theta)) \approx [g'(\theta)]^2 \operatorname{Var}(\hat\theta)$. For example, a proportion $\hat p$ has variance $p(1-p)/n$; the log-odds $g(p) = \log\frac{p}{1-p}$ has derivative $g'(p) = \frac{1}{p(1-p)}$, so the delta method gives $\operatorname{Var}(\log\text{-odds}) \approx \frac{1}{n\,p(1-p)}$. With $p = 0.2$ and $n = 100$ the variance is $1/16$, a standard error of $0.25$, which is the log-odds standard error an intercept-only logistic regression reports for 20 successes in 100 trials. Second-order expansions of the log-likelihood give the Fisher information and Newton-Raphson updates for maximum likelihood. In epidemiology the same idea linearizes a nonlinear transmission rate around an operating point so a model can be studied near an equilibrium, and the delta method then propagates measurement error into the resulting estimates. In short, Taylor series turn intractable nonlinear quantities into tractable linear or quadratic ones.
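Both uses of the expansion can be checked on the log-odds example. The sketch below computes the delta-method standard error directly. It then runs Newton-Raphson on the log-likelihood of an intercept-only logistic model, $\ell(\beta) = y\beta - n\log(1 + e^{\beta})$, and reads the standard error off the Fisher information. The function names and the choice of 20 successes in 100 trials (so that $\hat p = 0.2$) are assumptions made for illustration.

```python
import math

def delta_method_se_logit(p, n):
    """Delta-method standard error of log(p / (1 - p)) for a sample proportion."""
    var_p = p * (1.0 - p) / n                # Var(p_hat)
    g_prime = 1.0 / (p * (1.0 - p))          # derivative of the log-odds at p
    return math.sqrt(g_prime ** 2 * var_p)   # sqrt([g'(p)]^2 * Var(p_hat))

def newton_logit_mle(successes, n, tol=1e-12, max_iter=50):
    """Newton-Raphson for the intercept beta of an intercept-only logistic model."""
    beta = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-beta))
        score = successes - n * p            # first derivative of the log-likelihood
        info = n * p * (1.0 - p)             # minus the second derivative (Fisher information)
        step = score / info                  # Newton step from the quadratic expansion
        beta += step
        if abs(step) < tol:
            break
    p = 1.0 / (1.0 + math.exp(-beta))
    se = 1.0 / math.sqrt(n * p * (1.0 - p))  # SE from the inverse Fisher information
    return beta, se

se_delta = delta_method_se_logit(0.2, 100)
beta_hat, se_newton = newton_logit_mle(20, 100)
print(se_delta, beta_hat, se_newton)
```

The two standard errors should coincide, and `beta_hat` should equal $\log(0.2/0.8)$: the first-order expansion behind the delta method and the second-order expansion behind Newton-Raphson describe the same local curvature.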