Taylor and Maclaurin Series

A Taylor series approximates a smooth function by a polynomial built from its derivatives at a point. This local approximation is the workhorse behind the delta method, Newton-type optimization, and many large-sample expansions.

Taylor expansion

If $f$ is infinitely differentiable near $a$ and its Taylor series converges to $f$ there (as it does for every function used on this page),

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^{n} = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots$$

A Maclaurin series is the special case $a = 0$.
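As a minimal sketch of the formula above with a center other than zero, the following evaluates a truncated Taylor sum from a list of derivative values. The helper name `taylor_poly` and the choice of $f(x) = e^x$ about $a = 1$ are illustrative assumptions, not part of the original text; $e^x$ is convenient only because every derivative at $a$ equals $e^a$.

```python
import math

def taylor_poly(derivs_at_a, a, x):
    """Evaluate sum over n of derivs_at_a[n] / n! * (x - a)**n.

    derivs_at_a[n] is the n-th derivative of f at a (index 0 is f(a) itself).
    This is a hypothetical helper written for illustration.
    """
    return sum(d / math.factorial(n) * (x - a) ** n
               for n, d in enumerate(derivs_at_a))

# Illustrative choice: f(x) = exp(x) expanded about a = 1.
# Every derivative of exp at a equals exp(a), so the list is easy to write down.
a, x = 1.0, 1.5
for degree in (1, 2, 4, 8):
    derivs = [math.exp(a)] * (degree + 1)
    approx = taylor_poly(derivs, a, x)
    # Columns: degree, approximation, true value minus approximation
    print(degree, approx, math.exp(x) - approx)
```

Setting `a = 0.0` in the same call gives the Maclaurin case.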

Key expansions

$$\begin{aligned} e^{x} &= \sum_{n=0}^{\infty} \frac{x^{n}}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \cdots \\ \sin x &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \\ \ln(1 + x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \quad (|x| < 1) \end{aligned}$$
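The three expansions can be checked numerically with a short sketch like the one below. The function names and the test point $x = 0.5$ are assumptions chosen for illustration. The logarithm series has no factorial in its denominators, so its partial sums approach the true value noticeably more slowly than the other two.

```python
import math

def exp_partial(x, terms):
    # First `terms` terms of the series for e^x: n = 0, 1, ..., terms - 1
    return sum(x ** n / math.factorial(n) for n in range(terms))

def sin_partial(x, terms):
    # First `terms` nonzero terms of the series for sin x (odd powers only)
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

def log1p_partial(x, terms):
    # First `terms` terms of the series for ln(1 + x), intended for |x| < 1
    return sum((-1) ** (n + 1) * x ** n / n for n in range(1, terms + 1))

x = 0.5  # illustrative point inside |x| < 1
for terms in (2, 4, 8):
    # Columns: number of terms, then true value minus partial sum for each series
    print(terms,
          math.exp(x) - exp_partial(x, terms),
          math.sin(x) - sin_partial(x, terms),
          math.log1p(x) - log1p_partial(x, terms))
```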

Worked example: successive polynomials for $\sin x$

Watching the approximation improve term by term is the best way to build intuition. The Maclaurin polynomials of $\sin x$ use only odd powers; write $T_N$ for the degree-$N$ truncation.

Figure: Taylor polynomials of $\sin x$ of degree 1, 3, 5, 7. Each added term hugs the curve over a wider interval before peeling away.

Each higher-degree polynomial tracks $\sin x$ over a wider window before peeling off. Evaluate them at $x = \tfrac{\pi}{2} \approx 1.5708$, where the true value is $\sin(\pi/2) = 1$, and record the error $\sin x - T_N(x)$ (true value minus approximation):

$$\begin{aligned} T_1 &= x = 1.5708 &&(\text{error } -0.5708) \\ T_3 &= x - \tfrac{x^3}{3!} = 0.9248 &&(\text{error } +0.0752) \\ T_5 &= T_3 + \tfrac{x^5}{5!} = 1.0045 &&(\text{error } -0.0045) \\ T_7 &= T_5 - \tfrac{x^7}{7!} = 0.99984 &&(\text{error } +0.00016) \end{aligned}$$

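Each additional nonzero term shrinks the error by a growing factor: roughly 8, then 17, then 29. Even at $x = \pi/2$, well away from the center $a = 0$, four nonzero terms bring the error below $2 \times 10^{-4}$.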

How good is the approximation? The remainder

Truncating after the degree-$N$ term leaves the Lagrange remainder

$$f(x) - T_N(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!}\,(x - a)^{N+1}$$

for some $\xi$ between $a$ and $x$. For $\sin x$ every derivative is bounded by $1$ in absolute value, so $|f(x) - T_N(x)| \le \dfrac{|x - a|^{N+1}}{(N+1)!}$. At $x = \pi/2$ with $N = 7$ this bounds the error by $\dfrac{(\pi/2)^8}{8!} \approx 9.2\times 10^{-4}$, comfortably above the actual error of about $1.6\times 10^{-4}$. The factorial in the denominator is why Taylor series converge so fast close to $a$. For a fixed $N$, the numerator $|x - a|^{N+1}$ takes over once $|x - a|$ grows large, which is exactly the peeling-off you see in the figure.
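The bound can be compared with the actual error in a few lines. This is a sketch under stated assumptions: the names `sin_taylor` and `lagrange_bound` are hypothetical, and the bound uses only the fact that the derivatives of $\sin$ never exceed $1$ in absolute value. Because the Maclaurin series of $\sin x$ has no even powers, $T_N = T_{N+1}$ for odd $N$, so the same bound applied with $N + 1$ is also valid and is tighter here.

```python
import math

def sin_taylor(x, N):
    # Degree-N Maclaurin polynomial of sin: odd powers 1, 3, ..., N
    return sum((-1) ** ((n - 1) // 2) * x ** n / math.factorial(n)
               for n in range(1, N + 1, 2))

def lagrange_bound(x, N):
    # |x - a|^(N+1) / (N+1)! with a = 0, valid because |sin^(k)| <= 1
    return abs(x) ** (N + 1) / math.factorial(N + 1)

x = math.pi / 2
for N in (1, 3, 5, 7):
    actual = abs(math.sin(x) - sin_taylor(x, N))
    # Columns: N, actual error, bound with N, tighter bound with N + 1
    print(N, actual, lagrange_bound(x, N), lagrange_bound(x, N + 1))
```

For $N = 7$ the tighter bound is $(\pi/2)^9 / 9! \approx 1.6 \times 10^{-4}$, only slightly above the actual error.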

Computing it

R

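x <- pi / 2
# Degree-N Maclaurin polynomial of sin at x: odd powers n = 1, 3, ..., N
# with alternating signs (the \(n) lambda shorthand needs R >= 4.1).
Tn <- function(N) sum(sapply(seq(1, N, by = 2),
                             \(n) (-1)^((n - 1) / 2) * x^n / factorial(n)))
approx <- sapply(c(1, 3, 5, 7), Tn)
rbind(approx, error = sin(x) - approx)   # error = true value minus approximation
#            [,1]     [,2]     [,3]      [,4]
# approx  1.57080  0.92483  1.00452  0.999843
# error  -0.57080  0.07517 -0.00452  0.000157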

Python

import sympy as sp
x = sp.symbols("x")
print(sp.series(sp.sin(x), x, 0, 8))   # x - x**3/6 + x**5/120 - x**7/5040 + O(x**8)

import math
xv = math.pi / 2
Tn = lambda N: sum((-1)**((n - 1)//2) * xv**n / math.factorial(n)
                   for n in range(1, N + 1, 2))
for N in (1, 3, 5, 7):
    print(N, Tn(N), math.sin(xv) - Tn(N))   # error shrinks by roughly 8x, 17x, 29x
# Output:
# x - x**3/6 + x**5/120 - x**7/5040 + O(x**8)
# 1 1.5707963267948966 -0.5707963267948966
# 3 0.9248322292886504 0.07516777071134961
# 5 1.0045248555348174 -0.004524855534817407
# 7 0.9998431013994987 0.00015689860050127624

Julia

using Symbolics   # not used by the numeric check below, which needs only Base Julia
xv = pi / 2
Tn(N) = sum((-1)^((n - 1) ÷ 2) * xv^n / factorial(n) for n in 1:2:N)
[(N, Tn(N), sin(xv) - Tn(N)) for N in (1, 3, 5, 7)]
# error: -0.571, +0.075, -0.0045, +0.00016

Why it matters for statistics

The delta method approximates the variance of a transformed estimator $g(\hat\theta)$ via a first-order Taylor expansion, $g(\hat\theta) \approx g(\theta) + g'(\theta)(\hat\theta - \theta)$, so that $\operatorname{Var}(g(\hat\theta)) \approx [g'(\theta)]^2 \operatorname{Var}(\hat\theta)$. For example, a proportion $\hat p$ has variance $p(1-p)/n$; the log-odds $g(p) = \log\frac{p}{1-p}$ has derivative $g'(p) = \frac{1}{p(1-p)}$, so the delta method gives $\operatorname{Var}(\log\text{-odds}) \approx \frac{1}{n\,p(1-p)}$. With $p = 0.2$ and $n = 100$ the variance is $1/16 = 0.0625$, a standard error of $0.25$, which matches the standard error an intercept-only logistic regression reports for its intercept (the log-odds).

Second-order expansions of the log-likelihood give the Fisher information and Newton-Raphson updates for maximum likelihood. In epidemiology the same idea linearizes a nonlinear transmission rate around an operating point so a model can be studied near an equilibrium, and the delta method then propagates measurement error into the resulting estimates. In short, Taylor series turn intractable nonlinear quantities into tractable linear or quadratic ones.
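To make the log-odds example concrete, the sketch below computes the delta-method standard error and compares it with a small Monte Carlo simulation. The function names, the seed, and the number of replications are illustrative assumptions. The simulation discards the (extremely rare) samples with zero or all successes, where the log-odds is undefined.

```python
import math
import random

def logit(p):
    return math.log(p / (1 - p))

def delta_se_logit(p, n):
    # First-order delta method: SE = |g'(p)| * sqrt(Var(p_hat))
    var_p_hat = p * (1 - p) / n
    deriv = 1 / (p * (1 - p))
    return math.sqrt(deriv ** 2 * var_p_hat)

def simulated_se_logit(p, n, reps=5000, seed=0):
    # Hypothetical check: draw binomial samples, take the sample standard
    # deviation of the estimated log-odds across replications.
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        if 0 < successes < n:          # logit is undefined at 0 and 1
            values.append(logit(successes / n))
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

print(delta_se_logit(0.2, 100))        # delta-method value, approximately 0.25
print(simulated_se_logit(0.2, 100))    # simulation-based estimate for comparison
```

The simulated value typically lands slightly above the delta-method figure, because the first-order expansion ignores the curvature of the log-odds transform.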