Partial Derivatives

A partial derivative measures how a multivariable function changes as you vary one input while holding the others fixed. Partial derivatives are the building blocks of the gradient and the Jacobian, and they underpin the optimization of likelihoods that depend on several parameters.

Definition and notation

For $f(x, y)$, the partial derivative with respect to $x$ treats $y$ as a constant:

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h,\, y) - f(x,\, y)}{h}.$$

Common notations include $\dfrac{\partial f}{\partial x}$, $f_x$, and $\partial_x f$. The curly $\partial$ (instead of $d$) signals that the other variables are being held constant.
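
The limit in the definition can be watched numerically. The following is a minimal sketch, not a recommended way to differentiate: it uses the illustrative function $f(x, y) = x^2 y + \sin(y)$, an arbitrary evaluation point, and a helper name (`forward_quotient_x`) chosen here for the example. It evaluates the difference quotient for shrinking $h$ and compares it with the analytic partial $2xy$.

```python
import math

def f(x, y):
    # Illustrative function: f(x, y) = x^2 * y + sin(y)
    return x**2 * y + math.sin(y)

def forward_quotient_x(f, x, y, h):
    """Difference quotient (f(x + h, y) - f(x, y)) / h; y is held fixed."""
    return (f(x + h, y) - f(x, y)) / h

x0, y0 = 1.0, 2.0        # arbitrary evaluation point (an assumption of this sketch)
exact = 2 * x0 * y0      # analytic partial with respect to x: 2xy

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = forward_quotient_x(f, x0, y0, h)
    print(f"h={h:g}  quotient={approx:.6f}  error={abs(approx - exact):.2e}")
```

The error shrinks roughly in proportion to $h$, which is what the limit predicts. For very small $h$, floating-point cancellation eventually dominates, so practical numeric code tends to use central differences or automatic differentiation instead.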

Worked example: $f(x, y) = x^2 y + \sin(y)$

Differentiate with respect to $x$, treating $y$ as constant (so $\sin y$ is a constant and drops out):

$$\frac{\partial f}{\partial x} = 2xy .$$

Differentiate with respect to $y$, treating $x$ as constant:

$$\frac{\partial f}{\partial y} = x^2 + \cos(y) .$$

At the point $(x, y) = (1, 0)$: $f_x = 2(1)(0) = 0$ and $f_y = 1^2 + \cos 0 = 2$.

Connection to the gradient and Jacobian

Stacking the partials of a scalar function into a vector gives the gradient:

$$\nabla f = \left(\frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y}\right).$$

For a vector-valued function, arranging all partials into a matrix gives the Jacobian. Partial derivatives are the individual entries from which both objects are built.
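
As a rough illustration of how both objects are assembled from partials, the sketch below uses SymPy. The scalar function is $x^2 y + \sin(y)$; the vector-valued function `F` is a hypothetical example made up here to show the layout of the matrix, not something taken from a particular application.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Scalar function: the gradient is the list of its partial derivatives
f = x**2 * y + sp.sin(y)
grad_f = [sp.diff(f, v) for v in (x, y)]

# Hypothetical vector-valued function F(x, y) = (x^2 * y, x + sin(y))
F = sp.Matrix([x**2 * y, x + sp.sin(y)])

# Jacobian: row i holds the partials of component i, column j is variable j
J = F.jacobian([x, y])

print(grad_f)
print(J)
```

Each row of `J` is the gradient of one component of `F`, so a scalar function is the special case with a single row.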

Computing it

R

# Symbolic partials with base R
f <- expression(x^2 * y + sin(y))
D(f, "x")   # 2 * x * y
D(f, "y")   # x^2 + cos(y)

# Numeric gradient at (1, 0)
library(numDeriv)
grad(function(v) v[1]^2 * v[2] + sin(v[2]), c(1, 0))   # 0  2

Python

import sympy as sp
x, y = sp.symbols("x y")
f = x**2 * y + sp.sin(y)
sp.diff(f, x)   # 2*x*y
sp.diff(f, y)   # x**2 + cos(y)

# Numeric partials at (1, 0)
import numpy as np
g = lambda v: v[0]**2 * v[1] + np.sin(v[1])
h = 1e-6
[(g([1 + h, 0]) - g([1 - h, 0])) / (2*h),   # ~0
 (g([1, 0 + h]) - g([1, 0 - h])) / (2*h)]   # ~2

Julia

using Symbolics
@variables x y
f = x^2 * y + sin(y)
Symbolics.derivative(f, x)   # 2x*y
Symbolics.derivative(f, y)   # x^2 + cos(y)

using ForwardDiff
g(v) = v[1]^2 * v[2] + sin(v[2])
ForwardDiff.gradient(g, [1.0, 0.0])   # [0.0, 2.0]

Why it matters for statistics

Log-likelihoods usually depend on several parameters at once (a mean and a variance, or a whole vector of regression coefficients). Setting each partial derivative to zero produces the system of score equations, which is solved to find the maximum likelihood estimates. The negative of the matrix of second partials (the Hessian), evaluated at those estimates, is the observed information.
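
A small symbolic sketch of this workflow, under loud assumptions: the model is a normal distribution with unknown mean `mu` and variance `s2`, the four data values are made up for illustration, and the observed information is taken with the usual sign convention (the negative Hessian of the log-likelihood at the estimate). Real problems rarely solve in closed form and are usually handled with a numeric optimizer.

```python
import sympy as sp

mu = sp.Symbol("mu", real=True)
s2 = sp.Symbol("s2", positive=True)   # s2 stands for the variance sigma^2

data = [1, 2, 3, 6]   # made-up observations, for illustration only
n = len(data)

# Normal log-likelihood as a function of the two parameters
loglik = (-n * sp.log(2 * sp.pi * s2) / 2
          - sum((xi - mu) ** 2 for xi in data) / (2 * s2))

# Score equations: one partial derivative per parameter, each set to zero
score_mu = sp.diff(loglik, mu)
score_s2 = sp.diff(loglik, s2)

mu_hat = sp.solve(score_mu, mu)[0]                    # solves the first score equation
s2_hat = sp.solve(score_s2.subs(mu, mu_hat), s2)[0]   # then the second, at mu_hat

# Observed information: negative Hessian of the log-likelihood at the estimate
H = sp.hessian(loglik, (mu, s2))
info = (-H.subs({mu: mu_hat, s2: s2_hat})).applyfunc(sp.simplify)

print(mu_hat, s2_hat)
print(info)
```

For this toy data the estimates are the sample mean, 3, and the mean squared deviation, 7/2. The off-diagonal entries of `info` are zero, reflecting that the two parameters are orthogonal in this model at the estimate.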