Monotonic Transformations
A monotonic transformation reshapes a scale without scrambling its order. This simple property is why we can maximize a log-likelihood instead of a likelihood, and why the CDF gives a universal recipe for simulating any random variable.
Definition
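A function $g$ is monotonically increasing if it preserves order:
$$x_1 \le x_2 \implies g(x_1) \le g(x_2),$$
and monotonically decreasing if $x_1 \le x_2 \implies g(x_1) \ge g(x_2)$. When the inequalities are strict, $g$ is strictly monotone. Examples of strictly increasing functions on their domains: $\log x$, $e^x$, $\sqrt{x}$.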
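The definition can be spot-checked numerically. The sketch below tests order preservation on a finite grid for three standard strictly increasing functions (log, exp, and square root, chosen here for illustration) and for negation, which is strictly decreasing. The helper name `is_strictly_increasing` and the grid are assumptions of this sketch, and a grid check is evidence rather than a proof.

```python
import numpy as np

def is_strictly_increasing(g, grid):
    """Return True if g(x1) < g(x2) for every consecutive pair on a sorted grid."""
    return bool(np.all(np.diff(g(grid)) > 0))

# Positive grid so that log and sqrt are defined everywhere on it
grid = np.linspace(0.1, 10.0, 1000)

checks = {
    "log": is_strictly_increasing(np.log, grid),
    "exp": is_strictly_increasing(np.exp, grid),
    "sqrt": is_strictly_increasing(np.sqrt, grid),
    "negate": is_strictly_increasing(np.negative, grid),  # strictly decreasing
}
print(checks)
```

Negation fails the check because it reverses order rather than preserving it.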
Order and argmax are preserved
The key consequence: applying a strictly increasing $g$ does not change where a function $h$ attains its maximum or minimum.
$$\arg\max_\theta \, g(h(\theta)) = \arg\max_\theta \, h(\theta)$$
The location of the peak is identical; only the height changes.
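A short justification may help here. In the sketch below, $h$ denotes the function being optimized, $g$ the strictly increasing transformation, and $\theta^*$ a maximizer of $h$; these symbols are labels chosen for this derivation.

```latex
\begin{align*}
a \le b &\iff g(a) \le g(b)
  && \text{($g$ strictly increasing)} \\
h(\theta^*) \ge h(\theta)\ \text{for all } \theta
  &\iff g\bigl(h(\theta^*)\bigr) \ge g\bigl(h(\theta)\bigr)\ \text{for all } \theta \\
\arg\max_\theta h(\theta) &= \arg\max_\theta g\bigl(h(\theta)\bigr)
\end{align*}
```

Reversing the inequalities gives the same statement for the minimum. A strictly decreasing $g$ flips each inequality instead, so it swaps the argmax with the argmin.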
Why log-likelihood works
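The likelihood of iid data $x_1, \dots, x_n$ is a product,
$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta),$$
which is numerically awkward (products of many small numbers underflow). Because $\log$ is strictly increasing, maximizing the log-likelihood
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$
yields the same maximizer, turning a product into a friendly sum without changing the answer. See maximum likelihood.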
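The underflow problem is easy to reproduce. The following sketch assumes exponential data with a true rate of 2, a sample size of 5000, and a grid of candidate rates; all of these are illustrative choices, not values from the text. It evaluates both the raw likelihood product and the log-likelihood sum on the grid.

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed for this sketch
x = rng.exponential(scale=1 / 2.0, size=5000)  # hypothetical data, true rate = 2

rates = np.linspace(0.5, 5.0, 451)             # candidate rates, step 0.01

# Exponential log-density log f(x | rate) = log(rate) - rate * x,
# evaluated for every (rate, observation) pair: shape (451, 5000)
logpdf = np.log(rates)[:, None] - rates[:, None] * x[None, :]

log_lik = logpdf.sum(axis=1)        # sum of logs: stays finite
lik = np.exp(logpdf).prod(axis=1)   # raw product: underflows in double precision

rate_hat = rates[log_lik.argmax()]  # grid maximizer of the log-likelihood
print(lik.max(), rate_hat, 1 / x.mean())
```

With this many observations the raw product is too small for double precision at every candidate rate, so it carries no information about the maximizer, while the log-likelihood still peaks next to the closed-form estimate $1/\bar{x}$.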
CDFs are non-decreasing
Every cumulative distribution function is monotonically non-decreasing, rising from 0 to 1. This monotonicity is what makes the next result possible.
The probability integral transform
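If $X$ is continuous with CDF $F$, then feeding $X$ through its own CDF produces a standard uniform variable:
$$U = F(X) \sim \mathrm{Uniform}(0, 1).$$
Reading this backward gives inverse-CDF (inverse-transform) sampling: to simulate $X$, draw $U \sim \mathrm{Uniform}(0, 1)$ and set
$$X = F^{-1}(U).$$
Because $F$ is monotone it has a (quantile) inverse $F^{-1}(u) = \inf\{x : F(x) \ge u\}$, which coincides with the ordinary inverse wherever $F$ is strictly increasing, so uniform draws map cleanly onto draws from any target distribution.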
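Inverse-transform sampling can be sketched in a few lines. The helper `inverse_transform_sample` below is a hypothetical name introduced for illustration; it relies on SciPy's `ppf` method, which is SciPy's name for the quantile function $F^{-1}$. The Gamma(3, 1) target and the sample size are arbitrary choices.

```python
import numpy as np
from scipy import stats

def inverse_transform_sample(dist, size, rng):
    """Draw samples from a frozen scipy.stats distribution via X = F^{-1}(U)."""
    u = rng.uniform(0.0, 1.0, size)  # U ~ Uniform(0, 1)
    return dist.ppf(u)               # push the uniforms through the quantile function

rng = np.random.default_rng(1)       # arbitrary seed for this sketch
target = stats.gamma(a=3, scale=1)   # illustrative target distribution
x = inverse_transform_sample(target, 50_000, rng)

# Compare the sample with the target CDF (Kolmogorov-Smirnov distance)
ks = stats.kstest(x, target.cdf)
print(x.mean(), ks.statistic)
```

The sample mean should sit near the target mean of 3 and the Kolmogorov-Smirnov distance should be small. The approach is practical only when the quantile function is cheap to evaluate.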
Worked example
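Let $X \sim \mathrm{Exponential}(\lambda)$ with $F(x) = 1 - e^{-\lambda x}$. Solving $u = F(x)$ for $x$:
$$x = F^{-1}(u) = -\frac{1}{\lambda} \log(1 - u).$$
So exponential samples come free from uniform samples. And since $1 - U$ is also $\mathrm{Uniform}(0, 1)$, one often writes $X = -\log(U) / \lambda$.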
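Both forms of the exponential sampler can be compared directly. This is a minimal sketch in which the rate 1.5, the seed, and the sample size are arbitrary assumptions; it checks that each version reproduces the exponential mean $1/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for this sketch
lam = 1.5                       # illustrative rate parameter
u = rng.uniform(size=200_000)   # U ~ Uniform(0, 1), values in [0, 1)

x_a = -np.log1p(-u) / lam       # x = -log(1 - u) / lambda, the exact inverse CDF
x_b = -np.log(u) / lam          # shortcut: 1 - U is also Uniform(0, 1)

print(x_a.mean(), x_b.mean(), 1 / lam)
```

The two samples differ draw by draw but follow the same distribution. One practical caveat: the generator can return exactly 0, where the shortcut evaluates $\log 0$, whereas the first form only ever takes the log of a value in $(0, 1]$.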
Simulation
We demonstrate two facts: (1) $F(X)$ is uniform, and (2) $\log$ preserves the argmax.
R
set.seed(7)
# (1) Probability integral transform: F(X) ~ Uniform(0,1)
x <- rnorm(1e5, mean = 5, sd = 2)
u <- pnorm(x, 5, 2) # apply the true CDF
c(mean(u), var(u)) # ~0.5 and ~0.0833 (= 1/12): uniform
# (2) log preserves the argmax of a positive function
theta <- seq(0.1, 10, by = 0.01)
h <- dgamma(theta, shape = 3, rate = 1) # a positive curve
theta[which.max(h)] == theta[which.max(log(h))] # TRUE
Python
import numpy as np
from scipy import stats
rng = np.random.default_rng(7)
# (1) F(X) ~ Uniform(0,1)
x = rng.normal(5, 2, 100_000)
u = stats.norm.cdf(x, 5, 2)
print(u.mean(), u.var()) # ~0.5, ~0.0833 (= 1/12)
# (2) log preserves the argmax
theta = np.arange(0.1, 10, 0.01)
h = stats.gamma.pdf(theta, a=3, scale=1)
print(theta[h.argmax()] == theta[np.log(h).argmax()]) # True
Output:
0.49975819739868915 0.08306185653227188
True
Julia
using Random, Statistics, Distributions
Random.seed!(7)
# (1) F(X) ~ Uniform(0,1)
d = Normal(5, 2)
x = rand(d, 100_000)
u = cdf.(d, x)
mean(u), var(u) # ~0.5, ~0.0833 (= 1/12)
# (2) log preserves the argmax
theta = 0.1:0.01:10
h = pdf.(Gamma(3, 1), theta)
argmax(h) == argmax(log.(h)) # true
Why it matters for statistics
Monotonic transformations underpin two workhorses of statistical practice. Optimization on the log scale makes maximum likelihood numerically stable while leaving the estimate untouched, and the probability integral transform is the foundation of random-number generation, quantile methods, and copulas. Recognizing that order-preserving maps leave argmax and rankings intact lets you swap scales freely for convenience.