Maximum Likelihood Estimation
Maximum likelihood estimation (MLE) is the workhorse for fitting statistical models in epidemiology, from case-fatality proportions to hazard rates. It picks the parameter values that make the observed data most probable.
The likelihood function
Given independent observations from a model with density (or mass function) , the likelihood is the joint probability of the data viewed as a function of the parameter :
The maximum likelihood estimate is the value that maximizes this:
Why work with the log
Products of many small probabilities underflow numerically and are awkward to differentiate. Taking the logarithm turns the product into a sum:
Because is a strictly increasing monotonic transformation, it preserves the location of the maximum: . See exponentials and logarithms.
Finding the maximum
We locate the maximum with calculus: solve the score equation
and confirm it is a maximum by checking . When no closed form exists, maximize numerically (see optimization).
Worked example: Bernoulli / Binomial
Suppose are independent Bernoulli trials with success probability , so . Let be the number of successes. The log-likelihood is
Differentiate and set to zero:
The second derivative confirms a maximum. The MLE of a Bernoulli/Binomial proportion is simply the sample mean.
In code
Below we maximize the same log-likelihood numerically and compare to the closed form .
R
set.seed(1)
x <- rbinom(200, size = 1, prob = 0.3) # 200 Bernoulli trials
negloglik <- function(p) -sum(dbinom(x, size = 1, prob = p, log = TRUE))
fit <- optimize(negloglik, interval = c(1e-6, 1 - 1e-6))
c(numeric = fit$minimum, closed_form = mean(x))
$```
### Python
```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import bernoulli
rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)
negloglik = lambda p: -np.sum(bernoulli.logpmf(x, p))
fit = minimize_scalar(negloglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(fit.x, x.mean()) # numeric vs closed form
0.3049994360315715 0.305
Julia
using Distributions, Optim, Random
Random.seed!(1)
x = rand(Bernoulli(0.3), 200)
negloglik(p) = -sum(logpdf.(Bernoulli(p[1]), x))
fit = optimize(negloglik, [0.5], BFGS())
println(Optim.minimizer(fit)[1], " ", sum(x)/length(x))
All three recover .
Why it matters for statistics
MLE provides a general, principled recipe for estimating parameters of essentially any probabilistic model. Its estimators are consistent and asymptotically normal, which underpins standard errors, confidence intervals, and likelihood-ratio tests used throughout inference and epidemiological modeling.