Measures of Center
A measure of center summarizes a distribution or dataset with a single “typical” value. Choosing the right one (mean, median, or mode) depends on the shape of the distribution, its skew, and how much you trust the extremes. Medians are prized in biology because they resist extreme values: the median incubation period or median survival time is far more robust than the mean for the right-skewed distributions typical of infectious-disease data.
The three classic measures
- Mean (arithmetic average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The balance point of the data; the sample analogue of the population mean $\mu = \mathbb{E}[X]$.
- Median: the middle value when the data are sorted (the average of the two middle values if $n$ is even). Splits the data into two halves.
- Mode: the most frequent value (or the peak of a density). The only center that applies to purely categorical data.
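The three definitions above translate almost directly into code. The following is a minimal Python sketch rather than a library implementation; the function names `mean_of`, `median_of`, and `modes_of` and the sample values are illustrative choices, not part of the original text.

```python
from collections import Counter


def mean_of(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes_of(values):
    """All values tied for the highest frequency; works for categorical data too."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


sample = [1, 3, 3, 6, 7, 10]                  # illustrative data
print(mean_of(sample))                        # 5.0
print(median_of(sample))                      # 4.5 (n is even: average of 3 and 6)
print(modes_of(sample))                       # [3]
print(modes_of(["red", "blue", "red"]))       # ['red']
```

Returning a list from `modes_of` is a deliberate choice in this sketch: a dataset can have several modes, and a single return value would hide ties.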
Quantiles, percentiles, and order statistics
Sorting the data gives the order statistics $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$; $x_{(1)}$ is the minimum and $x_{(n)}$ the maximum. A quantile at level $p$ is a value below which a fraction $p$ of the data fall; percentiles are quantiles expressed in percent. The median is the $p = 0.5$ quantile.
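To make order statistics and quantiles concrete, here is a small Python sketch. Several conventions exist for sample quantiles; this one interpolates linearly between adjacent order statistics, which is an assumption of the sketch and not something the text prescribes. The names `order_statistics` and `quantile_linear` are hypothetical.

```python
import math


def order_statistics(values):
    """Sorted copy of the data: x_(1) <= x_(2) <= ... <= x_(n)."""
    return sorted(values)


def quantile_linear(values, p):
    """Empirical quantile at level p in [0, 1], interpolating linearly
    between adjacent order statistics (one convention among several)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    s = sorted(values)
    h = (len(s) - 1) * p      # fractional 0-based position in the sorted data
    lo = math.floor(h)
    hi = math.ceil(h)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


data = [7, 1, 9, 3, 5]                 # illustrative data
s = order_statistics(data)
print(s[0], s[-1])                     # 1 9 (minimum and maximum)
print(quantile_linear(data, 0.5))      # 5.0 (the median)
print(quantile_linear(data, 0.25))     # 3.0 (the 25th percentile)
```

Other conventions (for example, taking the nearest order statistic without interpolating) can give slightly different answers on small samples, so it is worth checking which rule a given library uses.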
When to use which
- Symmetric, light-tailed data (e.g. roughly normal): mean and median nearly coincide; the mean is efficient.
- Skewed data (incomes, waiting times): the mean is pulled toward the long tail, so the median better reflects the “typical” case.
- Outliers or contamination: the median is robust — one wild value barely moves it — while a single large outlier can dominate the mean.
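- Categorical or multimodal data: report the mode (or modes). The mean and median are undefined for unordered categories, and for multimodal data they can fall between peaks where few observations lie.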
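The contrast between symmetric and skewed data can be checked with a short simulation. The sketch below uses only the Python standard library; the distributions, their parameters, the sample size, and the seed are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Symmetric, light-tailed sample: normal with mean 50 and sd 10.
symmetric = [random.gauss(50, 10) for _ in range(10_000)]

# Right-skewed sample: lognormal, a common stand-in for incomes or waiting times.
skewed = [random.lognormvariate(0, 1) for _ in range(10_000)]

for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    m = statistics.mean(sample)
    med = statistics.median(sample)
    print(f"{name:>9}: mean = {m:.3f}, median = {med:.3f}")
```

In the symmetric sample the two summaries should land close together, while in the lognormal sample the mean should sit well above the median, pulled upward by the long right tail.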
Worked example
Dataset: $\{2, 4, 4, 5, 100\}$, with $n = 5$.
- Mean: $\bar{x} = (2 + 4 + 4 + 5 + 100)/5 = 115/5 = 23$.
- Median: sorted values are $2, 4, 4, 5, 100$, so the median is $4$.
- Mode: $4$ (appears twice).
The lone value $100$ drags the mean up to $23$, far from every other point, while the median stays representative. This is robustness in action.
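One way to see this robustness directly is to vary the extreme value and watch each summary respond. The sketch below keeps the values 2, 4, 4, 5 fixed and swaps in different fifth values; the helper name `centers_with_extreme` and the alternative extremes 6 and 10,000 are illustrative assumptions.

```python
import statistics

BASE = [2, 4, 4, 5]  # the non-extreme values from the worked example


def centers_with_extreme(extreme):
    """Return (mean, median) of BASE plus one additional extreme value."""
    data = BASE + [extreme]
    return statistics.mean(data), statistics.median(data)


for extreme in [6, 100, 10_000]:
    mean_value, median_value = centers_with_extreme(extreme)
    print(f"extreme = {extreme:>6}: mean = {mean_value}, median = {median_value}")
```

The median is 4 in every row, while the mean moves from 4.2 to 23 to 2003 as the fifth value grows.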
Simulation
R
x <- c(2, 4, 4, 5, 100)
mean(x) # 23
median(x) # 4
quantile(x, c(.25, .5, .75)) # quartiles: 4, 4, 5
# mode: value with highest frequency
as.numeric(names(which.max(table(x)))) # 4
Python
import numpy as np
from statistics import mode
x = np.array([2, 4, 4, 5, 100])
print(np.mean(x)) # 23.0
print(np.median(x)) # 4.0
print(np.quantile(x, [.25, .5, .75])) # [4. 4. 5.]
print(mode(x.tolist())) # 4
23.0
4.0
[4. 4. 5.]
4
Julia
using Statistics, StatsBase
x = [2, 4, 4, 5, 100]
println(mean(x)) # 23.0
println(median(x)) # 4.0
println(quantile(x, [.25, .5, .75])) # [4.0, 4.0, 5.0]
println(mode(x)) # 4
Why it matters for statistics
The center you report shapes the story your data tell. The mean feeds directly into variances, standard errors, and the central limit theorem; the median and quantiles give robust, distribution-free summaries that survive outliers and skew. Knowing when each is appropriate is the difference between an honest summary and a misleading one.