Measures of Association and Impact
Once a study is designed and the cases are counted, the numbers have to be turned into quantities that mean something. Three questions organize the whole vocabulary: how common is the outcome, how much does exposure change it, and how much disease would we prevent by removing the exposure. The answers are the frequency measures, the measures of association, and the measures of impact.
Frequency measures
Three measures describe how often the outcome occurs. The incidence proportion, or risk, is the fraction of an at-risk group that develops the outcome over a period, so it is a probability between 0 and 1. The incidence rate divides new cases by the person-time observed, so its units are cases per person-time (for example, per person-year) and it can exceed 1. It is the right measure when follow-up varies across people. Prevalence is the fraction of the population that has the outcome at a moment, and it reflects both how fast cases arise and how long they last.
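To see why the rate copes with unequal follow-up, here is a minimal Python sketch on a made-up five-person cohort. The data and variable names are hypothetical, not from the text. Each person contributes person-time until the outcome, the end of the study, or loss to follow-up.

```python
# Hypothetical cohort, for illustration only. Each tuple is
# (years observed, developed the outcome?). Observation stops at the outcome,
# at the end of the study, or when the person is lost to follow-up.
cohort = [
    (5.0, False),
    (2.5, True),
    (5.0, False),
    (1.0, True),
    (4.0, False),  # lost to follow-up after 4 years
]

cases = sum(1 for _, event in cohort if event)
person_years = sum(years for years, _ in cohort)

# Incidence rate: new cases per unit of person-time (here, per person-year).
incidence_rate = cases / person_years

# Naive incidence proportion: cases / people. It treats the person lost at
# 4 years as if they had been followed for the whole period.
naive_risk = cases / len(cohort)

print(f"person-years   = {person_years}")
print(f"incidence rate = {incidence_rate:.4f} per person-year "
      f"({1000 * incidence_rate:.1f} per 1000 person-years)")
print(f"naive risk     = {naive_risk:.2f}")
```

With these numbers, 2 cases over 17.5 person-years give about 0.114 cases per person-year. The naive proportion of 0.40 cannot say what happened to the person who left early.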
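For a stable condition, where incidence and duration do not change over time, incidence and prevalence connect through a simple relation between the prevalence $P$, the incidence rate $I$, and the mean duration $\bar{D}$ of the condition,

$$P \approx I \times \bar{D}.$$

The approximation is good when prevalence is low. In a steady state the exact version is that the prevalence odds equal the product, $P/(1-P) = I \times \bar{D}$. This relation explains why a highly infectious but short illness can have low prevalence while a mild chronic one accumulates high prevalence.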
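The steady-state relation can be evaluated directly. This is a minimal Python sketch. The function name and the two sets of inputs are hypothetical, chosen only to illustrate the short-illness versus chronic-condition contrast.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Prevalence of a stable condition from its incidence rate and mean duration.

    The rate and duration must use the same time unit (e.g. per person-year
    and years). Returns (approximation, exact) where the approximation is
    P ~= I * D and the exact value solves P / (1 - P) = I * D.
    """
    i_times_d = incidence_rate * mean_duration
    return i_times_d, i_times_d / (1 + i_times_d)


# Hypothetical inputs, chosen only to illustrate the contrast in the text.
short_illness = steady_state_prevalence(0.5, 1 / 52)  # frequent, lasts about a week
chronic = steady_state_prevalence(0.005, 20)          # rare onset, lasts 20 years

for name, (approx, exact) in [("short illness", short_illness), ("chronic", chronic)]:
    print(f"{name:<13}  approx = {approx:.4f}  exact = {exact:.4f}")
```

The short illness has a hundred times the incidence yet roughly a tenth of the prevalence. The gap between the approximation and the exact value grows as prevalence rises.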
Measures of association
Association measures compare the outcome frequency between exposed and unexposed groups. Write the exposed risk as $R_1$ and the unexposed risk as $R_0$. The risk ratio $\text{RR} = R_1/R_0$ and the rate ratio (the same idea for rates) are multiplicative: an $\text{RR}$ of 3 means the exposure triples the risk. The odds ratio compares odds instead of risks,

$$\text{OR} = \frac{R_1/(1 - R_1)}{R_0/(1 - R_0)},$$

and it is the measure a case-control study returns and the quantity a logistic regression models: the model is linear in the log odds, so its exponentiated coefficients are odds ratios (see Generalized Linear Models). The risk difference $\text{RD} = R_1 - R_0$ is additive and stays on the scale of absolute risk, which is what patients and public-health budgets actually experience.
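The claim that a case-control study still returns the odds ratio can be checked numerically. This is a minimal Python sketch. It assumes that cases and non-cases are sampled at fixed fractions that do not depend on exposure. The function name and sampling fractions are illustrative, and the cohort counts are those of the worked example below. The cross-product ratio survives the sampling, but a risk ratio computed naively from the sampled table does not.

```python
from fractions import Fraction


def risk_and_odds_ratio(a, b, c, d):
    """Risk ratio and odds ratio from a 2x2 table.

    a, b: exposed cases and non-cases; c, d: unexposed cases and non-cases.
    """
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio


# Full cohort: the counts used in the worked example of this section.
cohort = tuple(Fraction(n) for n in (300, 700, 100, 900))

# Hypothetical case-control sample drawn from that cohort: keep every case
# but only 1 in 10 non-cases. The sampling fractions cancel in a*d / (b*c).
case_fraction, control_fraction = Fraction(1), Fraction(1, 10)
a, b, c, d = cohort
case_control = (a * case_fraction, b * control_fraction,
                c * case_fraction, d * control_fraction)

for name, table in [("cohort", cohort), ("case-control", case_control)]:
    rr, odds_ratio = risk_and_odds_ratio(*table)
    print(f"{name:<12}  'RR' = {float(rr):.3f}  OR = {float(odds_ratio):.3f}")
```

Both tables give an odds ratio of 27/7 (about 3.857). The "risk ratio" from the case-control table drops to about 1.541 because the sampling distorts the risks.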
When the odds ratio tracks the risk ratio
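The odds ratio and the risk ratio agree closely only when the outcome is rare (or when there is no association at all). Rearranging the definitions gives

$$\text{OR} = \text{RR} \times \frac{1 - R_0}{1 - R_1},$$

so as $R_1$ and $R_0$ both shrink toward zero, the correction factor approaches 1 and the odds ratio collapses onto the risk ratio. When the outcome is common, the odds ratio sits farther from 1 than the risk ratio, so it overstates the strength of the association if read as if it were a risk ratio.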
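A minimal Python sketch of the same relation: the risk ratio is held at an illustrative value of 3 while the baseline risk $R_0$ rises. The function name is hypothetical. It computes $R_1 = \text{RR} \times R_0$ and then the odds ratio from its definition.

```python
def odds_ratio_from_rr(rr, r0):
    """Odds ratio implied by a risk ratio rr at unexposed (baseline) risk r0."""
    r1 = rr * r0  # exposed risk
    if not 0 < r1 < 1:
        raise ValueError("rr * r0 must lie strictly between 0 and 1")
    return (r1 / (1 - r1)) / (r0 / (1 - r0))


# Hold the risk ratio at 3 (an illustrative value) and raise the baseline risk.
for r0 in (0.001, 0.01, 0.05, 0.10, 0.20, 0.30):
    print(f"R0 = {r0:<5}  RR = 3  OR = {odds_ratio_from_rr(3, r0):.3f}")
```

At $R_0 = 0.10$ this reproduces the odds ratio of about 3.86 from the worked example below. At $R_0 = 0.30$ the odds ratio reaches 21 even though the risk ratio is still 3.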
Non-collapsibility compounds this gap. The odds ratio can change when you adjust for a covariate that predicts the outcome but is not a confounder, purely because of the nonlinear odds transformation. The risk ratio and risk difference do not behave this way. The right panel of the figure holds the risk ratio fixed and shows the odds ratio climbing above it as the baseline risk rises. This is the same effect that makes a rare-disease case-control study interpretable and a common-outcome one treacherous.
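Non-collapsibility is easiest to see with exact arithmetic. This is a minimal Python sketch with hypothetical numbers. A covariate $Z$ splits the population into two equal strata, and exposure is equally common in both, so $Z$ is not a confounder. Within each stratum the odds ratio is exactly 6. Exact fractions avoid any rounding doubt.

```python
from fractions import Fraction


def odds(p):
    """Convert a risk (probability) into odds."""
    return p / (1 - p)


# Hypothetical strata of a covariate Z, for illustration only. The two strata
# are the same size and exposure is equally common in both, so Z is not a
# confounder. Each pair is (risk if unexposed, risk if exposed), chosen so the
# stratum-specific odds ratio is exactly 6 in both strata.
strata = [
    (Fraction(1, 2), Fraction(6, 7)),
    (Fraction(1, 10), Fraction(2, 5)),
]

for r0, r1 in strata:
    print(f"stratum   OR = {float(odds(r1) / odds(r0)):.3f}  "
          f"RR = {float(r1 / r0):.3f}  RD = {float(r1 - r0):.3f}")

# With equal stratum sizes and equal exposure prevalence, the marginal
# (unadjusted) risks are the plain averages of the stratum risks.
m0 = sum(r0 for r0, _ in strata) / len(strata)
m1 = sum(r1 for _, r1 in strata) / len(strata)
marginal_or = odds(m1) / odds(m0)
marginal_rr = m1 / m0
marginal_rd = m1 - m0
print(f"marginal  OR = {float(marginal_or):.3f}  "
      f"RR = {float(marginal_rr):.3f}  RD = {float(marginal_rd):.3f}")
```

Both strata have an odds ratio of 6, yet the marginal odds ratio is about 3.95, lower than either, with no confounding involved. The marginal risk difference equals the average of the stratum values, and the marginal risk ratio lies between the stratum risk ratios. Those stratum risk ratios differ (about 1.71 and 4) because a common odds ratio implies different risk ratios at different baseline risks.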
Measures of impact
Impact measures translate an association into disease you could prevent. The attributable fraction in the exposed is the share of the exposed group's risk that the exposure is responsible for,

$$\text{AF}_e = \frac{R_1 - R_0}{R_1} = \frac{\text{RR} - 1}{\text{RR}},$$

so with $\text{RR} = 3$, two-thirds of disease among the exposed is attributable to the exposure. The population attributable fraction scales this to the whole population by the exposure prevalence $p_e$,

$$\text{PAF} = \frac{p_e(\text{RR} - 1)}{1 + p_e(\text{RR} - 1)} = \frac{R - R_0}{R},$$

where $R$ is the risk in the whole population. It answers what fraction of all cases would vanish if the exposure were removed, assuming the association is causal and unconfounded. The number needed to treat (or, for a protective exposure such as a vaccine, the number needed to vaccinate) is the reciprocal of the absolute risk difference, $\text{NNT} = 1/|R_1 - R_0|$: how many people you must treat or vaccinate to prevent one case.
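A minimal Python sketch of the three impact measures. The function name is hypothetical, and it assumes a harmful exposure ($\text{RR} > 1$). The population attributable fraction is computed two ways: by Levin's formula, $p_e(\text{RR}-1)/(1 + p_e(\text{RR}-1))$, and directly as $(R - R_0)/R$ with $R = p_e R_1 + (1 - p_e) R_0$. This shows that the two agree.

```python
def impact_measures(r0, rr, p_exposed):
    """Attributable fraction in the exposed, PAF and NNT.

    r0: risk in the unexposed; rr: risk ratio (assumed > 1, a harmful
    exposure); p_exposed: share of the population that is exposed.
    """
    r1 = rr * r0
    af_exposed = (rr - 1) / rr
    excess = p_exposed * (rr - 1)
    paf_levin = excess / (1 + excess)                   # Levin's formula
    r_overall = p_exposed * r1 + (1 - p_exposed) * r0   # risk in whole population
    paf_direct = (r_overall - r0) / r_overall           # same quantity, from risks
    nnt = 1 / abs(r1 - r0)
    return af_exposed, paf_levin, paf_direct, nnt


# Worked-example inputs (R0 = 0.10, RR = 3), then a less common exposure.
for p in (0.5, 0.1):
    af_e, paf, paf_check, nnt = impact_measures(0.10, 3, p)
    print(f"p = {p}: AFe = {af_e:.3f}, PAF = {paf:.3f} "
          f"(direct {paf_check:.3f}), NNT = {nnt:.1f}")
```

The attributable fraction in the exposed and the NNT do not depend on how common the exposure is. The population attributable fraction, however, falls from one half to about one sixth when only 10% of people are exposed.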
A worked example
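Take a study of 1,000 exposed and 1,000 unexposed people in which 300 of the exposed and 100 of the unexposed develop the outcome. The table then holds $a = 300$, $b = 700$, $c = 100$, $d = 900$ (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases). The risks are $300/1000 = 0.30$ among the exposed and $100/1000 = 0.10$ among the unexposed. This gives a risk ratio of 3 (exposure triples the risk), a risk difference of 0.20 (20 extra cases per 100 exposed people), and an odds ratio of $(300 \times 900)/(700 \times 100) \approx 3.86$. The odds ratio is already noticeably above the risk ratio because this outcome is common.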
The attributable fraction in the exposed is $(3 - 1)/3 \approx 0.67$, so two-thirds of disease among the exposed is due to the exposure. With half the sample exposed, the overall risk is $400/2000 = 0.20$ and the population attributable fraction is $(0.20 - 0.10)/0.20 = 0.50$: removing the exposure would prevent half of all cases. The number needed to treat is $1/0.20 = 5$, meaning you would remove the exposure from five people to prevent one case.
In code
We fill the table once and read every measure off it.
R
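# 2x2 table: a, b = exposed cases / non-cases; c, d = unexposed cases / non-cases
a <- 300; b <- 700; c <- 100; d <- 900
risk_exp <- a / (a + b)        # R1, risk in the exposed
risk_unexp <- c / (c + d)      # R0, risk in the unexposed
rr <- risk_exp / risk_unexp    # risk ratio
or <- (a * d) / (b * c)        # odds ratio (cross-product)
rd <- risk_exp - risk_unexp    # risk difference
afe <- (rr - 1) / rr           # attributable fraction in the exposed
risk_overall <- (a + c) / (a + b + c + d)
paf <- (risk_overall - risk_unexp) / risk_overall   # population attributable fraction
# c is also a number here, but R skips non-function objects when calling c()
c(RR = rr, OR = or, RD = rd, AFe = afe, PAF = paf, NNT = 1 / rd)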
Python
We use Polars to hold the table, then compute the measures.
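import polars as pl
# 2x2 table, one row per exposure group
tab = pl.DataFrame({
    "exposure": ["exposed", "unexposed"],
    "cases": [300, 100],
    "noncases": [700, 900],
})
# Unpack the columns: a, b = exposed cases / non-cases; c, d = unexposed
a, c = tab["cases"]
b, d = tab["noncases"]
risk_exp = a / (a + b)                      # R1
risk_unexp = c / (c + d)                    # R0
risk_overall = (a + c) / (a + b + c + d)    # risk in the whole sample
rr = risk_exp / risk_unexp                  # risk ratio
odds_ratio = (a * d) / (b * c)              # odds ratio (cross-product)
rd = risk_exp - risk_unexp                  # risk difference
afe = (rr - 1) / rr                         # attributable fraction in the exposed
paf = (risk_overall - risk_unexp) / risk_overall  # population attributable fraction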
print(f"risk ratio = {rr:.3f}")
print(f"odds ratio = {odds_ratio:.3f}")
print(f"risk difference = {rd:.3f}")
print(f"AF in exposed = {afe:.3f}")
print(f"pop. attr. frac. = {paf:.3f}")
print(f"number needed = {1 / rd:.1f}")
risk ratio = 3.000
odds ratio = 3.857
risk difference = 0.200
AF in exposed = 0.667
pop. attr. frac. = 0.500
number needed = 5.0
Julia
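# 2x2 table: a, b = exposed cases / non-cases; c, d = unexposed cases / non-cases
a, b, c, d = 300, 700, 100, 900
risk_exp = a / (a + b)          # R1, risk in the exposed
risk_unexp = c / (c + d)        # R0, risk in the unexposed
rr = risk_exp / risk_unexp      # risk ratio
or = (a * d) / (b * c)          # odds ratio (cross-product)
rd = risk_exp - risk_unexp      # risk difference
afe = (rr - 1) / rr             # attributable fraction in the exposed
risk_overall = (a + c) / (a + b + c + d)
paf = (risk_overall - risk_unexp) / risk_overall   # population attributable fraction
(RR = rr, OR = or, RD = rd, AFe = afe, PAF = paf, NNT = 1 / rd)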
Why it matters
A ratio and a difference can describe the same exposure yet tell public health very different stories. A large risk ratio for a rare outcome may add only a few cases. A modest risk ratio for a common outcome, by contrast, can produce a large risk difference and dominate the disease burden. The population attributable fraction is what tells a program how much illness a control measure could actually erase. Knowing when the odds ratio still stands in for the risk ratio, and when its non-collapsibility makes it a poor summary of absolute effect, is what keeps an association from being oversold. That judgment depends directly on the study design that produced the counts in the first place.
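The contrast between a large ratio on a rare outcome and a modest ratio on a common one comes down to simple arithmetic. This is a minimal Python sketch with two hypothetical exposures. The population size, risks, and risk ratios are illustrative assumptions, not data from the text.

```python
# Two hypothetical exposures, for illustration only. In each, half of a
# population of 1,000,000 is exposed.
population, p_exposed = 1_000_000, 0.5
scenarios = {
    "rare outcome, RR = 10": (0.0001, 10),   # (risk if unexposed, risk ratio)
    "common outcome, RR = 1.5": (0.20, 1.5),
}

excess_cases = {}
for label, (r0, rr) in scenarios.items():
    rd = rr * r0 - r0                                   # risk difference
    excess_cases[label] = population * p_exposed * rd   # extra cases among the exposed
    print(f"{label:<25} RD = {rd:.4f}  NNT = {1 / rd:,.0f}  "
          f"excess cases = {excess_cases[label]:,.0f}")
```

The tenfold risk ratio adds about 450 cases, while the risk ratio of 1.5 adds about 50,000.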