Cox Proportional Hazards Regression

Cox regression relates covariates — treatment, age, viral load — to the time until an event such as death, infection, or clearance. It is the most widely used regression model for censored time-to-event data, and its coefficients translate directly into hazard ratios.

The model

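Cox regression is a model for the hazard, the instantaneous event rate among those still at risk. For a subject with covariate vector $x$ it assumes
$$h(t \mid x) = h_0(t)\, e^{x^\top\beta}.$$
Two pieces multiply together: the baseline hazard $h_0(t)$, which depends only on time and is shared by all subjects, and the relative-risk factor $e^{x^\top\beta}$, which depends only on the covariates.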

The model is called semiparametric because $h_0(t)$ is left completely unspecified (we never write down a formula for it), while the covariate effect $e^{x^\top\beta}$ has a parametric (log-linear) form. The exponential form keeps the hazard positive for any value of $\beta$, much as the log link does in Poisson regression.
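To make the two-piece structure concrete, here is a minimal Python sketch that evaluates $h(t \mid x)$. The function `baseline_hazard`, the coefficient values, and the covariate layout (treatment, age) are all invented for illustration; an actual Cox fit never specifies the baseline.

```python
import math

def baseline_hazard(t):
    """Hypothetical baseline h0(t), chosen only so the example can run."""
    return 0.02 + 0.001 * t

def hazard(t, x, beta):
    """h(t | x) = h0(t) * exp(x . beta)."""
    linear_predictor = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    return baseline_hazard(t) * math.exp(linear_predictor)

beta = [-0.41, 0.03]   # illustrative coefficients: treatment, age in years
reference = [0, 0]     # all covariates zero, so the hazard equals h0(t)
patient = [1, 60]      # treated, 60 years old

for t in (1, 5, 10):
    print(t, hazard(t, reference, beta), hazard(t, patient, beta))
```

Because the exponential factor is always positive, the hazard stays positive whatever values the coefficients take.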

Hazard ratios

The coefficients are interpreted through exponentiation. Compare two subjects who differ by one unit in covariate $x_j$ and are identical otherwise. The baseline $h_0(t)$ cancels, leaving the hazard ratio
$$\mathrm{HR}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = e^{\beta_j},$$
a single number that does not depend on $t$. So $e^{\beta_j}$ is the multiplicative effect of a one-unit increase in $x_j$ on the instantaneous risk: $\beta_j > 0$ ($\mathrm{HR}_j > 1$) raises the hazard, $\beta_j < 0$ ($\mathrm{HR}_j < 1$) lowers it, and $\beta_j = 0$ ($\mathrm{HR}_j = 1$) leaves it unchanged.
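The same cancellation gives the hazard ratio for a change of any size: a difference of $\delta$ units in $x_j$ multiplies the hazard by $e^{\delta\beta_j}$. The sketch below illustrates this scaling; the helper names and the per-year age coefficient are hypothetical.

```python
import math

def hazard_ratio(beta_j, delta=1.0):
    """Hazard ratio for a change of `delta` units in one covariate, others fixed."""
    return math.exp(beta_j * delta)

def percent_change_in_hazard(beta_j, delta=1.0):
    """Relative change in the hazard, in percent (negative means a reduction)."""
    return 100.0 * (hazard_ratio(beta_j, delta) - 1.0)

beta_age = 0.03  # hypothetical log hazard ratio per year of age
hr_one_year = hazard_ratio(beta_age)
hr_ten_years = hazard_ratio(beta_age, delta=10)

print(f"HR per year:     {hr_one_year:.3f}")
print(f"HR per 10 years: {hr_ten_years:.3f}")
print(f"Change in hazard per 10 years: {percent_change_in_hazard(beta_age, 10):.1f}%")
```

Effects compound multiplicatively, so the ten-year hazard ratio is the one-year ratio raised to the tenth power, not ten times the one-year effect.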

The proportional-hazards assumption

Because the baseline cancels, the hazard ratio between any two covariate profiles is constant over time; this is the proportional-hazards assumption. Graphically, the hazard for one group is a fixed multiple of the hazard for another at every $t$; their hazard curves never cross.

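The assumption can fail. For instance, a surgery with high early risk but a long-term survival benefit has a hazard ratio that starts above 1 and later falls below it. Common checks include tests or plots of scaled Schoenfeld residuals against time, and log-minus-log survival plots, whose curves should be roughly parallel when hazards are proportional.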

When the assumption is untenable, remedies include stratifying on the offending variable or fitting explicitly time-varying coefficients.
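As a rough illustration of what proportionality means, the following sketch compares the hazard ratio across a grid of times for two invented scenarios: a drug whose hazard is a fixed multiple of the control hazard, and a surgery-like treatment with early excess risk and a later benefit. All hazard functions here are hypothetical and known exactly. Real diagnostics work from fitted models and residuals, not from known hazards.

```python
import math

def hazard_ratios(hazard_a, hazard_b, times):
    """Ratio hazard_a(t) / hazard_b(t) at each time in `times`."""
    return [hazard_a(t) / hazard_b(t) for t in times]

def is_proportional(ratios, rel_tol=1e-9):
    """True when every ratio matches the first one up to rounding."""
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)

def control(t):
    return 0.05 + 0.002 * t            # hypothetical control-group hazard

def drug(t):
    return math.exp(-0.41) * control(t)  # fixed multiple: proportional hazards

def surgery(t):
    # Early excess risk that decays, leaving a long-term benefit (assumed shape).
    return control(t) * (3.0 * math.exp(-0.5 * t) + 0.6)

times = [0.5, 1, 2, 5, 10]
drug_ratios = hazard_ratios(drug, control, times)
surgery_ratios = hazard_ratios(surgery, control, times)

print("drug proportional:   ", is_proportional(drug_ratios))
print("surgery proportional:", is_proportional(surgery_ratios))
```

In the surgery scenario the ratio starts above 1 and ends below it, so no single hazard ratio summarizes the effect.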

Estimation by partial likelihood

How can we estimate $\beta$ without ever specifying $h_0(t)$? Cox's insight was the partial likelihood. At each observed event time $t_i$, condition on the fact that exactly one event happened among the risk set $R(t_i)$ (everyone still under observation just before $t_i$, neither failed nor censored), and ask which subject it was. Under the model, the probability that the subject who actually failed, with covariates $x_i$, is the one to fail is
$$\frac{e^{x_i^\top\beta}}{\sum_{k \in R(t_i)} e^{x_k^\top\beta}}.$$
The baseline hazard $h_0(t_i)$ appears in both numerator and denominator, so it cancels from every term and disappears entirely. Multiplying these contributions over all event times gives the partial likelihood
$$L(\beta) = \prod_{i:\,\text{event}} \frac{e^{x_i^\top\beta}}{\sum_{k \in R(t_i)} e^{x_k^\top\beta}},$$
which is maximized over $\beta$ numerically, typically by Newton-Raphson, just as an ordinary likelihood would be. This form assumes no tied event times. Only the order of the event times matters, not their spacing, which is precisely why $h_0(t)$ need never be modeled. In the special case where the baseline hazard is constant, $h_0(t) = \lambda$, the model reduces to exponential regression.
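The partial likelihood is simple enough to compute by hand. Below is a minimal sketch for a single covariate, assuming no tied event times. The six-subject data set is a toy example, and the function names and the plain Newton-Raphson loop are illustrative choices, not the algorithm of any particular library.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        # Risk set: everyone whose observed time is at least t_i.
        denom = sum(math.exp(beta * x_k) for t_k, x_k in zip(times, x) if t_k >= t_i)
        total += beta * x_i - math.log(denom)
    return total

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the log partial likelihood."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue
        at_risk = [(x_k, math.exp(beta * x_k)) for t_k, x_k in zip(times, x) if t_k >= t_i]
        s0 = sum(w for _, w in at_risk)
        mean = sum(x_k * w for x_k, w in at_risk) / s0      # weighted mean of x in risk set
        second = sum(x_k * x_k * w for x_k, w in at_risk) / s0
        score += x_i - mean
        info += second - mean ** 2                           # weighted variance of x
    return score, info

def fit_beta(times, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood by Newton-Raphson, starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data: observed time, event indicator (1 = event, 0 = censored), covariate.
times = [4, 6, 8, 10, 12, 14]
events = [1, 0, 1, 1, 0, 1]
x = [0, 1, 0, 1, 1, 0]

beta_hat = fit_beta(times, events, x)
# Squaring the times changes their spacing but not their order.
beta_hat_rescaled = fit_beta([t ** 2 for t in times], events, x)
print(beta_hat, math.exp(beta_hat), beta_hat_rescaled)
```

Refitting after squaring the times gives the same estimate, which is the "only the order matters" property in action.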

Worked example: reading a coefficient

Suppose a trial fits a single covariate treatment (0 = placebo, 1 = drug) and reports $\hat\beta = -0.41$. The hazard ratio is
$$\mathrm{HR} = e^{-0.41} \approx 0.66.$$
Patients on the drug face about a 34% lower instantaneous risk of the event at any given time ($1 - 0.66 = 0.34$).

Now suppose a covariate stage has $\hat\beta = 0.405$, giving $\mathrm{HR} = e^{0.405} \approx 1.5$. Each one-step increase in disease stage multiplies the hazard by $1.5$, a 50% higher instantaneous risk. A 95% confidence interval that excludes $\mathrm{HR} = 1$ (equivalently $\beta = 0$) indicates a statistically significant effect. Note that a hazard ratio describes the rate of the event, not a difference in mean survival time directly.
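The arithmetic above, together with a Wald-type confidence interval, can be sketched in a few lines. The standard error of 0.15 is invented for illustration, since the example reports none, and `hazard_ratio_ci` is a hypothetical helper. The interval is built on the $\beta$ scale and then exponentiated.

```python
import math

def hazard_ratio_ci(beta_hat, se, z=1.96):
    """Hazard ratio with an approximate 95% Wald interval: exp(beta_hat +/- z * se)."""
    hr = math.exp(beta_hat)
    lower = math.exp(beta_hat - z * se)
    upper = math.exp(beta_hat + z * se)
    return hr, lower, upper

# Treatment coefficient from the worked example; the standard error is made up.
hr, lower, upper = hazard_ratio_ci(beta_hat=-0.41, se=0.15)
print(f"treatment HR = {hr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
print("interval excludes 1:", not (lower <= 1.0 <= upper))

# Stage coefficient from the worked example (point estimate only).
print(f"stage HR = {math.exp(0.405):.2f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the hazard ratio itself.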

In code

R

library(survival)

# Built-in ovarian cancer data: futime = time, fustat = event indicator
fit <- coxph(Surv(futime, fustat) ~ age + rx, data = ovarian)
summary(fit)          # coef, exp(coef) = hazard ratio, and p-values

# Check proportional hazards via scaled Schoenfeld residuals
cox.zph(fit)          # a small p-value flags a PH violation

Python

from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()     # 'week' = time, 'arrest' = event indicator
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()   # coef and exp(coef) = hazard ratio per covariate

cph.check_assumptions(df)   # proportional-hazards diagnostics

Julia

using Survival, StatsModels, DataFrames   # StatsModels provides @formula

# EventTime wraps (time, event-occurred?) for each subject
df = DataFrame(time = [4, 6, 8, 10, 12, 14],
               status = Bool[1, 0, 1, 1, 0, 1],
               x = [0, 1, 0, 1, 1, 0])
df.et = EventTime.(df.time, df.status)

model = coxph(@formula(et ~ x), df)
coef(model)                 # beta; exp.(coef(model)) gives hazard ratios

Why it matters

Cox regression is the default tool for quantifying how covariates affect survival while accounting for censoring, without committing to a shape for the baseline hazard. Its output — hazard ratios with confidence intervals — is the standard language of clinical and epidemiological reporting, from treatment effects to prognostic factors. Understanding the proportional-hazards assumption and the partial-likelihood machinery is what lets you fit these models responsibly and know when they break.