Experimental Design

How data are collected determines what conclusions they can support. Good experimental design is what lets epidemiologists move from “associated with” toward “causes,” and recognizing sources of bias is essential when a true experiment is impossible.

Experimental vs. observational studies

Principles of good experiments

Observational designs

Sources of bias

Randomization in code

Randomly assign n = 10 subjects to treatment vs. control.

R

set.seed(1)
subjects <- 1:10
assignment <- sample(rep(c("treat", "control"), each = 5))  # random labels
data.frame(subject = subjects, group = assignment)

Python

import numpy as np
rng = np.random.default_rng(1)
labels = np.array(["treat"] * 5 + ["control"] * 5)
rng.shuffle(labels)                     # random assignment
print(list(zip(range(1, 11), labels.tolist())))  # tolist() gives plain Python strings
[(1, 'control'), (2, 'treat'), (3, 'control'), (4, 'treat'), (5, 'treat'), (6, 'treat'), (7, 'control'), (8, 'control'), (9, 'control'), (10, 'treat')]
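
The Python snippet above hard-codes ten subjects and two groups of five. As a minimal sketch of how the same shuffle-the-labels idea might be wrapped for other sample sizes, the helper below is hypothetical (the name assign_groups and its parameters are not from any library) and assumes that groups should be kept as equal in size as possible.

```python
import numpy as np

def assign_groups(n_subjects, arms=("treat", "control"), seed=None):
    """Map subject ids 1..n_subjects to randomly assigned arms.

    Hypothetical helper: group sizes are as equal as possible. When
    n_subjects is not a multiple of len(arms), the earlier arms in
    `arms` receive the extra subjects.
    """
    rng = np.random.default_rng(seed)
    reps = -(-n_subjects // len(arms))            # ceiling division
    # Build an alternating label vector, trim it to length, then shuffle it.
    labels = np.tile(np.array(arms), reps)[:n_subjects]
    rng.shuffle(labels)
    return {i + 1: str(label) for i, label in enumerate(labels)}

assignment = assign_groups(10, seed=1)
counts = {arm: sum(g == arm for g in assignment.values())
          for arm in ("treat", "control")}
print(assignment)
print(counts)
```

Fixing the seed makes the allocation reproducible, which is useful for auditing, but the seed should be chosen before anyone looks at the subjects.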

Julia

using Random
Random.seed!(1)
labels = vcat(fill("treat", 5), fill("control", 5))
shuffle!(labels)                        # random assignment
foreach(println, zip(1:10, labels))

Why it matters for statistics

Design decides which biases the analysis can and cannot fix: no statistical adjustment recovers information a flawed design never captured. Randomization and a control group are what justify reading an estimated effect as causal, and naming the likely biases of an observational study guides both honest interpretation and better analysis.
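
A small simulation can make this concrete. The sketch below uses entirely made-up data: a hypothetical confounder called health, an assumed true treatment effect of 2.0, and an assumed rule under which healthier subjects are more likely to choose treatment. It compares the naive difference in group means under self-selection with the same estimate under random assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0  # hypothetical effect built into the simulation

# Hypothetical confounder: baseline health raises the outcome on its own.
health = rng.normal(size=n)

def difference_in_means(treated):
    """Simulate outcomes and return mean(treated) - mean(control)."""
    outcome = true_effect * treated + 3.0 * health + rng.normal(size=n)
    return outcome[treated].mean() - outcome[~treated].mean()

# Observational setting: healthier subjects are more likely to opt in.
self_selected = rng.random(n) < 1 / (1 + np.exp(-health))
# Experimental setting: a fair coin decides, independent of health.
randomized = rng.random(n) < 0.5

observational_estimate = difference_in_means(self_selected)
randomized_estimate = difference_in_means(randomized)

print(f"true effect:            {true_effect:.2f}")
print(f"self-selected estimate: {observational_estimate:.2f}")
print(f"randomized estimate:    {randomized_estimate:.2f}")
```

Under these assumptions the randomized estimate should land close to the true effect, while the self-selected estimate is inflated because the treated group was healthier to begin with. Recording health and adjusting for it could reduce that bias, but only if the design captured it.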