Factorial Designs
Factorial designs let you study several factors at once by testing every combination of their levels. This is far more efficient than changing one factor at a time, and, crucially, crossing the factors is what makes interactions detectable: cases where the effect of one factor depends on the level of another.
Crossing factors: the design
Suppose we have $k$ factors, each set to a low ($-1$) and a high ($+1$) level. Crossing all levels gives $2^k$ treatment combinations (runs). With $k = 2$ factors $A$ and $B$ we get $2^2 = 4$ runs; with $k = 3$ we get $2^3 = 8$, and so on.
Coding levels as $\pm 1$ makes the algebra clean and the columns orthogonal.
| Run | $A$ | $B$ | $AB$ |
|---|---|---|---|
| 1 | $-1$ | $-1$ | $+1$ |
| 2 | $+1$ | $-1$ | $-1$ |
| 3 | $-1$ | $+1$ | $-1$ |
| 4 | $+1$ | $+1$ | $+1$ |
The interaction column $AB$ is the elementwise product of the $A$ and $B$ columns.
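As a minimal sketch of the construction above, the following builds a $2^k$ design in coded units, forms the interaction column as an elementwise product, and checks orthogonality. The helper name `full_factorial_2k` is hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np

def full_factorial_2k(k):
    """Return the 2^k design in coded units (-1/+1), first factor varying fastest."""
    # itertools.product varies the LAST position fastest, so reverse each row
    rows = [combo[::-1] for combo in itertools.product((-1, 1), repeat=k)]
    return np.array(rows)

design = full_factorial_2k(2)      # columns: A, B
A, B = design[:, 0], design[:, 1]
AB = A * B                         # interaction column = elementwise product

# Model matrix: intercept, A, B, AB
X = np.column_stack([np.ones(len(design), dtype=int), A, B, AB])
gram = X.T @ X                     # zero off-diagonals mean orthogonal columns

print(design)
print(gram)
```

For $k = 2$ the Gram matrix comes out as 4 times the identity, which is the orthogonality property that the $\pm 1$ coding provides.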
Main effects and interactions
A main effect of a factor is the change in the average response as the factor goes from low to high. An interaction measures how much the effect of one factor changes across the levels of another.
Why one-factor-at-a-time (OFAT) fails: if you vary $A$ while holding $B$ fixed, then vary $B$ while holding $A$ fixed, you never observe the combination with both $A$ and $B$ changed together, which is the run that reveals synergy or antagonism. Factorial designs cross the factors, so interactions become estimable.
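The OFAT gap can be illustrated with a small sketch. The `response` function is a hypothetical noise-free surface, assumed only for illustration (its coefficients match the model stated in the code section's comment); it is not measured data.

```python
# Hypothetical noise-free response in coded units (an assumption for illustration)
def response(a, b):
    return 28.75 + 7.5 * a + 5.0 * b + 2.5 * a * b

# OFAT: baseline run, then change A alone, then change B alone
ofat_runs = [(-1, -1), (1, -1), (-1, 1)]
# Full factorial: every combination of levels
factorial_runs = [(a, b) for b in (-1, 1) for a in (-1, 1)]

# The run OFAT never performs
missing = sorted(set(factorial_runs) - set(ofat_runs))

# With only the OFAT runs, the best available guess for the unobserved
# corner is to assume the two separate gains simply add up
base = response(-1, -1)
gain_a = response(1, -1) - base
gain_b = response(-1, 1) - base
ofat_prediction = base + gain_a + gain_b

actual = response(1, 1)
print(missing, ofat_prediction, actual)
```

Under this assumed surface the additive OFAT guess falls short of the actual high-high response, and the shortfall is due entirely to the interaction term that OFAT cannot see.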
Effect = difference of averages
Using the $\pm 1$ coding, the effect of a factor is the average response at its high level minus the average at its low level:

$$\text{Effect}_A = \bar{y}_{A=+1} - \bar{y}_{A=-1}$$

Equivalently, each effect is a contrast: a weighted sum of the responses with the sign column as weights, divided by the number of $+1$ (or $-1$) entries.
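A short sketch can confirm that the two definitions agree. The corner means below are illustrative values consistent with the model stated in the code section's comment (intercept 28.75, effects 15, 10 and 5); they are an assumption, not measured data, and the function names are hypothetical.

```python
import numpy as np

# Illustrative corner means in standard order: (A, B) = (-,-), (+,-), (-,+), (+,+)
y = np.array([18.75, 28.75, 23.75, 43.75])
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])

def effect_by_contrast(signs, y):
    """Signed sum of responses divided by the number of +1 entries."""
    return signs @ y / np.sum(signs == 1)

def effect_by_averages(signs, y):
    """Average response at the high level minus average at the low level."""
    return y[signs == 1].mean() - y[signs == -1].mean()

effects = {
    "A": effect_by_contrast(A, y),
    "B": effect_by_contrast(B, y),
    "AB": effect_by_contrast(A * B, y),
}
print(effects)
```

Both functions return the same number for any balanced sign column, because each level appears in exactly half of the runs.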
Estimating effects via a linear model
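Fit the regression

$$y = \beta_0 + \beta_A x_A + \beta_B x_B + \beta_{AB}\, x_A x_B + \varepsilon$$

where $x_A, x_B \in \{-1, +1\}$ are the coded levels. Because the design columns are orthogonal, the least-squares coefficients are estimated independently of one another, and each regression coefficient equals half the corresponding effect:

$$\hat{\beta}_j = \tfrac{1}{2}\,\text{Effect}_j$$

The factor of one half arises because moving from $-1$ to $+1$ is a change of two coded units, while a coefficient measures the change per single unit.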
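The half-effect relation can be checked numerically. This is a sketch using plain least squares on illustrative corner means (assumed values consistent with the model stated in the code section's comment, not measured data).

```python
import numpy as np

# Coded design in standard order and illustrative corner means (assumed values)
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
y = np.array([18.75, 28.75, 23.75, 43.75])

# Model matrix: intercept, A, B, AB
X = np.column_stack([np.ones(4), A, B, A * B])

# General least-squares solution
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# With orthogonal +/-1 columns, X'X = 4 I, so the solution reduces to X'y / 4
beta_closed_form = X.T @ y / 4

# Doubling the slope coefficients recovers the effects
effects = 2 * beta[1:]
print(beta, effects)
```

The closed form shows why the estimates do not interfere with each other: each coefficient depends only on its own sign column.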
Worked example
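Take the four corner means (average response at each combination), here the values implied by the model simulated in the code below:

$$\bar{y}_{--} = 18.75,\quad \bar{y}_{+-} = 28.75,\quad \bar{y}_{-+} = 23.75,\quad \bar{y}_{++} = 43.75$$

where the first subscript is the level of $A$ and the second is the level of $B$.

Main effect of $A$, the average at $A = +1$ minus the average at $A = -1$:

$$\text{Effect}_A = \frac{28.75 + 43.75}{2} - \frac{18.75 + 23.75}{2} = 36.25 - 21.25 = 15$$

Main effect of $B$:

$$\text{Effect}_B = \frac{23.75 + 43.75}{2} - \frac{18.75 + 28.75}{2} = 33.75 - 23.75 = 10$$

Interaction, using the sign column $AB = (+1, -1, -1, +1)$:

$$\text{Effect}_{AB} = \frac{18.75 + 43.75}{2} - \frac{28.75 + 23.75}{2} = 31.25 - 26.25 = 5$$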
So going high on $A$ adds $15$ units on average, going high on $B$ adds $10$, and there is a modest positive interaction of $5$: the boost from $A$ is larger when $B$ is also high. The fitted coefficients would be $\hat{\beta}_A = 7.5$, $\hat{\beta}_B = 5$, $\hat{\beta}_{AB} = 2.5$.
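One way to see what "the boost is larger" means is to compute the effect of $A$ separately at each level of $B$. The sketch below uses illustrative corner means (assumed values consistent with the model stated in the code section's comment, not measured data).

```python
# Illustrative corner means keyed by coded (A, B) levels (assumed values)
means = {(-1, -1): 18.75, (1, -1): 28.75, (-1, 1): 23.75, (1, 1): 43.75}

# Simple effects of A: change in response from A low to A high, at a fixed B
effect_a_at_b_low = means[(1, -1)] - means[(-1, -1)]
effect_a_at_b_high = means[(1, 1)] - means[(-1, 1)]

# The main effect is the average of the simple effects;
# the interaction effect is half their difference
main_effect_a = (effect_a_at_b_low + effect_a_at_b_high) / 2
interaction_ab = (effect_a_at_b_high - effect_a_at_b_low) / 2

# The same interaction results when the roles of A and B are swapped
effect_b_at_a_low = means[(-1, 1)] - means[(-1, -1)]
effect_b_at_a_high = means[(1, 1)] - means[(1, -1)]
interaction_ba = (effect_b_at_a_high - effect_b_at_a_low) / 2

print(effect_a_at_b_low, effect_a_at_b_high, main_effect_a, interaction_ab)
```

With these values the effect of $A$ doubles when $B$ is high, and the interaction is symmetric: it does not matter which factor is treated as the one being modified.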
In code
R
set.seed(1)
# Full 2^2 factorial with 3 replicates per corner
d <- expand.grid(A = c(-1, 1), B = c(-1, 1))
d <- d[rep(1:4, each = 3), ]
# True model: intercept 28.75, A eff 15 (beta 7.5), B eff 10 (5), AB eff 5 (2.5)
d$y <- 28.75 + 7.5 * d$A + 5 * d$B + 2.5 * d$A * d$B + rnorm(nrow(d), 0, 2)
fit <- lm(y ~ A * B, data = d)
coef(fit)
# (Intercept) A B A:B
# ~28.7 ~7.4 ~5.0 ~2.6 (each = half the effect)
Python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from pyDOE3 import fullfact
np.random.seed(1)
# fullfact([2,2]) gives 0/1 levels; recode to -1/+1
levels = fullfact([2, 2]) * 2 - 1 # rows of (A, B) in {-1,+1}
d = pd.DataFrame(np.repeat(levels, 3, axis=0), columns=["A", "B"])
d["y"] = (28.75 + 7.5 * d.A + 5 * d.B + 2.5 * d.A * d.B
          + np.random.normal(0, 2, len(d)))
fit = smf.ols("y ~ A * B", data=d).fit()
print(fit.params)
# Intercept ~28.7, A ~7.4, B ~5.0, A:B ~2.6
Julia
using DataFrames, GLM, Random
Random.seed!(1)
# Build the full 2^2 grid, 3 reps each
grid = [(a, b) for a in (-1, 1) for b in (-1, 1)]
d = DataFrame(A = Int[], B = Int[])
for (a, b) in grid, _ in 1:3
    push!(d, (a, b))
end
d.y = 28.75 .+ 7.5 .* d.A .+ 5 .* d.B .+ 2.5 .* d.A .* d.B .+ 2 .* randn(nrow(d))
fit = lm(@formula(y ~ A * B), d)
coef(fit)
# ≈ [28.7, 7.4, 5.0, 2.6] (intercept, A, B, A&B)
Why it matters for statistics
Factorial designs are the workhorse of planned experimentation. They squeeze maximum information out of each run, estimate main effects and interactions with the same data, and keep effect estimates orthogonal (hence uncorrelated). In fields from agronomy to clinical trials to industrial process optimization, they answer “which factors matter, and do they act together?” efficiently — a question OFAT experiments simply cannot address.