Global Sensitivity Analysis

Global sensitivity analysis (GSA) attributes the variation in a model’s output to its uncertain inputs. It answers “which knobs actually matter?” — essential for prioritizing data collection, simplifying models, and communicating which epidemiological parameters drive a forecast.

Local vs global

Local sensitivity looks at partial derivatives at a single nominal point, Siloc=yxix=x0,S^{\text{loc}}_i = \frac{\partial y}{\partial x_i}\Big|_{x = x_0}, a one-at-a-time (OAT) view. It is cheap but only describes the neighborhood of x0x_0, ignores interactions, and depends on the (arbitrary) units and base point. See partial derivatives.

Global sensitivity varies all inputs across their full uncertainty ranges simultaneously, treating the inputs as random variables X1,,XkX_1,\dots,X_k and studying how the output Y=f(X)Y = f(X) responds over the whole space. The input designs are typically generated by Latin Hypercube Sampling.

Morris elementary-effects screening

The Morris method is a cheap global screening method. It computes elementary effects along one-at-a-time steps but from many randomly scattered starting points: EEi=f(x1,,xi+Δ,,xk)f(x)Δ.EE_i = \frac{f(x_1,\dots,x_i+\Delta,\dots,x_k) - f(x)}{\Delta}. Over many trajectories, the mean of EEi|EE_i| (denoted μi\mu_i^*) ranks overall importance, while the standard deviation σi\sigma_i flags nonlinearity and interactions. It uses few runs, so it is used to discard unimportant inputs before a more expensive analysis.

Variance-based Sobol indices

Sobol indices decompose the output variance by inputs. Using the law of total variance (see measures of variability): Var(Y)=Var ⁣(E[YXi])+E ⁣[Var(YXi)].\operatorname{Var}(Y) = \operatorname{Var}\!\big(\mathbb{E}[Y \mid X_i]\big) + \mathbb{E}\!\big[\operatorname{Var}(Y \mid X_i)\big].

The first-order index measures the variance explained by XiX_i alone: Si=Var ⁣(E[YXi])Var(Y).S_i = \frac{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X_i]\big)}{\operatorname{Var}(Y)}.

The total-effect index measures all variance involving XiX_i, including every interaction it participates in: STi=E ⁣[Var(YXi)]Var(Y)=1Var ⁣(E[YXi])Var(Y),S_{Ti} = \frac{\mathbb{E}\!\big[\operatorname{Var}(Y \mid X_{\sim i})\big]}{\operatorname{Var}(Y)} = 1 - \frac{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X_{\sim i}]\big)}{\operatorname{Var}(Y)}, where XiX_{\sim i} denotes all inputs except XiX_i.

Interpretation:

Worked example

Let Y=X1+X2+X1X2Y = X_1 + X_2 + X_1 X_2 with X1,X2X_1, X_2 independent, each Uniform(0,1)\text{Uniform}(0,1), so μ=12\mu = \tfrac12 and Var(Xi)=112\operatorname{Var}(X_i)=\tfrac{1}{12}.

Condition on X1=x1X_1 = x_1 and take the expectation over X2X_2 (mean 12\tfrac12): E[YX1=x1]=x1+12+12x1=12+32x1.\mathbb{E}[Y \mid X_1 = x_1] = x_1 + \tfrac12 + \tfrac12 x_1 = \tfrac12 + \tfrac32 x_1 . Its variance is (32)2Var(X1)=94112=316\big(\tfrac32\big)^2 \operatorname{Var}(X_1) = \tfrac{9}{4}\cdot\tfrac{1}{12} = \tfrac{3}{16}.

For the total variance, add the residual term E[Var(YX1)]\mathbb{E}[\operatorname{Var}(Y\mid X_1)]. Given X1=x1X_1=x_1, Y=x1+(1+x1)X2Y = x_1 + (1+x_1)X_2, so Var(YX1=x1)=(1+x1)2/12\operatorname{Var}(Y\mid X_1=x_1) = (1+x_1)^2/12 and E[Var(YX1)]=112E[(1+X1)2]=11273=736\mathbb{E}[\operatorname{Var}(Y\mid X_1)] = \tfrac{1}{12}\mathbb{E}[(1+X_1)^2] = \tfrac{1}{12}\cdot\tfrac{7}{3} = \tfrac{7}{36}. Hence Var(Y)=316+736=551440.382,S1=S2=3/1655/144=27550.491,\operatorname{Var}(Y) = \frac{3}{16} + \frac{7}{36} = \frac{55}{144}\approx 0.382, \qquad S_1 = S_2 = \frac{3/16}{55/144} = \frac{27}{55} \approx 0.491, so S1+S2=54550.982<1S_1 + S_2 = \tfrac{54}{55} \approx 0.982 < 1. The remaining 1.8%\approx 1.8\% (=1/55=1/55) is the pure interaction term X1X2X_1 X_2, so each input’s total effect is ST1=ST2=1S2=12755=28550.509>SiS_{T1} = S_{T2} = 1 - S_2 = 1 - \tfrac{27}{55} = \tfrac{28}{55}\approx 0.509 > S_i. Both inputs matter equally, and a small but nonzero share is interaction.

In code

R

# install.packages(c("sensitivity", "boot"))
library(sensitivity)
set.seed(1)

f <- function(X) X[, 1] + X[, 2] + X[, 1] * X[, 2]
n <- 10000
X1 <- data.frame(x1 = runif(n), x2 = runif(n))
X2 <- data.frame(x1 = runif(n), x2 = runif(n))

s <- sobolSalt(model = f, X1, X2, scheme = "A", nboot = 100)
print(s)     # First-order S_i ~ 0.49 each; total S_Ti ~ 0.51 each

# Morris screening:
m <- morris(model = f, factors = c("x1", "x2"), r = 50,
            design = list(type = "oat", levels = 6, grid.jump = 3))
print(m)     # mu.star ranks importance

Python

import numpy as np
from SALib.sample import sobol as sobol_sample
from SALib.analyze import sobol
np.random.seed(0)

problem = {"num_vars": 2, "names": ["x1", "x2"],
           "bounds": [[0, 1], [0, 1]]}

X = sobol_sample.sample(problem, 1024)          # Saltelli design
Y = X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1]
Si = sobol.analyze(problem, Y)
print(Si["S1"])    # ~ [0.49, 0.49]
print(Si["ST"])    # ~ [0.51, 0.51]

Julia

using GlobalSensitivity, Statistics, Random
Random.seed!(0)

f(x) = x[1] + x[2] + x[1] * x[2]
bounds = [[0.0, 1.0], [0.0, 1.0]]

res = gsa(f, Sobol(order = [0, 1]), bounds; samples = 10_000)
println(res.S1)   # first-order  ~ [0.49, 0.49]
println(res.ST)   # total-effect ~ [0.51, 0.51]

Why it matters for statistics

GSA turns “uncertainty in the inputs” into a ranked, quantitative statement about “uncertainty in the output.” Sobol indices give a variance decomposition that is model-free (no linearity assumption), captures interactions, and is directly interpretable as a fraction of output variance. In epidemiology this tells you whether a forecast’s spread is driven by β\beta, by reporting rates, or by their interaction — guiding which parameters to measure more precisely.