Randomness & Random Number Generation

Stochastic simulation is the beating heart of mathematical biology — epidemic models, bootstraps, permutation tests, coalescent genealogies, MCMC. All of it rests on the computer’s ability to produce “random” numbers. But a computer is a deterministic machine, so the numbers are not truly random at all: they are pseudo-random, produced by a formula that generates a stream of numbers with the statistical properties of randomness.

Understanding that one fact — the randomness is a reproducible illusion — is what lets you make your stochastic work both correct and reproducible.

Pseudo-Random, and Why That’s Good

A pseudo-random number generator (PRNG) starts from a seed and deterministically produces a long stream of numbers that look random. Same seed, same stream — every time, on every machine. Far from a limitation, this is exactly what you want: it makes a stochastic analysis reproducible.

Always set a seed at the top of any script that uses randomness.

set.seed(2026)          # R
using Random; Random.seed!(2026)      # Julia
import numpy as np
rng = np.random.default_rng(2026)     # Python: make a seeded generator

With the seed fixed, two runs produce identical draws — the foundation of a reproducible result:

a = np.random.default_rng(42)
b = np.random.default_rng(42)
print(a.integers(0, 100, 6))
print(b.integers(0, 100, 6))    # identical stream
[ 8 77 65 43 43 85]
[ 8 77 65 43 43 85]

Use a Modern Generator

Not all generators are equally good, and the defaults have improved. A few practical rules:

Drawing From Any Distribution

Your generator gives you uniform numbers on [0,1][0, 1]. Everything else is built from those. The most fundamental construction is inverse-CDF (inverse-transform) sampling: if UUniform(0,1)U \sim \text{Uniform}(0,1), then F1(U)F^{-1}(U) has the distribution with CDF FF. For an Exponential(λ)\text{Exponential}(\lambda) the inverse CDF is x=ln(U)/λx = -\ln(U)/\lambda, so a stream of uniforms becomes a stream of exponential waiting times:

Inverse-CDF sampling: uniform draws on the left, passed through x = -ln(U)/lambda, come out exponentially distributed on the right, matching the true exponential pdf.

rng = np.random.default_rng(0)
u = rng.random(50_000)
x = -np.log(u) / 1.5                 # Exponential(rate = 1.5) by inverse-CDF
print(f"sample mean: {x.mean():.4f}    theory 1/rate: {1/1.5:.4f}")
sample mean: 0.6660    theory 1/rate: 0.6667

In practice you call the library’s named sampler (rexp, randexp, rng.exponential), which uses inverse-CDF or a faster method internally. When there is no tidy inverse CDF, rejection sampling is the general fallback: propose from an easy distribution and accept a fraction of draws so the survivors follow the target — the same principle underlying MCMC.

# R: named samplers, seeded
set.seed(0)
waits <- rexp(50000, rate = 1.5)     # exponential waiting times
counts <- rpois(50000, lambda = 3)   # Poisson event counts
# Julia
using Random, Distributions
rng = MersenneTwister(0)
waits = rand(rng, Exponential(1 / 1.5), 50_000)

The Parallel-RNG Trap

This is the mistake that quietly corrupts big simulation studies — and mathematical biologists run big simulation studies. When you spread replicates across cores or cluster jobs, it is tempting to seed each worker with the same seed, or with seed = worker_id. Both are wrong: identical seeds make every worker produce the same “random” numbers (so your 1,000 replicates are really one replicate copied 1,000 times), and naively adjacent seeds can produce overlapping or correlated streams.

The right way is to derive independent sub-streams from one parent seed:

# Python: spawn independent child generators from one seed
parent = np.random.SeedSequence(2026)
children = parent.spawn(1000)                 # one per replicate / worker
rngs = [np.random.default_rng(c) for c in children]
# worker i uses rngs[i] -- guaranteed independent, and fully reproducible
# R: L'Ecuyer-CMRG streams are designed for parallel independence
RNGkind("L'Ecuyer-CMRG")
set.seed(2026)
# furrr / future automatically hand each worker an independent stream
# Julia: derive independent per-worker generators from one master seed
using Random
master = Xoshiro(2026)
rngs = [Xoshiro(rand(master, UInt64)) for _ in 1:1000]   # one per replicate

Do give every parallel worker an independent, reproducible stream from a single master seed. Don’t reuse one seed everywhere, and don’t let workers share one global generator.

Where Randomness Drives the Biology