Genetic Drift and the Wright–Fisher Model
Genetic drift is the random change in allele frequency that happens simply because a finite population is a finite sample of gametes each generation. It matters wherever population sizes are small — bottlenecked pathogen lineages, isolated host demes, founder events after a colonization — because there chance, not selection, can drive an allele all the way to fixation or loss.
The Wright–Fisher model
The Wright–Fisher model is the simplest idealization of drift. A population of $N$ diploid individuals carries $2N$ gene copies at a locus; each generation is formed by drawing $2N$ alleles with replacement from the current pool. If the current frequency of allele $A$ is $p$, the number of copies $X$ of $A$ in the next generation is a binomial draw:
$$X \sim \mathrm{Binomial}(2N, p), \qquad P(X = k) = \binom{2N}{k} p^{k} (1-p)^{2N-k}.$$
This is exactly the binomial distribution applied to allele sampling, and it makes the behavior of drift easy to characterize.
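The sampling step described above can be sketched literally, by building the pool of gene copies and drawing from it with replacement. This is only an illustration: the helper names `next_generation` and `frequency`, the allele labels, and the values $N = 25$, $p = 0.5$ are assumptions chosen for the example. Counting the copies of `"A"` in the offspring pool is equivalent to a single Binomial($2N$, $p$) draw.

```python
import random

def next_generation(pool, rng):
    """Form the next generation by drawing len(pool) gene copies with replacement."""
    return rng.choices(pool, k=len(pool))

def frequency(pool, allele="A"):
    """Frequency of `allele` among the gene copies in `pool`."""
    return pool.count(allele) / len(pool)

rng = random.Random(0)            # arbitrary seed, for reproducibility only
N = 25                            # illustrative number of diploid individuals
pool = ["A"] * N + ["a"] * N      # 2N = 50 gene copies, so p = 0.5
offspring = next_generation(pool, rng)

# The offspring frequency is a multiple of 1/(2N) scattered around 0.5
print(frequency(pool), frequency(offspring))
```

Drawing the count directly from a binomial sampler is faster and is what the simulations later in this article do; the explicit pool is shown here only because it mirrors the verbal description.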
Mean and variance
Because the sampling is unbiased, the expected next-generation frequency $p'$ (the number of copies drawn, divided by $2N$) equals the current one:
$$E[p'] = p.$$
So drift has no preferred direction: on average nothing changes. What changes is the spread. The variance of the new frequency is
$$\mathrm{Var}(p') = \frac{p(1-p)}{2N}.$$
The $2N$ in the denominator is the whole story: in a large population the variance is tiny and frequencies barely move, while in a small population the same formula produces large random jumps. Here the allele frequency is treated as a random variable whose expectation is preserved but whose uncertainty accumulates over generations.
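A quick numerical check of the mean and variance is to sum over the exact binomial probabilities rather than simulate. The sketch below is one way to do that; the function name `next_freq_moments` and the test values ($p = 0.3$ and the three population sizes) are assumptions made for illustration.

```python
from math import comb

def next_freq_moments(N, p):
    """Exact mean and variance of X/(2N) when X ~ Binomial(2N, p)."""
    two_n = 2 * N
    # Binomial probability of each possible copy number k = 0..2N
    pmf = [comb(two_n, k) * p**k * (1 - p)**(two_n - k) for k in range(two_n + 1)]
    mean = sum(w * k / two_n for k, w in enumerate(pmf))
    var = sum(w * (k / two_n - mean) ** 2 for k, w in enumerate(pmf))
    return mean, var

p = 0.3
for N in (10, 50, 250):
    mean, var = next_freq_moments(N, p)
    # Columns: N, exact mean, exact variance, closed form p(1-p)/(2N)
    print(N, mean, var, p * (1 - p) / (2 * N))
```

The mean column stays at the starting frequency for every $N$, while the variance shrinks in proportion to $1/(2N)$.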
Consequences of drift
Run the process long enough and three robust facts emerge.
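- Absorption: with no mutation, every allele eventually reaches fixation ($p = 1$) or loss ($p = 0$); these are the only stable states because the sampling variance $p(1-p)/(2N)$ is zero there.
- Fixation probability: because the expected next-generation frequency satisfies $E[p'] = p$ every generation, the probability that allele $A$ is the one ultimately fixed equals its current frequency, $p$.
- Loss of heterozygosity: the expected heterozygosity $H = 2p(1-p)$ decays geometrically, losing a fraction $1/(2N)$ per generation: $$E[H_{t+1}] = \left(1 - \frac{1}{2N}\right) H_t, \qquad E[H_t] = H_0 \left(1 - \frac{1}{2N}\right)^{t}.$$ So genetic variation erodes at a rate set entirely by population size.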
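The three facts listed above can be checked without random sampling, by treating the copy number as a Markov chain and iterating its exact distribution. The following is a minimal sketch under assumed values ($N = 10$, starting at 4 copies out of 20, so $p_0 = 0.2$); the helper names are hypothetical.

```python
from math import comb

def transition_matrix(two_n):
    """P[i][j] = probability of moving from i copies to j copies in one generation."""
    P = []
    for i in range(two_n + 1):
        p = i / two_n
        P.append([comb(two_n, j) * p**j * (1 - p)**(two_n - j)
                  for j in range(two_n + 1)])
    return P

def step(dist, P):
    """Advance the probability distribution over copy number by one generation."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def expected_het(dist, two_n):
    """Expected heterozygosity 2p(1-p), averaged over the distribution."""
    return sum(w * 2 * (k / two_n) * (1 - k / two_n) for k, w in enumerate(dist))

N, start_copies = 10, 4            # illustrative values: 2N = 20 copies, p0 = 0.2
two_n = 2 * N
P = transition_matrix(two_n)
dist = [0.0] * (two_n + 1)
dist[start_copies] = 1.0           # all probability mass on the starting copy number

het_ratio = []                     # E[H_t] / E[H_{t-1}] for the first five generations
prev = expected_het(dist, two_n)
for t in range(400):
    dist = step(dist, P)
    if t < 5:
        h = expected_het(dist, two_n)
        het_ratio.append(h / prev)
        prev = h

print(het_ratio)                   # each ratio is 1 - 1/(2N) = 0.95 up to rounding
print(dist[two_n], dist[0])        # fixation and loss probabilities after 400 generations
```

After 400 generations essentially all probability sits on the two absorbing states, with the fixation probability matching the starting frequency of 0.2.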
Effective population size
Real populations violate the Wright–Fisher assumptions through unequal sex ratios, variable offspring number, overlapping generations, and fluctuating size. The effective population size $N_e$ is the size of an ideal Wright–Fisher population that would drift (lose heterozygosity) at the same rate as the real one. It is almost always smaller than the census size, and it is $N_e$, not the head count, that governs the formulas above and the coalescent timescale.
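For one of the violations named above, fluctuating size, the definition can be applied directly: find the constant size that loses heterozygosity as fast as the fluctuating sequence does. The sketch below does this for a hypothetical series of census sizes with a single bottleneck, and compares the result with the harmonic mean, which is the standard approximation for this case. The function names and the sizes are assumptions for illustration, and the other violations (sex ratio, offspring variance, overlapping generations) are not modeled.

```python
def effective_size_from_het(sizes):
    """N_e of the constant-size population with the same total heterozygosity loss."""
    retained = 1.0
    for n in sizes:
        retained *= 1 - 1 / (2 * n)           # fraction of H kept in a generation of size n
    per_gen = retained ** (1 / len(sizes))    # equivalent constant per-generation factor
    return 1 / (2 * (1 - per_gen))            # solve 1 - 1/(2 N_e) = per_gen for N_e

def harmonic_mean(sizes):
    """Harmonic mean of the per-generation sizes."""
    return len(sizes) / sum(1 / n for n in sizes)

sizes = [1000, 1000, 10, 1000, 1000]          # hypothetical census sizes, one bottleneck
ne = effective_size_from_het(sizes)
arithmetic_mean = sum(sizes) / len(sizes)

# N_e lands near the harmonic mean and far below the arithmetic mean
print(ne, harmonic_mean(sizes), arithmetic_mean)
```

A single generation at size 10 dominates the result, which is why a brief bottleneck can leave a long-lasting mark on variation.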
Worked example
Start an allele at frequency $p = 0.5$ in a population of $N = 25$ diploids (the values used in the simulation below), so $2N = 50$ gene copies. The next generation's frequency has expectation $E[p'] = 0.5$ and variance
$$\mathrm{Var}(p') = \frac{0.5 \times 0.5}{50} = 0.005,$$
a standard deviation of about $0.071$. So a single generation typically shifts the frequency by roughly $7$ percentage points either way. The probability that this allele is eventually fixed is just its current frequency, $0.5$, and the expected heterozygosity decays by a factor $1 - 1/50 = 0.98$ per generation: about a $2\%$ loss of variation each generation in this small population.
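The arithmetic of a worked example like this one is easy to script. The sketch below assumes the parameters of the simulation section ($N = 25$, $p = 0.5$) and uses a hypothetical helper, `one_generation_summary`, that simply evaluates the formulas from the earlier sections.

```python
from math import sqrt

def one_generation_summary(N, p):
    """One-generation drift quantities for N diploids at allele frequency p."""
    two_n = 2 * N
    var = p * (1 - p) / two_n
    return {
        "expected_freq": p,               # E[p'] = p
        "variance": var,                  # p(1-p)/(2N)
        "sd": sqrt(var),                  # typical size of a one-generation shift
        "fixation_prob": p,               # eventual fixation probability
        "het_retained": 1 - 1 / two_n,    # per-generation heterozygosity factor
    }

summary = one_generation_summary(N=25, p=0.5)   # assumed example values
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Changing `N` in the call shows how quickly the standard deviation and the per-generation loss of heterozygosity fall as the population grows.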
Simulation
R
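set.seed(1)
# Simulate one Wright-Fisher trajectory of allele frequency over `gens` generations
drift <- function(N, p0, gens) {
  p <- numeric(gens + 1); p[1] <- p0
  for (t in 1:gens) p[t + 1] <- rbinom(1, 2 * N, p[t]) / (2 * N)
  p
}
# Final frequency (generation 100) in each of 1000 replicate populations
reps <- replicate(1000, drift(N = 25, p0 = 0.5, gens = 100)[101])
mean(reps == 1)  # ~ 0.4 : fixed by generation 100 (tends to p0 = 0.5 as gens grows)
mean(reps == 0)  # ~ 0.4 : lost by generation 100; the remaining ~20% still segregate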
Python
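import numpy as np
rng = np.random.default_rng(1)
def drift(N, p0, gens):
    """Return the allele frequency after `gens` generations of Wright-Fisher sampling."""
    p = p0
    for _ in range(gens):
        p = rng.binomial(2 * N, p) / (2 * N)
    return p
# Final frequency in each of 1000 replicate populations
reps = np.array([drift(25, 0.5, 100) for _ in range(1000)])
print((reps == 1).mean())  # ~ 0.4 : fixed by generation 100 (tends to p0 = 0.5 as gens grows)
print((reps == 0).mean())  # ~ 0.4 : lost by generation 100; the remaining ~20% still segregate
Output:
0.414
0.389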
Julia
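using Random, Statistics, Distributions
Random.seed!(1)
# Return the allele frequency after `gens` generations of Wright-Fisher sampling
function drift(N, p0, gens)
    p = p0
    for _ in 1:gens
        p = rand(Binomial(2N, p)) / (2N)
    end
    return p
end
# Final frequency in each of 1000 replicate populations
reps = [drift(25, 0.5, 100) for _ in 1:1000]
println(mean(reps .== 1)) # ~ 0.4 : fixed by generation 100 (tends to p0 = 0.5 as gens grows)
println(mean(reps .== 0)) # ~ 0.4 : lost by generation 100; the remaining ~20% still segregate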
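With $N = 25$, one hundred generations is not quite long enough for every replicate to be absorbed, which is why the fixed and lost fractions each come out near 0.4 rather than 0.5. The vectorized sketch below tracks all replicates at once and reports the fixed, lost, and still-segregating fractions at two time points. The function name `outcome_fractions`, the seed, and the choice of 1000 generations as the "long run" are assumptions for illustration.

```python
import numpy as np

def outcome_fractions(N, p0, gens, reps, seed=1):
    """Fractions of replicate populations fixed, lost, or still segregating after `gens`."""
    rng = np.random.default_rng(seed)
    two_n = 2 * N
    copies = np.full(reps, round(p0 * two_n))           # allele copies in each replicate
    for _ in range(gens):
        copies = rng.binomial(two_n, copies / two_n)    # one generation, all replicates
    fixed = np.mean(copies == two_n)
    lost = np.mean(copies == 0)
    return fixed, lost, 1 - fixed - lost

for gens in (100, 1000):
    fixed, lost, segregating = outcome_fractions(N=25, p0=0.5, gens=gens, reps=1000)
    print(gens, fixed, lost, segregating)
```

At 100 generations a noticeable share of replicates is still segregating; by 1000 generations essentially none are, and the fixed fraction settles near the starting frequency.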
Why it matters
Genetic drift is the neutral null against which selection is judged: any explanation invoking adaptation must beat what chance sampling alone would produce. Because its strength scales as $1/(2N_e)$, the inverse of the effective number of gene copies, drift dominates in the small, fluctuating populations typical of pathogens and founder events, shaping standing variation, the fate of new mutations, and the genealogies that phylodynamic inference reads backward in time.