Markov Chains

A Markov chain is a stochastic process that hops between states, where the probability of the next state depends only on the current state and not on the path that led there. This “memoryless” assumption is simple enough to analyze with linear algebra yet rich enough to model gene fixation, epidemic compartments, and the sampling engines behind modern statistics.

States, transitions, and the Markov property

Let the process take values in a finite set of states {1,2,,k}\{1, 2, \dots, k\} and let XnX_n be the state at step nn. The defining assumption is the Markov property, Pr(Xn+1=jXn=i,Xn1,,X0)=Pr(Xn+1=jXn=i),\Pr(X_{n+1} = j \mid X_n = i, X_{n-1}, \dots, X_0) = \Pr(X_{n+1} = j \mid X_n = i), so the past is irrelevant once the present is known. A time-homogeneous chain packs these one-step probabilities into a transition matrix PP with entries Pij=Pr(Xn+1=jXn=i).P_{ij} = \Pr(X_{n+1} = j \mid X_n = i). Each row is a probability distribution over destinations, so every row sums to 11 and all entries are non-negative — PP is a stochastic matrix.

Evolving the distribution

Represent the state of belief as a row vector π0\pi_0 giving the probability of starting in each state. One step forward is a vector–matrix product: the distribution after one step is π0P\pi_0 P, and by induction the distribution after nn steps is πn=π0Pn.\pi_n = \pi_0 P^n. The entry (Pn)ij(P^n)_{ij} is the probability of being in state jj exactly nn steps after starting in state ii.

Stationary distribution

A stationary (or invariant) distribution is a probability row vector π\pi that is unchanged by one more step: π=πP,iπi=1.\pi = \pi P, \qquad \sum_i \pi_i = 1. The equation π=πP\pi = \pi P says π\pi is a left eigenvector of PP with eigenvalue 11; every stochastic matrix has such an eigenvalue, so a stationary distribution always exists. If the chain is irreducible (every state reachable from every other) and aperiodic, the stationary distribution is unique and the chain converges to it from any starting point: πnπ\pi_n \to \pi as nn \to \infty, regardless of π0\pi_0.

Absorbing chains and hitting probabilities

Not every chain mixes toward a stationary distribution. A state ii is absorbing if Pii=1P_{ii} = 1: once entered, the chain never leaves. The Wright–Fisher model of genetic drift is exactly such a chain — the count of a neutral allele in a finite population performs a random walk with absorbing boundaries at 00 (loss) and 2N2N (fixation), so every population eventually fixes or loses the allele. For absorbing chains the interesting quantities are absorption probabilities (the chance of ending in each absorbing state) and expected hitting times, both obtained by solving linear systems built from the transient part of PP.

Worked example

Consider a two-state weather chain with states Sunny (11) and Rainy (22) and transition matrix P=[0.70.30.40.6].P = \begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}. To find the stationary distribution π=(π1,π2)\pi = (\pi_1, \pi_2) we solve π=πP\pi = \pi P with π1+π2=1\pi_1 + \pi_2 = 1. Writing out the first component, π1=0.7π1+0.4π2    0.3π1=0.4π2    π2=34π1.\pi_1 = 0.7\,\pi_1 + 0.4\,\pi_2 \;\Longrightarrow\; 0.3\,\pi_1 = 0.4\,\pi_2 \;\Longrightarrow\; \pi_2 = \tfrac{3}{4}\pi_1. Imposing normalization, π1+34π1=1\pi_1 + \tfrac34\pi_1 = 1, so π1=47\pi_1 = \tfrac{4}{7} and π2=37\pi_2 = \tfrac{3}{7}: π=(47, 37)(0.5714, 0.4286).\pi = \left(\tfrac{4}{7},\ \tfrac{3}{7}\right) \approx (0.5714,\ 0.4286). In the long run the chain is Sunny about 57%57\% of days regardless of today’s weather.

Simulation

Simulate a long trajectory and confirm the empirical frequency of each state converges to π\pi.

R

set.seed(42)
P <- matrix(c(0.7, 0.3,
              0.4, 0.6), nrow = 2, byrow = TRUE)

n <- 100000
x <- integer(n); x[1] <- 1
for (t in 2:n) x[t] <- sample(1:2, 1, prob = P[x[t - 1], ])

table(x) / n            # ~ 0.571, 0.429

# stationary vector as left eigenvector with eigenvalue 1
ev <- eigen(t(P))$vectors[, 1]
$Re(ev / sum(ev))        # ~ 0.5714, 0.4286

Python

import numpy as np
rng = np.random.default_rng(42)

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

n = 100_000
x = np.empty(n, dtype=int); x[0] = 0
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])

print(np.bincount(x) / n)          # ~ [0.571, 0.429]

# stationary vector: left eigenvector for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(abs(vals - 1))])
print(pi / pi.sum())               # ~ [0.5714, 0.4286]
[0.57124 0.42876]
[0.57142857 0.42857143]

Julia

using LinearAlgebra, Random, Statistics
Random.seed!(42)

P = [0.7 0.3;
     0.4 0.6]

n = 100_000
x = Vector{Int}(undef, n); x[1] = 1
for t in 2:n
    r = rand()
    x[t] = r < P[x[t-1], 1] ? 1 : 2
end

[count(==(s), x) / n for s in 1:2]     # ~ [0.571, 0.429]

# stationary vector: left eigenvector for eigenvalue 1
F = eigen(collect(P'))
pi = real.(F.vectors[:, argmin(abs.(F.values .- 1))])
pi ./ sum(pi)                          # ~ [0.5714, 0.4286]

Why it matters

Markov chains are one of the most reusable modelling tools in quantitative biology and statistics. The same π=πP\pi = \pi P machinery gives the long-run behaviour of a weather model, the fixation probabilities of a drifting allele, and the equilibrium of a compartmental epidemic; run in reverse, constructing a chain whose stationary distribution is a target posterior is exactly the idea behind Markov chain Monte Carlo.