Floating-Point Arithmetic & Numerical Stability
A computer does not store real numbers. It stores floating-point approximations — numbers with a fixed budget of binary digits — and every calculation nudges the result by a tiny rounding error. Most of the time you never notice. But in the calculations mathematical biologists do all day — multiplying probabilities, summing likelihoods, integrating differential equations — those tiny errors can compound until a “correct” model returns a wrong number, an infinity, or nothing at all.
This page is about recognizing those failure modes and the handful of tricks that avoid them. The most important one — working in log space, and its refinement the log-sum-exp trick — is worth its weight in gold for anyone who computes a likelihood.
How Computers Store Numbers
A standard double-precision float (float64, the default in R, Python, and Julia) carries about 15–17 significant decimal digits and can represent magnitudes from roughly 1e-308 to 1e308.
Two consequences follow immediately:
- Precision is finite. Numbers that need more than ~16 digits get rounded.
0.1has no exact binary representation, just as1/3has no exact decimal one. - Range is finite. A value smaller than ~
1e-308underflows to exactly0; a value larger than ~1e308overflows toInf.
The classic demonstration:
print(0.1 + 0.2)
print(0.1 + 0.2 == 0.3)
0.30000000000000004
False
The sum is off in the 17th digit, and the equality test fails.
Do not test floating-point numbers for exact equality. Compare within a tolerance instead.
# R: never `if (x == y)` for computed doubles
isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE, uses a tolerance
abs((0.1 + 0.2) - 0.3) < 1e-9 # TRUE, explicit tolerance
# Julia: isapprox (the ≈ operator)
(0.1 + 0.2) ≈ 0.3 # true
Catastrophic Cancellation
Rounding error is usually harmless because it is relative — about 16 digits of accuracy no matter the scale. It becomes dangerous when you subtract two nearly equal numbers: the leading digits cancel, and what remains is mostly rounding noise. This is catastrophic cancellation, and the textbook victim is the one-pass (“computational”) formula for a variance, , which subtracts two large, nearly equal sums.
import numpy as np
x = np.array([1e8 + 1.0, 1e8 + 2.0, 1e8 + 3.0])
# one-pass formula: subtracts two huge, nearly-equal numbers
one_pass = (np.sum(x**2) - np.sum(x)**2 / len(x)) / (len(x) - 1)
# two-pass formula: subtract the mean first, then square
two_pass = np.sum((x - x.mean())**2) / (len(x) - 1)
print("one-pass:", one_pass) # wrong: collapses to 0 (true value is 1.0)
print("two-pass:", two_pass) # correct: 1.0
one-pass: 0.0
two-pass: 1.0
The two-pass version subtracts the mean first, while the numbers are still close together, so no catastrophic cancellation occurs.
The lesson generalizes: rearrange a calculation so you never subtract two big numbers that are almost equal. This is exactly why you should use a library’s var(), sd(), or numpy.var rather than coding the “shortcut” formula yourself.
Work in Log Space
Here is the failure that stops MCMC chains and phylogenetics runs cold. A likelihood is a product of probabilities — one factor per site, read, observation, or individual:
Each is below 1, so the product shrinks geometrically.
After only a few hundred factors it slips below 1e-308 and underflows to exactly 0 — and then log(0) = -Inf poisons every downstream calculation (see the figure above).
The fix is to never form the product at all. Take the logarithm of both sides and the product becomes a sum:
A sum of moderate negative numbers is perfectly well-behaved no matter how large gets.
p = np.full(1000, 1e-3) # 1000 probabilities, each 0.001
print("product of probabilities:", np.prod(p)) # underflows to 0.0
print("sum of log-probabilities:", np.sum(np.log(p))) # finite, usable
product of probabilities: 0.0
sum of log-probabilities: -6907.755278982137
Rule of thumb: compute and store log-probabilities, not probabilities. Multiply probabilities → add log-probabilities. Divide → subtract. This single habit is why every serious likelihood, Bayesian, and phylogenetic library works in log space internally.
The Log-Sum-Exp Trick
Working in logs is easy until you have to add probabilities that are stored as logs — and in biology you have to do this constantly, because summing over unobserved possibilities is addition in probability space.
You have and ; you want .
The naive route — exponentiate, add, take the log — throws away everything you gained, because underflows right back to 0.
The log-sum-exp trick keeps you safe. To compute , factor out the largest term :
Now the biggest exponent is and everything else is between 0 and 1 — all comfortably representable. You get the exact same answer with no underflow.
def logsumexp(a):
m = np.max(a)
return m + np.log(np.sum(np.exp(a - m)))
logw = np.array([-1000.0, -1001.5, -1003.0]) # three log-probabilities
naive = np.log(np.sum(np.exp(logw))) # exp underflows -> log(0)
print("naive: ", naive) # -inf : wrong
print("logsumexp:", logsumexp(logw)) # correct, finite
naive: -inf
logsumexp: -999.7586887033428
You rarely need to write it yourself — every ecosystem ships a tested, vectorized version:
# R
matrixStats::logSumExp(c(-1000, -1001.5, -1003))
# Julia
using LogExpFunctions
logsumexp([-1000.0, -1001.5, -1003.0])
# Python
from scipy.special import logsumexp
logsumexp([-1000.0, -1001.5, -1003.0])
Where This Shows Up in Biology
Once you recognize “log of a sum of exponentials,” you see it everywhere a model marginalizes over hidden possibilities:
- Phylogenetics — Felsenstein’s pruning algorithm. The likelihood of a tree sums over all unobserved ancestral states at every internal node. Every credible phylogenetics engine (RAxML, BEAST, MrBayes) does this sum in log space with log-sum-exp.
- Hidden Markov Models — the forward algorithm. Gene finding, CpG-island detection, ion-channel gating, and animal-behavior state models all sum probability over exponentially many hidden state paths; the forward recursion is log-sum-exp at each step.
- Bayesian inference. The log-posterior is
log-likelihood + log-prior; the marginal likelihood (model evidence) and importance-sampling weights are normalized with log-sum-exp. - Mixture models and model averaging. The density — population-structure models, Gaussian mixtures for flow-cytometry gating — is a log-sum-exp over components.
- Softmax / multinomial models. Multinomial logistic regression and multi-class species-distribution models normalize scores with the softmax, whose denominator is a log-sum-exp.
- Taxonomic read classification. Naive-Bayes assignment of sequencing reads sums log-probabilities across features and normalizes with log-sum-exp.
A Short Checklist
- Store log-probabilities. Add them instead of multiplying probabilities.
- Sum in log space with log-sum-exp, subtracting the max first — or just call your library’s
logSumExp. - Never compare computed floats with
==; use a tolerance (all.equal,isapprox,np.isclose). - Don’t subtract two nearly-equal large numbers; rearrange the formula, and prefer built-in
var/sd. - Watch for
Inf,-Inf, andNaNappearing mid-computation — they are usually the fingerprint of an overflow, an underflow, or a0/0, not a modeling mistake. - Reach for stable library functions (
log1p,expm1,logsumexp,softmax) instead of composinglog,exp, and subtraction by hand.
Related
- Big-O Notation & Computational Complexity — the other half of “will this compute correctly and fast enough?”
- A Simulation Toolkit — where likelihoods and log-probabilities get evaluated millions of times
- Reproducibility — recording tolerances and seeds so results are stable
- MCMC and Markov Chains — samplers that live or die by log-space arithmetic
- Computer Basics for Scientists — how data is represented in the machine
- Programming & Computing