Diversity Indices
A diversity index compresses a whole community’s species composition into a single number that captures both how many species are present and how evenly the individuals are spread among them. This matters everywhere from gut microbiome surveys to host-parasite systems, because “which community is more diverse?” is really a question about the shape of the relative abundances $p_i$.
Relative abundances
Start from counts $n_i$ of each species $i$ and convert them to relative abundances $p_i = n_i / N$, the fraction of all $N = \sum_i n_i$ individuals belonging to species $i$. By construction they are non-negative and sum to one, so $p_i$ is exactly the probability that a randomly drawn individual belongs to species $i$. Every index below is a different summary of this abundance vector, which comes from the community’s underlying species-abundance distribution.
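The conversion from counts to relative abundances can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `relative_abundances` and its input checks are assumptions made for this example.

```python
import numpy as np

def relative_abundances(counts):
    """Convert raw species counts n_i into relative abundances p_i = n_i / N."""
    counts = np.asarray(counts, dtype=float)
    if counts.ndim != 1 or counts.size == 0:
        raise ValueError("counts must be a non-empty 1-D sequence")
    if np.any(counts < 0):
        raise ValueError("counts must be non-negative")
    total = counts.sum()              # N, the total number of individuals
    if total == 0:
        raise ValueError("at least one individual is required")
    return counts / total

p = relative_abundances([40, 30, 20, 10])
print(p)         # the abundance vector
print(p.sum())   # should be 1 up to floating-point rounding
```

Validating the counts up front keeps every later index from silently operating on a vector that is not a probability distribution.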
Species richness
The simplest measure is species richness $S$, the count of species with $p_i > 0$. Richness treats a species represented by a single individual exactly like a dominant one, so it ignores evenness entirely and is highly sensitive to sampling effort.
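The sensitivity of richness to sampling effort can be illustrated with a small subsampling sketch. The community counts below are hypothetical, and the helper `richness_by_effort` is a name invented for this example. It draws nested random subsamples of individuals and counts how many species each one contains.

```python
import numpy as np

def richness_by_effort(counts, sample_sizes, seed=0):
    """Observed richness in nested random subsamples of the individuals."""
    counts = np.asarray(counts, dtype=int)
    # One label per individual: species 0 repeated n_0 times, and so on.
    individuals = np.repeat(np.arange(counts.size), counts)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(individuals)
    # Each subsample is a prefix of the same shuffle, so the samples are nested.
    return [int(np.unique(shuffled[:n]).size) for n in sample_sizes]

counts = [500, 300, 150, 40, 5, 3, 1, 1]   # hypothetical community, 1000 individuals
sizes = [10, 50, 200, 1000]
observed = richness_by_effort(counts, sizes)
print(dict(zip(sizes, observed)))           # observed richness at each sample size
```

Because the subsamples are nested, observed richness can only stay flat or rise as more individuals are examined, and it reaches the true $S$ only once the rare species have been encountered.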
Shannon index
The Shannon index $H'$ borrows from information theory and measures the uncertainty in the species identity of a random individual:

$$H' = -\sum_{i=1}^{S} p_i \ln p_i$$

It is largest when all species are equally common and zero when one species holds everything. Because it uses the logarithm, $H'$ is on a log scale; exponentiating it, $\exp(H')$, returns an effective number of species: the number of equally abundant species that would give the same $H'$. The Shannon index is also the expected value of $-\ln p_i$ over the community.
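The two extremes described above can be checked numerically. The following is a minimal sketch with a hypothetical helper `shannon`. It writes the index in its expected-surprisal form, $\sum_i p_i \ln(1/p_i)$, and skips absent species on the usual convention that $0 \ln 0 = 0$.

```python
import numpy as np

def shannon(p):
    """Shannon index H' = sum p_i ln(1/p_i), skipping absent species."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # convention: 0 * ln(0) contributes nothing
    return float(np.sum(p * np.log(1.0 / p)))

even = np.full(4, 0.25)                        # four equally common species
skewed = np.array([0.4, 0.3, 0.2, 0.1])
single = np.array([1.0, 0.0, 0.0, 0.0])        # one species holds everything

for name, p in [("even", even), ("skewed", skewed), ("single", single)]:
    h = shannon(p)
    print(f"{name:>6}: H' = {h:.4f}, exp(H') = {np.exp(h):.4f}")
```

For the even community the effective number $\exp(H')$ equals the richness, and for the single-species community it equals one, which is what "effective number of species" is meant to convey.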
Simpson’s index
Simpson’s index $D$ is the probability that two individuals drawn at random (with replacement) belong to the same species:

$$D = \sum_{i=1}^{S} p_i^2$$

Large $D$ means one or a few species dominate. Two common rescalings make it increase with diversity:
- Gini–Simpson index $1 - D$, the probability that two random individuals differ in species.
- Inverse Simpson index $1/D$, again an effective number of species, but one that discounts rare species relative to $\exp(H')$.
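The probability interpretation of Simpson’s index can be checked by simulation. The sketch below compares the analytic sum of squares with the fraction of matching pairs among independent draws. The abundance vector, the seed, and the number of pairs are arbitrary choices for illustration.

```python
import numpy as np

def simpson(p):
    """Simpson's index D = sum p_i^2."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p ** 2))

p = np.array([0.4, 0.3, 0.2, 0.1])
rng = np.random.default_rng(42)
n_pairs = 200_000

# Draw two individuals per pair, independently and with replacement.
first = rng.choice(p.size, size=n_pairs, p=p)
second = rng.choice(p.size, size=n_pairs, p=p)
match_rate = float(np.mean(first == second))

D = simpson(p)
print(f"analytic D = {D:.4f}, simulated = {match_rate:.4f}")
print(f"Gini-Simpson 1 - D = {1 - D:.4f}, inverse Simpson 1/D = {1 / D:.4f}")
```

The simulated match rate should agree with the analytic value up to Monte Carlo noise, which shrinks as `n_pairs` grows.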
Pielou’s evenness
To separate evenness from richness, divide the Shannon index by its maximum possible value $\ln S$ (attained when all species are equally abundant):

$$J' = \frac{H'}{\ln S}$$

Pielou’s evenness ranges from $0$ (one species dominates) to $1$ (perfectly even), and being a ratio it is comparable across communities with different $S$.
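A short sketch shows why the ratio is comparable across richness levels: two perfectly even communities of different size score the same, while a dominated one scores low. The helper name `pielou` and the three example communities are assumptions for illustration. The function refuses a single-species input, where $\ln S = 0$ leaves the ratio undefined.

```python
import numpy as np

def pielou(p):
    """Pielou's evenness J' = H' / ln S over the species actually present."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if p.size < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    h = np.sum(p * np.log(1.0 / p))    # Shannon index H'
    return float(h / np.log(p.size))   # divide by the maximum, ln S

even_4 = np.full(4, 1 / 4)
even_20 = np.full(20, 1 / 20)
dominated = np.array([0.97, 0.01, 0.01, 0.01])

for name, p in [("even, S=4", even_4), ("even, S=20", even_20), ("dominated", dominated)]:
    print(f"{name:>10}: J' = {pielou(p):.3f}")
```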
Hill numbers unify them
The Hill numbers ${}^qD$ express all of these as one family indexed by an order $q$ that tunes how much weight common species receive:

$${}^qD = \left( \sum_{i=1}^{S} p_i^q \right)^{1/(1-q)}$$

Each ${}^qD$ is an effective number of species, measured in the same units as richness, which makes them directly comparable.
- $q = 0$: ${}^0D = S$, plain richness; every species counts equally regardless of abundance.
- $q = 1$: the limit $q \to 1$ is ${}^1D = \exp(H')$, the exponential of Shannon; species are weighted in proportion to abundance.
- $q = 2$: ${}^2D = 1/D$, the inverse Simpson number; common species are weighted most heavily of the three.
As $q$ rises, rare species contribute less, so ${}^qD$ decreases: ${}^0D \ge {}^1D \ge {}^2D$. Plotting ${}^qD$ against $q$ gives a diversity profile that summarizes a community at every weighting at once.
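A diversity profile can be tabulated by evaluating the Hill number on a grid of orders. The sketch below is one way to do it, with a hypothetical `hill_number` helper that switches to the Shannon limit when $q$ is numerically close to one. The grid from 0 to 3 is an arbitrary choice.

```python
import numpy as np

def hill_number(p, q):
    """Hill number of order q; q = 1 is handled as the limit exp(H')."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                   # absent species never count
    if np.isclose(q, 1.0):
        return float(np.exp(np.sum(p * np.log(1.0 / p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

p = np.array([0.4, 0.3, 0.2, 0.1])
orders = np.linspace(0.0, 3.0, 13)                 # q = 0, 0.25, ..., 3
profile = [hill_number(p, q) for q in orders]

for q, d in zip(orders, profile):
    print(f"q = {q:.2f}  D = {d:.4f}")
```

The printed values should fall steadily from the richness at $q = 0$ as $q$ increases, and evaluating the general formula just off $q = 1$ should land close to the Shannon-based limit, which is a useful sanity check on the special case.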
Worked example
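Take a community of four species with counts $(40, 30, 20, 10)$ (total $N = 100$), so

$$p = (0.4,\ 0.3,\ 0.2,\ 0.1)$$

Richness is $S = 4$.
Shannon index:

$$H' = -(0.4 \ln 0.4 + 0.3 \ln 0.3 + 0.2 \ln 0.2 + 0.1 \ln 0.1) \approx 1.2799$$

So the effective number of species is $\exp(H') \approx 3.596$.
Simpson’s index:

$$D = 0.4^2 + 0.3^2 + 0.2^2 + 0.1^2 = 0.30$$

Hence Gini–Simpson $1 - D = 0.70$ and inverse Simpson $1/D \approx 3.333$.
Evenness:

$$J' = \frac{H'}{\ln S} = \frac{1.2799}{\ln 4} \approx 0.923$$

Collecting the Hill numbers: ${}^0D = 4$, ${}^1D \approx 3.596$, ${}^2D \approx 3.333$, a gentle decline, reflecting a fairly even community.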
In code
R
p <- c(40, 30, 20, 10); p <- p / sum(p) # relative abundances
# vegan
library(vegan)
diversity(p, index = "shannon") # 1.279854
diversity(p, index = "simpson") # 0.70 (Gini-Simpson, = 1 - D)
diversity(p, index = "invsimpson") # 3.333333
# manual
S <- sum(p > 0) # richness = 4
H <- -sum(p * log(p)) # 1.279854
J <- H / log(S) # 0.9232
D <- sum(p^2) # 0.30
hill <- function(p, q) {               # Hill number of order q
  if (q == 1) exp(-sum(p * log(p)))    # q = 1 is taken as the limit, exp(H)
  else sum(p^q)^(1 / (1 - q))
}
c(hill(p,0), hill(p,1), hill(p,2)) # 4.000 3.596 3.333
Python
import numpy as np
p = np.array([40, 30, 20, 10], float); p /= p.sum()
S = np.sum(p > 0) # 4
H = -np.sum(p * np.log(p)) # 1.2798542
J = H / np.log(S) # 0.92322
D = np.sum(p**2) # 0.30
def hill(p, q):
return np.exp(-np.sum(p*np.log(p))) if q == 1 else np.sum(p**q)**(1/(1-q))
print(S, H, J, D) # 4 1.27985 0.92322 0.30 (rounded)
print([hill(p, q) for q in (0, 1, 2)]) # values 4.0, 3.5961, 3.3333 (rounded)
Julia
p = [40, 30, 20, 10] ./ 100 # relative abundances
S = count(>(0), p) # 4
H = -sum(p .* log.(p)) # 1.2798542
J = H / log(S) # 0.92322
D = sum(p .^ 2) # 0.30
hill(p, q) = q == 1 ? exp(-sum(p .* log.(p))) : sum(p .^ q)^(1/(1 - q))
[hill(p, q) for q in (0, 1, 2)] # [4.0, 3.5961, 3.3333]
Why it matters
Because a single index can hide as much as it reveals, choosing an index is really choosing how much to care about rare species. Reporting richness, a Shannon- or Simpson-based number, and an evenness measure — or better, a whole Hill-number profile — lets a microbiome or community study compare samples on a common effective-species scale and state clearly whether a treatment changed the number of species, their evenness, or both.