Diversity Indices

A diversity index compresses a whole community’s species composition into a single number that captures both how many species are present and how evenly the individuals are spread among them. This matters everywhere from gut microbiome surveys to host-parasite systems, because “which community is more diverse?” is really a question about the shape of the relative abundances $p_i$.

[Figure: rank-abundance curves for an even and an uneven community; the even community is more diverse than the uneven one.]

Relative abundances

Start from counts of each species and convert them to relative abundances $p_i$, the fraction of all individuals belonging to species $i$. By construction they are non-negative and sum to one, $\sum_i p_i = 1$, so $p_i$ is exactly the probability that a randomly drawn individual belongs to species $i$. Every index below is a different summary of this abundance vector, which comes from the community’s underlying species-abundance distribution.

Species richness

The simplest measure is species richness $S$, the count of species with $p_i > 0$. Richness treats a species represented by a single individual exactly like a dominant one, so it ignores evenness entirely and is highly sensitive to sampling effort.
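The dependence of richness on sampling effort can be illustrated with a minimal sketch. The community counts, the species labels, and the helper name `observed_richness` are hypothetical, chosen only for illustration; the idea is simply to count the species seen in progressively larger nested samples.

```python
import random

def observed_richness(sample):
    """Number of distinct species labels in a sample of individuals."""
    return len(set(sample))

# Hypothetical community: one dominant species and several rare ones.
counts = {"A": 900, "B": 60, "C": 25, "D": 10, "E": 4, "F": 1}
individuals = [sp for sp, n in counts.items() for _ in range(n)]

rng = random.Random(0)      # fixed seed so the run is repeatable
rng.shuffle(individuals)    # random order of "capture"

# Nested samples: each larger sample contains the smaller ones,
# so observed richness can only stay the same or go up.
efforts = [10, 50, 200, len(individuals)]
richness_by_effort = [observed_richness(individuals[:n]) for n in efforts]
print(dict(zip(efforts, richness_by_effort)))
```

Only the full census is guaranteed to find all six species; smaller samples will usually miss some of the rare ones, which is why richness values from samples of different sizes are hard to compare directly.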

Shannon index

The Shannon index borrows from information theory and measures the uncertainty in the species identity of a random individual: $H = -\sum_i p_i \ln p_i$. It is largest when all species are equally common and zero when one species holds everything. Because it uses the logarithm, $H$ is on a log scale; exponentiating it, $e^{H} = \exp\!\left(-\sum_i p_i \ln p_i\right)$, returns an effective number of species, that is, the number of equally abundant species that would give the same $H$. The Shannon index is also the expected value of $-\ln p_i$ over the community.
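The two extremes described above can be checked with a short sketch. The function name `shannon` and the two test communities are illustrative assumptions; absent species are skipped, which corresponds to the usual convention that $0 \ln 0 = 0$.

```python
import math

def shannon(p):
    """Shannon index H = -sum p_i ln p_i, skipping absent species (p_i = 0)."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)

even = [0.25, 0.25, 0.25, 0.25]   # four equally common species
single = [1.0, 0.0, 0.0, 0.0]     # one species holds everything

h_even = shannon(even)            # should match ln 4, the maximum for S = 4
h_single = shannon(single)        # no uncertainty about species identity
effective = math.exp(h_even)      # back on the species-count scale

print(h_even, math.log(4), h_single, effective)
```

For the even community the effective number of species recovers the actual species count, which is what makes $e^{H}$ easier to interpret than $H$ itself.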

Simpson’s index

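Simpson’s index is the probability that two individuals drawn at random (with replacement) belong to the same species: $D = \sum_i p_i^2$. Large $D$ means one or a few species dominate. Two common rescalings make it increase with diversity: the Gini–Simpson index $1 - D$, the probability that the two individuals belong to different species, and the inverse Simpson index $1/D$, an effective number of species.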
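The probabilistic reading of Simpson’s index can be checked by simulation. The sketch below is only an illustration under stated assumptions: the helper name `simpson`, the seed, and the number of simulated pairs are arbitrary choices, and the abundances are those of the worked example in this article.

```python
import random

def simpson(p):
    """Simpson's index D = sum p_i^2."""
    return sum(pi * pi for pi in p)

p = [0.4, 0.3, 0.2, 0.1]     # abundances from the worked example
species = range(len(p))

rng = random.Random(1)       # fixed seed for a repeatable estimate
n_pairs = 200_000
# Draw two individuals independently (i.e. with replacement) and compare.
first = rng.choices(species, weights=p, k=n_pairs)
second = rng.choices(species, weights=p, k=n_pairs)
same = sum(a == b for a, b in zip(first, second))

d_exact = simpson(p)
d_simulated = same / n_pairs     # fraction of pairs sharing a species
gini_simpson = 1 - d_exact       # probability the two individuals differ
inverse_simpson = 1 / d_exact    # effective number of species
print(d_exact, d_simulated, gini_simpson, inverse_simpson)
```

The simulated fraction of matching pairs should land close to the exact value of $D$, with the gap shrinking as `n_pairs` grows.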

Pielou’s evenness

To separate evenness from richness, divide the Shannon index by its maximum possible value $\ln S$ (attained when all species are equally abundant): $J = \frac{H}{\ln S}$. Pielou’s evenness $J$ ranges from $0$ (one species dominates) to $1$ (perfectly even), and being a ratio it is comparable across communities with different $S$.
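A small sketch shows how $J$ factors richness out. The function name `pielou` and the three communities are hypothetical; returning NaN for a single-species community is a choice made here because $\ln 1 = 0$ leaves the ratio undefined.

```python
import math

def pielou(p):
    """Pielou's evenness J = H / ln S; returned as NaN here when S < 2."""
    present = [pi for pi in p if pi > 0]
    s = len(present)
    if s < 2:
        return float("nan")   # ln 1 = 0, so the ratio is not defined
    h = sum(-pi * math.log(pi) for pi in present)
    return h / math.log(s)

even_small = [0.5, 0.5]               # 2 species, perfectly even
even_large = [0.125] * 8              # 8 species, perfectly even
uneven_large = [0.93] + [0.01] * 7    # 8 species, one dominant

j_values = [pielou(c) for c in (even_small, even_large, uneven_large)]
print(j_values)
```

Both even communities score the same despite having different richness, while the dominated community scores far lower even though it has as many species as the larger even one.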

Hill numbers unify them

The Hill numbers express all of these as one family indexed by an order $q$ that tunes how much weight common species receive: ${}^{q}D = \left(\sum_i p_i^{q}\right)^{1/(1-q)}$. Order $q = 0$ gives richness $S$, the limit $q \to 1$ gives $e^{H}$ (the formula itself is undefined at $q = 1$), and $q = 2$ gives the inverse Simpson index $1/D$. Each is an effective number of species, measured in the same units as richness, which makes them directly comparable.

As $q$ rises, rare species contribute less, so ${}^{q}D$ decreases: ${}^{0}D \ge {}^{1}D \ge {}^{2}D$. Plotting ${}^{q}D$ against $q$ gives a diversity profile that summarizes a community at every weighting at once.
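A diversity profile can be tabulated with a few lines of code. This is a minimal sketch: the function name `hill`, the grid of orders, and the example abundances are illustrative choices, and the case $q = 1$ is handled separately as the limit $e^{H}$ because the general formula divides by zero there.

```python
import math

def hill(p, q):
    """Hill number of order q; q = 1 is handled as the limit exp(H)."""
    present = [pi for pi in p if pi > 0]   # absent species must not count at q = 0
    if math.isclose(q, 1.0):
        return math.exp(sum(-pi * math.log(pi) for pi in present))
    return sum(pi ** q for pi in present) ** (1.0 / (1.0 - q))

p = [0.4, 0.3, 0.2, 0.1]
orders = [0, 0.5, 1, 1.5, 2, 3]
profile = [hill(p, q) for q in orders]

for q, d in zip(orders, profile):
    print(f"q = {q:<3}  D = {d:.3f}")
```

The printed values should never increase with $q$; a steep drop would signal strong dominance, while a nearly flat profile indicates an even community.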

Worked example

Take a community of four species with counts $40, 30, 20, 10$ (total $100$), so $p = (0.4,\ 0.3,\ 0.2,\ 0.1)$. Richness is $S = 4$.

Shannon index: $H = -\big(0.4\ln 0.4 + 0.3\ln 0.3 + 0.2\ln 0.2 + 0.1\ln 0.1\big) \approx 1.2799$. So the effective number of species is $e^{H} \approx 3.596$.

Simpson’s index: $D = 0.4^2 + 0.3^2 + 0.2^2 + 0.1^2 = 0.16 + 0.09 + 0.04 + 0.01 = 0.30$. Hence Gini–Simpson $1 - D = 0.70$ and inverse Simpson $1/D \approx 3.333$.

Evenness: $J = \frac{H}{\ln S} = \frac{1.2799}{\ln 4} = \frac{1.2799}{1.3863} \approx 0.923$.

Collecting the Hill numbers: ${}^{0}D = 4$, ${}^{1}D = e^{H} \approx 3.60$, ${}^{2}D = 1/D \approx 3.33$. This is a gentle decline, reflecting a fairly even community.

In code

R

p <- c(40, 30, 20, 10); p <- p / sum(p)     # relative abundances

# vegan
library(vegan)
diversity(p, index = "shannon")             # 1.279854
diversity(p, index = "simpson")             # 0.70  (Gini-Simpson, = 1 - D)
diversity(p, index = "invsimpson")          # 3.333333

# manual
S <- sum(p > 0)                             # richness = 4
H <- -sum(p * log(p))                       # 1.279854
J <- H / log(S)                             # 0.9232
D <- sum(p^2)                               # 0.30
hill <- function(p, q) {                    # braces keep `else` attached to `if`
  if (q == 1) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}
c(hill(p,0), hill(p,1), hill(p,2))          # 4.000 3.596 3.333

Python

import numpy as np
p = np.array([40, 30, 20, 10], float); p /= p.sum()

S = np.sum(p > 0)                    # 4
H = -np.sum(p * np.log(p))          # 1.2798542
J = H / np.log(S)                   # 0.92322
D = np.sum(p**2)                    # 0.30

def hill(p, q):
    return np.exp(-np.sum(p*np.log(p))) if q == 1 else np.sum(p**q)**(1/(1-q))

print(S, H, J, D)                   # 4 1.27985 0.92322 0.30 (rounded)
print([hill(p, q) for q in (0, 1, 2)])  # [4.0, 3.5961, 3.3333] (rounded)

Julia

p = [40, 30, 20, 10] ./ 100          # relative abundances

S = count(>(0), p)                   # 4
H = -sum(p .* log.(p))               # 1.2798542
J = H / log(S)                       # 0.92322
D = sum(p .^ 2)                      # 0.30

hill(p, q) = q == 1 ? exp(-sum(p .* log.(p))) : sum(p .^ q)^(1/(1 - q))
[hill(p, q) for q in (0, 1, 2)]      # [4.0, 3.5961, 3.3333]

Why it matters

Because a single index can hide as much as it reveals, choosing an index is really choosing how much to care about rare species. Reporting richness, a Shannon- or Simpson-based number, and an evenness measure — or better, a whole Hill-number profile — lets a microbiome or community study compare samples on a common effective-species scale and state clearly whether a treatment changed the number of species, their evenness, or both.