Heritability and Variance Components
Heritability answers a deceptively simple question: how much of the variation in a trait across a population is due to genetic differences rather than environment? It is a variance ratio, not a statement about any individual, and it anchors the study of disease genetics, breeding, and polygenic scores.
Decomposing phenotypic variance
For a trait measured across a population, the total or phenotypic variance $V_P$ splits into genetic and environmental parts:
$$V_P = V_G + V_E$$
where $V_G$ is the variance attributable to genotype differences and $V_E$ the variance attributable to environment (this simple form assumes genotype and environment are uncorrelated and non-interacting). The genetic variance itself decomposes further:
$$V_G = V_A + V_D + V_I$$
into additive effects $V_A$ (the average per-allele contributions that parents pass on), dominance $V_D$ (interactions between alleles at one locus), and epistatic effects $V_I$ (interactions across loci).
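To make the additive/dominance split concrete, here is a minimal Python sketch for a single biallelic locus under Hardy-Weinberg equilibrium. It assumes the textbook coding of genotypic values as $-a$, $d$, $+a$; the function name `single_locus_variances` and the parameter values are illustrative. $V_A$ is the variance captured by a frequency-weighted linear regression of genotypic value on allele dosage, and $V_D$ is what that per-allele model leaves over. Epistasis needs at least two loci and is not modelled here.

```python
import numpy as np

def single_locus_variances(p, a, d):
    """Return (V_G, V_A, V_D) for one biallelic locus under Hardy-Weinberg equilibrium.

    Genotypic values (assumed coding): aa = -a, Aa = d, AA = +a;
    p is the frequency of allele A.
    """
    q = 1.0 - p
    freqs = np.array([q * q, 2 * p * q, p * p])   # HWE genotype frequencies: aa, Aa, AA
    values = np.array([-a, d, a])                  # genotypic values
    dosage = np.array([0.0, 1.0, 2.0])             # copies of allele A

    mean_g = freqs @ values
    V_G = freqs @ (values - mean_g) ** 2

    # Additive part: frequency-weighted regression of genotypic value on dosage.
    mean_x = freqs @ dosage
    var_x = freqs @ (dosage - mean_x) ** 2         # equals 2pq under HWE
    cov_xg = freqs @ ((dosage - mean_x) * (values - mean_g))
    alpha = cov_xg / var_x                         # average effect of an allele substitution
    V_A = alpha ** 2 * var_x
    V_D = V_G - V_A                                # left over by the per-allele (linear) model
    return V_G, V_A, V_D

V_G, V_A, V_D = single_locus_variances(p=0.3, a=1.0, d=0.5)
print(round(V_G, 4), round(V_A, 4), round(V_D, 4))
```

With $p = 0.3$, $a = 1$ and $d = 0.5$ this prints `0.6489 0.6048 0.0441`, matching the closed forms $V_A = 2pq\,[a + d(q-p)]^2$ and $V_D = (2pqd)^2$: even with fairly strong dominance, most of the genetic variance at this locus is additive.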
Two heritabilities
Broad-sense heritability is the fraction of phenotypic variance explained by all genetic differences:
$$H^2 = \frac{V_G}{V_P}$$
Narrow-sense heritability uses only the additive part:
$$h^2 = \frac{V_A}{V_P}$$
The narrow-sense $h^2$ is the more useful quantity in most of quantitative genetics because additive effects are what respond to selection and what a linear predictor built from allele dosages can capture. Both are ratios between 0 and 1 (with $h^2 \le H^2$, since $V_A \le V_G$), and both are properties of a population in an environment, not fixed constants of a trait.
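The point that heritability depends on the environment can be seen with plain arithmetic. In this sketch the function `heritabilities` and all variance values are hypothetical: the genetic components are held fixed and only $V_E$ changes, as it might between a uniform and a more variable set of environments.

```python
def heritabilities(V_A, V_D, V_I, V_E):
    """Return (broad-sense H^2, narrow-sense h^2) from variance components."""
    V_G = V_A + V_D + V_I
    V_P = V_G + V_E
    return V_G / V_P, V_A / V_P

# Hypothetical components: identical genetic variance, two different environments.
uniform_env = heritabilities(V_A=4.0, V_D=1.0, V_I=1.0, V_E=2.0)
variable_env = heritabilities(V_A=4.0, V_D=1.0, V_I=1.0, V_E=6.0)
print(uniform_env)   # (0.75, 0.5)
print(variable_env)  # (0.5, 0.3333333333333333)
```

Nothing about the genes changed, yet both $H^2$ and $h^2$ fall when environmental variance rises.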
Estimating heritability
Twin studies and Falconer’s formula
Identical (monozygotic, MZ) twins share essentially all their DNA; fraternal (dizygotic, DZ) twins share on average half their segregating variants. If MZ pairs are more similar than DZ pairs, genetics is contributing. Comparing the trait correlation within MZ pairs, $r_{MZ}$, to that within DZ pairs, $r_{DZ}$, Falconer’s formula estimates
$$h^2 = 2\,(r_{MZ} - r_{DZ})$$
The factor of two reflects that DZ twins share half the additive variance: if shared environment contributes equally to both kinds of pair, $r_{MZ} = h^2 + c^2$ and $r_{DZ} = \tfrac{1}{2}h^2 + c^2$, so the gap $r_{MZ} - r_{DZ} = \tfrac{1}{2}h^2$ and doubling it recovers the additive fraction. If dominance and epistasis are negligible, broad-sense $H^2$ equals $h^2$ and is approximated by the same quantity, and $c^2 = 2r_{DZ} - r_{MZ}$ estimates the shared-environment share.
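The derivation above can be checked by simulation. This Python sketch assumes an idealised ACE model (additive genes, shared environment, unique environment; no dominance, gene-environment interaction or assortative mating) with hypothetical shares $h^2 = 0.6$, $c^2 = 0.2$, $e^2 = 0.2$. It generates MZ and DZ pairs and applies Falconer's formulas to the observed correlations; the helper names are illustrative.

```python
import numpy as np

def simulate_twin_correlations(h2, c2, e2, n_pairs=200_000, seed=0):
    """Simulate MZ and DZ pairs under an idealised ACE model; return (r_mz, r_dz).

    h2, c2, e2: additive-genetic, shared-environment and unique-environment
    shares of phenotypic variance (assumed to sum to 1).
    """
    rng = np.random.default_rng(seed)

    def twin_correlation(genetic_sharing):
        # Additive values: a part common to the pair plus a part unique to each twin,
        # weighted so each twin has variance h2 and the pair covariance is sharing * h2.
        common = rng.standard_normal(n_pairs)
        own1, own2 = rng.standard_normal((2, n_pairs))
        w_common, w_own = np.sqrt(genetic_sharing), np.sqrt(1.0 - genetic_sharing)
        A1 = np.sqrt(h2) * (w_common * common + w_own * own1)
        A2 = np.sqrt(h2) * (w_common * common + w_own * own2)
        C = np.sqrt(c2) * rng.standard_normal(n_pairs)            # same for both twins
        E1, E2 = np.sqrt(e2) * rng.standard_normal((2, n_pairs))  # different for each twin
        return np.corrcoef(A1 + C + E1, A2 + C + E2)[0, 1]

    # MZ twins share all of the additive variance; DZ twins are modelled at exactly half.
    return twin_correlation(1.0), twin_correlation(0.5)

def falconer(r_mz, r_dz):
    """Falconer estimates (h2, c2, e2) from MZ and DZ correlations."""
    return 2 * (r_mz - r_dz), 2 * r_dz - r_mz, 1 - r_mz

r_mz, r_dz = simulate_twin_correlations(h2=0.6, c2=0.2, e2=0.2)
h2_hat, c2_hat, e2_hat = falconer(r_mz, r_dz)
print(f"r_MZ={r_mz:.3f} r_DZ={r_dz:.3f} -> h2={h2_hat:.3f} c2={c2_hat:.3f} e2={e2_hat:.3f}")
```

With 200,000 pairs per zygosity the estimates land close to the simulated 0.6, 0.2 and 0.2. If the simulation added dominance, $2(r_{MZ} - r_{DZ})$ would overshoot $h^2$, which is why the formula's assumptions matter.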
SNP-heritability from genome-wide data
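Modern estimates use unrelated individuals and measured genotypes directly. Methods such as GREML fit a linear mixed model in which the phenotype has a random genetic effect whose covariance between individuals is proportional to their genetic relatedness matrix (GRM), computed from genome-wide SNPs:
$$\mathbf{y} = X\boldsymbol{\beta} + \mathbf{g} + \boldsymbol{\varepsilon}, \qquad \mathbf{g} \sim N(\mathbf{0}, \sigma_g^2 A), \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma_e^2 I)$$
with $X\boldsymbol{\beta}$ the fixed effects (such as covariates) and $A$ the GRM. The estimate $h^2_{SNP} = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$ is SNP-heritability, the fraction of phenotypic variance explained additively by common SNPs. It is typically lower than twin-based $h^2$, and the gap (“missing heritability”) is usually attributed to rare variants and imperfect tagging of causal variants by the genotyped SNPs.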
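A small simulation shows the GREML idea end to end. This is a minimal sketch, not GCTA's implementation: it simulates unrelated individuals with independent SNPs (sample sizes, allele frequencies and the true $h^2_{SNP} = 0.5$ are assumptions), builds the GRM as $A = ZZ^\top/m$ from standardised genotypes $Z$, and maximises the likelihood over $h^2_{SNP}$ after rotating into the eigenbasis of $A$, where the covariance becomes diagonal. Because genotypes are centred, the all-ones vector is a null eigenvector of $A$; dropping that direction makes this equivalent to REML for an intercept-only model. It uses NumPy and SciPy.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n, m = 1000, 2000          # individuals and SNPs (simulated: unrelated, independent SNPs)
true_h2 = 0.5              # simulated SNP-heritability

# Allele dosages 0/1/2 drawn under Hardy-Weinberg equilibrium.
freqs = rng.uniform(0.05, 0.5, size=m)
dosages = rng.binomial(2, freqs, size=(n, m)).astype(float)

# Standardise each SNP and form the GRM, A = Z Z^T / m.
Z = (dosages - dosages.mean(axis=0)) / dosages.std(axis=0)
A = Z @ Z.T / m

# Phenotype y = g + e, with g = Z @ snp_effects so that Cov(g) = sigma_g^2 * A.
snp_effects = rng.normal(0.0, np.sqrt(true_h2 / m), size=m)
y = Z @ snp_effects + rng.normal(0.0, np.sqrt(1.0 - true_h2), size=n)

# In the eigenbasis of A, Cov(y) = sigma_g^2 * A + sigma_e^2 * I is diagonal.
s, U = np.linalg.eigh(A)
y_rot = U.T @ (y - y.mean())
keep = s > 1e-8            # drop the null direction (all-ones vector) removed by centring
s, y_rot = s[keep], y_rot[keep]
n_eff = s.size

def neg_profile_loglik(h2):
    """Negative log-likelihood in h2, with total variance sigma2 profiled out."""
    d = h2 * s + (1.0 - h2)                 # Var(y_rot) = sigma2 * d
    sigma2 = np.mean(y_rot ** 2 / d)        # maximum-likelihood sigma2 for this h2
    return 0.5 * (n_eff * np.log(sigma2) + np.sum(np.log(d)))

h2_snp = minimize_scalar(neg_profile_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded").x
print(f"estimated SNP-heritability: {h2_snp:.2f} (simulated value {true_h2})")
```

Real analyses add covariates, quality control and standard errors, and rely on dedicated software such as GCTA; the one-off $O(n^3)$ eigendecomposition here also limits how far this sketch scales.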
Worked example
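A twin study of a quantitative trait reports $r_{MZ} = 0.80$ and $r_{DZ} = 0.50$. Falconer’s formula gives
$$h^2 = 2\,(0.80 - 0.50) = 0.60$$
so about 60% of the trait variance is additively genetic. The shared-environment estimate is $c^2 = 2(0.50) - 0.80 = 0.20$, and the remaining $e^2 = 1 - 0.80 = 0.20$ is unique environment plus measurement error.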
In code
R
r_mz <- 0.80; r_dz <- 0.50
h2 <- 2 * (r_mz - r_dz) # 0.60 narrow-sense heritability
c2 <- 2 * r_dz - r_mz # 0.20 shared environment
e2 <- 1 - r_mz # 0.20 unique environment
# Variance-component view: V_P = V_G + V_E
V_G <- 6; V_E <- 4; V_P <- V_G + V_E
H2 <- V_G / V_P # 0.60 broad-sense
# SNP-heritability in practice: lme4/GCTA-style mixed model with a GRM
Python
r_mz, r_dz = 0.80, 0.50
h2 = 2 * (r_mz - r_dz) # 0.60
c2 = 2 * r_dz - r_mz # 0.20
e2 = 1 - r_mz # 0.20
V_G, V_E = 6.0, 4.0
V_P = V_G + V_E
H2 = V_G / V_P # 0.60 broad-sense
# SNP-heritability: statsmodels MixedLM / GREML-style GRM model
Julia
r_mz, r_dz = 0.80, 0.50
h2 = 2 * (r_mz - r_dz) # 0.60
c2 = 2 * r_dz - r_mz # 0.20
e2 = 1 - r_mz # 0.20
V_G, V_E = 6.0, 4.0
V_P = V_G + V_E
H2 = V_G / V_P # 0.60 broad-sense
# SNP-heritability: MixedModels.jl with a genetic relatedness matrix
Why it matters
Heritability sets a ceiling on how much of a trait's variance genetics can predict in a given population, framing expectations for GWAS discovery and polygenic-score accuracy. It is also one of the most misunderstood numbers in science: a high $h^2$ does not mean a trait is unchangeable, nor does it explain differences between groups or within an individual. It describes the sources of variance in one population under its particular range of environments.
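Both claims in this paragraph can be illustrated with a toy simulation. Assuming a purely additive trait with $h^2 = 0.8$ (all values hypothetical), a score built from the true effects explains about 80% of the variance and no more, while giving everyone the same environmental boost shifts the mean by about two phenotypic standard deviations without changing $h^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, h2 = 10_000, 200, 0.8     # individuals, SNPs, true narrow-sense heritability

# Hypothetical purely additive trait built from standardised SNP dosages.
freqs = rng.uniform(0.05, 0.5, size=m)
dosages = rng.binomial(2, freqs, size=(n, m)).astype(float)
Z = (dosages - dosages.mean(axis=0)) / dosages.std(axis=0)
g = Z @ rng.normal(0.0, 1.0, size=m)
g *= np.sqrt(h2) / g.std()                        # rescale so the additive variance is h2
e = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)    # environment, variance 1 - h2
y = g + e

# A "perfect" polygenic score knows every true effect, i.e. it equals g.
r2_perfect_score = np.corrcoef(g, y)[0, 1] ** 2   # close to h2, not 1

# Give everyone the same environmental improvement of +2 units.
y_better_env = g + (e + 2.0)
mean_change = y_better_env.mean() - y.mean()      # about 2 phenotypic SDs
h2_before = g.var() / y.var()
h2_after = g.var() / y_better_env.var()           # unchanged: a constant adds no variance
print(round(r2_perfect_score, 2), round(mean_change, 2), round(h2_before, 2), round(h2_after, 2))
```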