Random Variables
A random variable turns messy real-world outcomes into numbers we can add, average, and model. It is the bridge between raw probability and the distributions used throughout statistics and epidemiology.
Definition
A random variable $X$ is a function that maps each outcome $\omega$ in the sample space $\Omega$ to a real number:
$$X : \Omega \to \mathbb{R}$$
For a coin flip, $\Omega = \{H, T\}$ and we might set $X(H) = 1$, $X(T) = 0$. The randomness lives in which outcome occurs; $X$ just records it numerically.
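The idea that a random variable is just a function on outcomes can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the names `sample_space` and `X`, the encoding of heads as 1 and tails as 0, and the seed are all assumptions made for the example.

```python
import random

# Sample space for one coin flip (outcomes are labels, not numbers).
sample_space = ["H", "T"]

def X(outcome: str) -> int:
    """Random variable: map heads to 1 and tails to 0 (assumed encoding)."""
    return 1 if outcome == "H" else 0

rng = random.Random(0)  # fixed seed so the run is repeatable
outcomes = [rng.choice(sample_space) for _ in range(10)]  # the random part
values = [X(w) for w in outcomes]  # the deterministic numeric record

print(outcomes)
print(values, "-> heads:", sum(values))
```

The randomness enters only through `rng.choice`; `X` itself is an ordinary deterministic function, which is why summing its values counts the heads.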
Discrete vs. continuous
- A discrete random variable takes values in a countable set (0, 1, 2, …): counts of cases, number of successes.
- A continuous random variable takes values in an interval of $\mathbb{R}$: heights, waiting times, concentrations.
pmf vs. pdf
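For a discrete RV, the probability mass function (pmf) gives the probability of each value:
$$p(x) = P(X = x), \qquad \sum_x p(x) = 1$$
For a continuous RV, single points have probability zero, so we use a probability density function (pdf) $f(x)$. Probability is area under the density:
$$P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$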
The density integrating to 1 is the continuous analogue of the pmf summing to 1.
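The "probability is area" statement can be checked numerically. The sketch below assumes a standard normal density purely for illustration and uses a hand-written midpoint rule (the helper `integrate` is hypothetical, not a library function) so that it runs with the Python standard library alone.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen for illustration

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

area = integrate(Z.pdf, -1.96, 1.96)   # P(-1.96 <= Z <= 1.96) as area under the pdf
via_cdf = Z.cdf(1.96) - Z.cdf(-1.96)   # the same probability from the CDF
total = integrate(Z.pdf, -10.0, 10.0)  # stands in for the integral over the real line

print(round(area, 4), round(via_cdf, 4), round(total, 4))
```

The first two numbers should agree closely, and the third should be very near 1, mirroring the way a pmf sums to 1.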
The cumulative distribution function
Both types share the cumulative distribution function (CDF):
$$F(x) = P(X \le x)$$
The CDF exists for every random variable, discrete or continuous, and is characterized by three properties:
- Non-decreasing: if $a \le b$ then $F(a) \le F(b)$.
- Limits: $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
- Right-continuous.
For a continuous RV, the pdf is the derivative of the CDF, $f(x) = F'(x)$.
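These properties, and the derivative relationship between pdf and CDF, can be spot-checked on a grid. The sketch assumes a standard normal distribution as the test case; the grid, the step `h`, and the variable names are arbitrary choices for illustration, and a numerical check like this is a sanity test rather than a proof.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal (illustrative choice)
F, f = Z.cdf, Z.pdf

grid = [-4 + 0.5 * i for i in range(17)]  # -4.0, -3.5, ..., 4.0
cdf_values = [F(x) for x in grid]

# Non-decreasing: each value is at least the previous one.
is_monotone = all(u <= v for u, v in zip(cdf_values, cdf_values[1:]))

# Limits: F is near 0 far to the left and near 1 far to the right.
left_tail, right_tail = F(-10.0), F(10.0)

# pdf as derivative of the CDF: central finite difference compared with f.
h = 1e-5
max_gap = max(abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) for x in grid)

print(is_monotone, left_tail, right_tail, max_gap)
```

A small `max_gap` indicates that the slope of the CDF matches the density at every grid point, up to finite-difference error.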
Worked example: a discrete RV
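Let $X \sim \text{Binomial}(3, 0.5)$ be the number of heads in three fair flips. The pmf is $p(k) = \binom{3}{k} (1/2)^3$:
$$p(0) = \tfrac{1}{8}, \quad p(1) = \tfrac{3}{8}, \quad p(2) = \tfrac{3}{8}, \quad p(3) = \tfrac{1}{8}$$
These sum to 1. The CDF steps upward: $F(0) = 1/8$, $F(1) = 1/2$, $F(2) = 7/8$, $F(3) = 1$.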
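The same numbers can be recovered by brute force, without any distribution formula: list all eight equally likely outcomes of three fair flips, apply the random variable, and count. The sketch below uses exact fractions; the names `outcomes`, `X`, `pmf`, and `cdf` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate, product

# All 2^3 equally likely outcomes of three fair flips, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

# pmf: fraction of outcomes that X sends to each value k.
pmf = {k: Fraction(sum(X(w) == k for w in outcomes), len(outcomes)) for k in range(4)}

# CDF: running total of the pmf.
cdf = dict(zip(pmf, accumulate(pmf.values())))

for k in pmf:
    print(k, pmf[k], cdf[k])
```

Each printed row gives a value of k with its pmf and CDF: the pmf column reads 1/8, 3/8, 3/8, 1/8 and the CDF column reads 1/8, 1/2, 7/8, 1.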
Worked example: a continuous RV
Let with pdf for . Its CDF is
So , and rises smoothly from 0 to 1.
Computing it
Evaluate pmf/pdf and CDF directly with built-in distribution functions.
R
# Discrete: Binomial(3, 0.5)
dbinom(1, size = 3, prob = 0.5) # pmf P(X=1) = 0.375
pbinom(1, size = 3, prob = 0.5) # cdf P(X<=1) = 0.5
# Continuous: Normal(0, 1)
dnorm(0) # pdf at 0 = 0.3989
pnorm(1.96) # cdf P(X<=1.96) = 0.975
Python
from scipy import stats
# Discrete: Binomial(3, 0.5)
print(stats.binom.pmf(1, n=3, p=0.5)) # 0.375
print(stats.binom.cdf(1, n=3, p=0.5)) # 0.5
# Continuous: Normal(0, 1)
print(stats.norm.pdf(0)) # 0.3989
print(stats.norm.cdf(1.96)) # 0.975
# Output:
# 0.3750000000000001
# 0.5
# 0.3989422804014327
# 0.9750021048517795
Julia
using Distributions
# Discrete: Binomial(3, 0.5)
pdf(Binomial(3, 0.5), 1) # pmf = 0.375
cdf(Binomial(3, 0.5), 1) # 0.5
# Continuous: Normal(0, 1)
pdf(Normal(0, 1), 0) # 0.3989
cdf(Normal(0, 1), 1.96) # 0.975
Why it matters for statistics
Random variables are the objects statistics is about: an estimator is a random variable, a test statistic is a random variable, and data are realizations of random variables. The pmf/pdf and CDF are the two universal descriptions of their behavior. The pmf/pdf supplies the likelihood used in maximum likelihood, while the CDF underlies quantiles, p-values, and the monotonic (inverse-CDF) transformations used in simulation.
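As a rough illustration of how the CDF feeds into quantiles, p-values, and simulation, the sketch below uses only the Python standard library. The observed statistic `z_obs`, the exponential target with `rate = 2.0`, and the seed are made-up values for the example, not taken from any dataset.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal reference distribution

# Quantile: the inverse CDF answers "which x has F(x) = 0.975?"
q975 = Z.inv_cdf(0.975)

# Upper-tail p-value for a hypothetical observed test statistic.
z_obs = 2.1  # made-up value, for illustration only
p_upper = 1.0 - Z.cdf(z_obs)

# Inverse-transform sampling for an exponential variable: if U ~ Uniform(0, 1),
# then -ln(1 - U) / rate has CDF F(x) = 1 - exp(-rate * x).
rate = 2.0  # assumed rate
rng = random.Random(42)
draws = [-math.log(1.0 - rng.random()) / rate for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # should land close to 1 / rate

print(round(q975, 2), round(p_upper, 4), round(sample_mean, 2))
```

All three results come from the CDF or its inverse: the quantile inverts it, the p-value is one minus it, and the simulated draws push uniform numbers through the inverse of a monotone CDF.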