Graphing Data

A good graph is often the whole analysis: the right picture reveals structure (a skew, an outlier, an epidemic that has turned) that tables and summary statistics hide. This page is about choosing the right chart and drawing it honestly, with plotting libraries in R, Python, and Julia.

Four workhorse chart types cover most questions: a histogram for a distribution, a scatter plot with a trend line for a relationship, an epidemic curve for a trend over time, and a boxplot for a group comparison.
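As a rough sketch, the four chart types can be drawn side by side in matplotlib. Everything here is an assumption made for illustration: the data are synthetic and seeded, and the group labels (A, B, C) and the output filename are hypothetical. The epidemic panel reuses the curve from the examples further down this page.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # synthetic data, for illustration only

skewed = rng.lognormal(mean=0.0, sigma=0.6, size=500)     # right-skewed sample
x = rng.uniform(0, 10, size=80)
y = 2.0 * x + rng.normal(0, 3, size=80)                   # noisy linear relationship
day = np.arange(60)
cases = np.round(300 * np.exp(-((day - 25) / 10) ** 2))   # bell-shaped outbreak
groups = [rng.normal(loc, 1.0, size=50) for loc in (0, 1, 3)]

fig, axes = plt.subplots(2, 2, figsize=(9, 7))

# Distribution: histogram
axes[0, 0].hist(skewed, bins=30, color="#2f6f9f")
axes[0, 0].set(title="Distribution", xlabel="value", ylabel="count")

# Relationship: scatter plus a least-squares trend line
slope, intercept = np.polyfit(x, y, deg=1)
xs = np.linspace(x.min(), x.max(), 100)
axes[0, 1].scatter(x, y, s=12, color="#2f6f9f")
axes[0, 1].plot(xs, slope * xs + intercept, color="black")
axes[0, 1].set(title="Relationship", xlabel="x", ylabel="y")

# Trend over time: epidemic curve as bars of incident cases
axes[1, 0].bar(day, cases, color="#2f6f9f")
axes[1, 0].set(title="Trend over time", xlabel="day", ylabel="incident cases")

# Group comparison: one box per group
axes[1, 1].boxplot(groups)
axes[1, 1].set_xticks([1, 2, 3])
axes[1, 1].set_xticklabels(["A", "B", "C"])
axes[1, 1].set(title="Group comparison", xlabel="group", ylabel="value")

fig.tight_layout()
fig.savefig("four_charts.png")  # hypothetical output filename
```

Each panel answers a different kind of question, so the same figure layout can serve as a starting template when exploring a new dataset.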

Match the chart to the question

The chart should follow from what you are asking, not from habit.

Principles of honest graphics

The grammar of graphics

The most productive plotting tools describe a chart as layers: a dataset, a mapping from variables to visual channels (aesthetics — position, color, size), and geometric marks (geoms). Learn one grammar and the others feel familiar.
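The layered idea can be sketched without any plotting library at all. The toy `Chart` class below is hypothetical (it is not the API of ggplot2 or any real package), and the three data rows are made up. It only shows how a dataset, an aesthetic mapping, and a list of geoms combine into marks to draw.

```python
from dataclasses import dataclass, field


@dataclass
class Chart:
    """Toy grammar-of-graphics container (illustrative, not a real library)."""
    data: list      # dataset: one dict per row
    mapping: dict   # aesthetics: visual channel -> column name
    geoms: list = field(default_factory=list)  # geometric marks, one per layer

    def add(self, geom):
        """Add a layer and return the chart so calls can be chained."""
        self.geoms.append(geom)
        return self

    def marks(self):
        """Resolve the mapping for every row in every layer."""
        out = []
        for geom in self.geoms:
            for row in self.data:
                mark = {"geom": geom}
                for channel, column in self.mapping.items():
                    mark[channel] = row[column]
                out.append(mark)
        return out


# Made-up rows, for illustration only
rows = [
    {"day": 0, "cases": 3},
    {"day": 1, "cases": 7},
    {"day": 2, "cases": 12},
]

# Same data and mapping, two layers: columns, then points on top
chart = Chart(rows, {"x": "day", "y": "cases"}).add("col").add("point")
marks = chart.marks()
```

The ggplot2 example on this page has the same shape: `ggplot(df, aes(day, cases))` supplies the dataset and the mapping, and `geom_col()` adds a layer. Swapping the geom changes the chart without touching the data or the mapping.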

In code

The same epidemic curve in each ecosystem.

R — ggplot2

library(ggplot2)
# Synthetic bell-shaped outbreak: 60 days, peaking at 300 cases on day 25
df <- data.frame(day = 0:59,
                 cases = round(300 * exp(-((0:59 - 25) / 10)^2)))
ggplot(df, aes(day, cases)) +
  geom_col(fill = "#2f6f9f") +
  labs(title = "Epidemic curve", x = "day", y = "incident cases") +
  theme_minimal()

Python — matplotlib

import numpy as np
import matplotlib.pyplot as plt

day = np.arange(60)
# Same synthetic curve as the R example, rounded to whole case counts
cases = np.round(300 * np.exp(-((day - 25) / 10) ** 2))
fig, ax = plt.subplots()
ax.bar(day, cases, color="#2f6f9f")
ax.set(title="Epidemic curve", xlabel="day", ylabel="incident cases")
fig.savefig("epi_curve.svg")     # or plt.show()

Julia — Plots.jl

using Plots
day = 0:59
# Same synthetic curve as the R example, rounded to whole case counts
cases = @. round(300 * exp(-((day - 25) / 10)^2))
bar(day, cases, xlabel = "day", ylabel = "incident cases",
    title = "Epidemic curve", legend = false)

Why it matters

Graphing is where analysis meets communication: it is the fastest way to explore data (spotting the outlier, the skew, the second wave) and the most honest way to present a result. The figures throughout this site — from the logistic curve to the Simpson’s-paradox scatter — exist because a picture makes the idea land.