Networks and Graphs

A network encodes who is connected to whom — which hosts contact each other, which species eat which, which neurons wire together. Turning that web of relationships into a matrix lets us bring the full machinery of linear algebra to bear on ecology and disease transmission.

Nodes, edges, and the adjacency matrix

A network (or graph) is a set of nodes (vertices) joined by edges (links). Write NN for the number of nodes and EE for the number of edges. Edges may be undirected (a symmetric relationship, like a shared handshake) or directed (an ordered relationship, like “ii infects jj”), and each edge may carry a weight recording contact frequency, flow, or interaction strength.

The structure is captured compactly by the adjacency matrix AA, an N×NN\times N array whose entries record which pairs are connected: Aij={1if nodes i and j are joined by an edge,0otherwise.A_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are joined by an edge},\\ 0 & \text{otherwise.}\end{cases} For an undirected network AA is symmetric, Aij=AjiA_{ij}=A_{ji}, whereas a directed network generally has AijAjiA_{ij}\neq A_{ji}. Writing a network this way — see matrix notation — means that graph questions become matrix operations: the number of length-22 paths from ii to jj is the (i,j)(i,j) entry of A2A^2, and the eigenvalues of AA govern spreading dynamics, with the largest eigenvalue setting the epidemic threshold on a contact network.

Degree and degree distribution

The degree kik_i of node ii is its number of connections, which for an undirected network is simply a row sum of the adjacency matrix: ki=j=1NAij.k_i = \sum_{j=1}^{N} A_{ij}. In a directed network we distinguish the in-degree kiin=jAjik_i^{\text{in}}=\sum_j A_{ji} (edges pointing in) from the out-degree kiout=jAijk_i^{\text{out}}=\sum_j A_{ij} (edges pointing out). The degree distribution P(k)P(k) is the fraction of nodes with degree kk, a probability distribution over the graph; its shape distinguishes homogeneous networks from heavy-tailed ones dominated by hubs. The mean degree for an undirected network is k=2EN,\langle k\rangle = \frac{2E}{N}, where the factor of 22 appears because each of the EE edges contributes to the degree of both of its endpoints.

Paths, components, and clustering

A path is a sequence of edges leading from one node to another, and the shortest-path length between two nodes is the fewest edges needed to connect them. The largest shortest-path length over all pairs is the diameter of the network. A connected component is a maximal set of nodes all mutually reachable by paths; when one component contains a finite fraction of all nodes as NN grows, it is called the giant component, and its emergence marks the point at which a pathogen can reach much of the population.

Local cohesion is measured by the clustering coefficient (transitivity), the probability that two neighbours of a node are themselves neighbours — “the friend of my friend is also my friend.” Overall sparsity is summarised by the density (connectance), the fraction of possible edges that are actually present: density=E(N2)=2EN(N1).\text{density} = \frac{E}{\binom{N}{2}} = \frac{2E}{N(N-1)}.

Worked example: a 5-node graph

Take an undirected network on nodes {1,2,3,4,5}\{1,2,3,4,5\} with edges {1 ⁣ ⁣2, 1 ⁣ ⁣3, 2 ⁣ ⁣3, 3 ⁣ ⁣4, 4 ⁣ ⁣5}\{1\!-\!2,\ 1\!-\!3,\ 2\!-\!3,\ 3\!-\!4,\ 4\!-\!5\}, so E=5E=5. Its adjacency matrix is A=[0110010100110100010100010].A = \begin{bmatrix} 0 & 1 & 1 & 0 & 0\\ 1 & 0 & 1 & 0 & 0\\ 1 & 1 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}. Summing each row gives the degrees k1=2, k2=2, k3=3, k4=2, k5=1k_1=2,\ k_2=2,\ k_3=3,\ k_4=2,\ k_5=1, which sum to 10=2E10=2E as they must. The mean degree is k=2E/N=10/5=2\langle k\rangle = 2E/N = 10/5 = 2. The density is E/(N2)=5/(52)=5/10=0.5E/\binom{N}{2} = 5/\binom{5}{2} = 5/10 = 0.5, so half of the ten possible edges are present. Node 33 is the most connected, and the whole graph is a single connected component (you can walk from node 55 to node 11), so its giant component contains all five nodes.

In code

We build the example graph and read off its degrees, density, and components.

R

library(igraph)

g <- graph_from_literal(1-2, 1-3, 2-3, 3-4, 4-5)
degree(g)            # 1:2  2:2  3:3  4:2  5:1
mean(degree(g))      # 2
edge_density(g)      # 0.5
components(g)$no     # 1  (one connected component)
$transitivity(g)      # global clustering coefficient

Python

import networkx as nx

g = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
print(dict(g.degree()))                       # {1:2, 2:2, 3:3, 4:2, 5:1}
print(2 * g.number_of_edges() / g.number_of_nodes())  # 2.0
print(nx.density(g))                          # 0.5
print(nx.number_connected_components(g))      # 1
print(nx.transitivity(g))                     # global clustering
{1: 2, 2: 2, 3: 3, 4: 2, 5: 1}
2.0
0.5
1
0.5

Julia

using Graphs

g = SimpleGraph(5)
for (i, j) in [(1,2),(1,3),(2,3),(3,4),(4,5)]
    add_edge!(g, i, j)
end
degree(g)                       # [2, 2, 3, 2, 1]
2 * ne(g) / nv(g)               # 2.0
2 * ne(g) / (nv(g) * (nv(g)-1)) # 0.5  (density)
length(connected_components(g)) # 1
global_clustering_coefficient(g)

Why it matters

Representing contact patterns, food webs, or gene-regulatory wiring as a graph turns qualitative “who interacts with whom” questions into quantitative ones. The degree distribution predicts how fast an epidemic spreads and who the likely super-connectors are; the giant component tells you whether an outbreak can become widespread; connectance and clustering summarise how tightly an ecological network is knit, which bears on its stability. Because a network is just its adjacency matrix, node-importance scores (see centrality) and spreading thresholds follow directly from linear algebra, and comparing a real network against a random-graph model reveals which features are surprising rather than expected by chance.