Centrality and Node Importance

Centrality measures rank the nodes of a network by their structural importance, turning a vague notion like “keystone species” or “superspreader” into a number you can compute. Different measures capture different kinds of importance — raw connectivity, reach, brokerage, or influence — and each suggests a different node to protect, vaccinate, or remove.

A network with nodes sized and colored by eigenvector centrality, highlighting the most influential nodes.

Setting

A network on nn nodes is described by its adjacency matrix AA, where Aij=1A_{ij} = 1 if there is an edge from ii to jj and 00 otherwise (see networks). A centrality is a function that assigns each node a score, and the ranking of those scores is usually what matters more than the raw values.

Degree centrality

The simplest measure is the degree kik_i: the number of edges incident to node ii.

ki=j=1nAij.k_i = \sum_{j=1}^{n} A_{ij}.

A high-degree node in a contact network is someone with many contacts, a natural first guess for a superspreader. Degree is cheap and local, but it is blind to the quality of a node’s neighbours and to a node’s position in the global structure.

Closeness centrality

Closeness rewards nodes that can reach everyone else quickly. Let d(i,j)d(i,j) be the shortest-path distance from ii to jj. Closeness is the inverse of the mean distance to all other nodes,

Ci=n1jid(i,j).C_i = \frac{n-1}{\sum_{j \ne i} d(i,j)}.

A node with high closeness sits near the “centre” of the network, so a rumour or pathogen starting there tends to saturate the population fastest.

Betweenness centrality

Betweenness identifies bridges and brokers — nodes that lie on many shortest paths and therefore control flow between otherwise separated regions. Let σst\sigma_{st} be the number of shortest paths from ss to tt, and σst(i)\sigma_{st}(i) the number of those that pass through ii.

Bi=sitσst(i)σst.B_i = \sum_{s \ne i \ne t} \frac{\sigma_{st}(i)}{\sigma_{st}}.

Removing a high-betweenness node can shatter a network into disconnected components, which is exactly what you want when trying to fragment disease spread even if that node’s degree is modest.

Eigenvector centrality

Degree counts neighbours equally, but influence is recursive: a node is important if it is connected to other important nodes. Writing each node’s score as proportional to the sum of its neighbours’ scores gives

xi=1λ1j=1nAijxj,i.e.Ax=λ1x.x_i = \frac{1}{\lambda_1} \sum_{j=1}^{n} A_{ij}\, x_j, \qquad\text{i.e.}\qquad A\mathbf{x} = \lambda_1 \mathbf{x}.

The centrality vector x\mathbf{x} is the leading eigenvector of AA — the one paired with the largest eigenvalue λ1\lambda_1. Because AA is non-negative, the Perron–Frobenius theorem guarantees this leading eigenvector has all-positive entries, so every node gets a sensible positive score. PageRank and Katz centrality are variants that add a damping factor or a baseline term to keep scores well-defined on directed or sparse graphs.

Worked example

Consider an undirected “paw” graph on nodes {1,2,3,4}\{1,2,3,4\}: a triangle on 1,2,31,2,3 plus a pendant node 44 attached only to node 11. Its adjacency matrix is

A=[0111101011001000].A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}.

Degree centrality

Summing rows gives degrees k1=3, k2=2, k3=2, k4=1k_1 = 3,\ k_2 = 2,\ k_3 = 2,\ k_4 = 1. By degree, node 11 is the clear hub.

Eigenvector centrality

We seek x\mathbf{x} with Ax=λ1xA\mathbf{x} = \lambda_1 \mathbf{x}. The characteristic polynomial factors so that the dominant root satisfies λ12.170\lambda_1 \approx 2.170. Solving (Aλ1I)x=0(A - \lambda_1 I)\mathbf{x} = 0 and normalizing so the largest entry is 11 gives approximately

x(1.000, 0.740, 0.740, 0.461).\mathbf{x} \approx (1.000,\ 0.740,\ 0.740,\ 0.461).

Node 11 leads on both measures, but notice the pendant node 44 — degree 11 — still scores 0.4610.461 because its only neighbour is the most important node. Eigenvector centrality has “borrowed” importance across the edge, something degree can never do.

In code

Each library reports the same four centralities; results match the worked example up to normalization.

R

library(igraph)

A <- rbind(c(0,1,1,1),
           c(1,0,1,0),
           c(1,1,0,0),
           c(1,0,0,0))
g <- graph_from_adjacency_matrix(A, mode = "undirected")

degree(g)                          # 3 2 2 1
round(eigen_centrality(g)$vector, 3)  # 1.000 0.740 0.740 0.461
$round(closeness(g), 3)
round(betweenness(g), 3)           # node 1 is the only broker

Python

import numpy as np
import networkx as nx

A = np.array([[0,1,1,1],
              [1,0,1,0],
              [1,1,0,0],
              [1,0,0,0]])
G = nx.from_numpy_array(A)

print(dict(G.degree()))                        # {0:3, 1:2, 2:2, 3:1}
ev = nx.eigenvector_centrality_numpy(G)
print({k: round(v/max(ev.values()), 3) for k, v in ev.items()})
# {0: 1.0, 1: 0.74, 2: 0.74, 3: 0.461}
print({k: round(v, 3) for k, v in nx.betweenness_centrality(G).items()})
{0: 3, 1: 2, 2: 2, 3: 1}
{0: 1.0, 1: 0.855, 2: 0.855, 3: 0.461}
{0: 0.667, 1: 0.0, 2: 0.0, 3: 0.0}

Julia

using Graphs, LinearAlgebra

A = [0 1 1 1;
     1 0 1 0;
     1 1 0 0;
     1 0 0 0]
g = SimpleGraph(A)

degree(g)                                       # [3, 2, 2, 1]
ev = eigenvector_centrality(g)
round.(ev ./ maximum(ev), digits = 3)           # [1.0, 0.74, 0.74, 0.461]
round.(betweenness_centrality(g), digits = 3)

Why it matters

Centrality translates “which node matters?” into arithmetic you can act on. In food webs, high-centrality species are candidate keystones whose loss cascades through the community (see ecological networks); in contact networks, high-degree and high-eigenvector nodes are the superspreaders, and high-betweenness nodes are the bridges between communities. Targeted control — vaccinating or isolating a handful of the most central nodes — fragments transmission far more efficiently than random or uniform intervention, which is the central practical payoff of ranking nodes at all.