Matrix and Vector Notation
Matrices and vectors are the language for organizing data and the parameters of statistical models. A regression dataset, a covariance structure, and a disease-model state are all naturally written as arrays of numbers, and a shared notation keeps the bookkeeping honest.
Scalars, vectors, and matrices
A scalar is a single number, written in lowercase italics, such as $a$ or $x$.
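A vector is an ordered list of numbers. By convention a vector is a column vector (a single column):
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Its transpose is a row vector $\mathbf{x}^\top = (x_1, x_2, \ldots, x_n)$.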
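NumPy's 1-D arrays carry no row or column orientation, so mirroring the notation exactly means reshaping into explicit $n \times 1$ and $1 \times n$ two-dimensional arrays. A minimal sketch (the variable names are illustrative only):

```python
import numpy as np

# A 1-D array has shape (3,): neither a row nor a column in the matrix sense
x_flat = np.array([1, 2, 3])

# Explicit n x 1 column vector, matching the textbook convention
x = x_flat.reshape(-1, 1)
print(x.shape)         # (3, 1)

# Its transpose is a 1 x n row vector
x_t = x.T
print(x_t.shape)       # (1, 3)

# Transposing a 1-D array is a no-op, a common source of silent bugs
print(x_flat.T.shape)  # (3,)
```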
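A matrix is a rectangular array of numbers with $m$ rows and $n$ columns; we say it has dimension $m \times n$:
$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$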
Indexing
The entry in row $i$ and column $j$ is $a_{ij}$ (row first, then column). For the matrix above, $a_{23}$ is the number in the second row, third column. A vector entry uses a single index, as in $x_i$.
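Mathematical indices start at 1 while NumPy's start at 0, so $a_{ij}$ lives at `A[i - 1, j - 1]`. The helper `entry` below is hypothetical, written only to make that translation explicit:

```python
import numpy as np

def entry(A, i, j):
    """Return a_ij using 1-based (mathematical) row i and column j."""
    return A[i - 1, j - 1]

A = np.array([[1, 2, 3],
              [4, 5, 6]])

a23 = entry(A, 2, 3)   # second row, third column
print(a23)             # 6

# A vector entry needs only one index: x_2 is x[1] in 0-based terms
x = np.array([10, 20, 30])
print(x[2 - 1])        # 20
```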
Special matrices
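- Identity $\mathbf{I}_n$: square ($n \times n$), with $1$s on the diagonal and $0$s elsewhere. It is the multiplicative identity: for an $m \times n$ matrix $\mathbf{A}$, $\mathbf{I}_m\mathbf{A} = \mathbf{A}\mathbf{I}_n = \mathbf{A}$.
- Zero matrix $\mathbf{0}$: every entry is $0$.
- Diagonal matrix: square, with every off-diagonal entry zero, i.e. $a_{ij} = 0$ for $i \neq j$.
- Symmetric matrix: square with $a_{ij} = a_{ji}$ for all $i, j$, i.e. $\mathbf{A}^\top = \mathbf{A}$. Covariance matrices are symmetric.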
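Each special matrix in the list can be built and its defining property checked directly in NumPy. A minimal sketch; the matrix values and the random data used for the covariance example are arbitrary illustrations, not taken from the text:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # 2 x 3
m, n = A.shape

I_m, I_n = np.eye(m), np.eye(n)    # identity matrices of matching sizes
Z = np.zeros((m, n))               # zero matrix

# Identity on both sides: I_m A = A I_n = A
print(np.allclose(I_m @ A, A), np.allclose(A @ I_n, A))  # True True
print(np.array_equal(A + Z, A))    # True: adding the zero matrix changes nothing

D = np.diag([2.0, 5.0, 7.0])       # diagonal matrix from its diagonal entries
off_diag = D[~np.eye(3, dtype=bool)]
print(np.all(off_diag == 0))       # True: a_ij = 0 for i != j

# A sample covariance matrix is symmetric: S^T = S
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))    # hypothetical data: 50 observations, 3 variables
S = np.cov(data, rowvar=False)     # columns are variables, so S is 3 x 3
print(np.allclose(S, S.T))         # True
```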
Conformability
Operations only make sense when dimensions match. Addition $\mathbf{A} + \mathbf{B}$ requires two matrices of identical dimension. Matrix multiplication $\mathbf{A}\mathbf{B}$ requires the number of columns of $\mathbf{A}$ to equal the number of rows of $\mathbf{B}$: an $m \times n$ matrix times an $n \times p$ matrix gives an $m \times p$ result. Checking conformability first is the quickest way to catch mistakes.
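The multiplication rule can be written as a tiny function of the operands' dimensions alone. The `product_shape` helper below is hypothetical, for illustration; NumPy's `@` operator enforces the same rule and raises `ValueError` on a mismatch:

```python
import numpy as np

def product_shape(shape_a, shape_b):
    """Dimension of A @ B for 2-D shapes, or None if not conformable."""
    (m, n), (n2, p) = shape_a, shape_b
    return (m, p) if n == n2 else None

A = np.ones((2, 3))
B = np.ones((3, 4))

print(product_shape(A.shape, B.shape))   # (2, 4)
print((A @ B).shape)                     # (2, 4)

# B @ A would need 4 columns to match 2 rows, so it is not defined
print(product_shape(B.shape, A.shape))   # None
try:
    B @ A
except ValueError as err:
    print("not conformable:", err)
```

One caveat: NumPy's `+` broadcasts, so it accepts some shape pairs (for example a $2 \times 3$ plus a $1 \times 3$ array) that the strict textbook rule for addition would reject.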
Statistical motivation: the data matrix
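The canonical object in statistics is the $n \times p$ data matrix $\mathbf{X}$, with $n$ observations (rows) and $p$ variables (columns):
$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$
Row $i$ is one subject; column $j$ is one measured variable. Nearly every model (linear regression, PCA, generalized linear models) begins by writing the data this way.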
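To make the row and column roles concrete, here is a toy data matrix with $n = 4$ subjects and $p = 2$ variables. The values and the variable names (age, weight) are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: column 1 is age (years), column 2 is weight (kg)
X = np.array([[34, 70.2],
              [51, 82.5],
              [29, 61.0],
              [45, 77.8]])

n, p = X.shape
print(n, p)          # 4 2

subject_2 = X[1, :]  # row 2: every variable for one subject
age = X[:, 0]        # column 1: one variable across all subjects
print(subject_2)
print(age.mean())    # 39.75
```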
Computing it
R
# Column vector and matrix (matrix() fills column-by-column by default; byrow = TRUE fills row by row)
x <- c(1, 2, 3)
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)
A # 2 x 3 matrix, rows: (1 2 3) and (4 5 6)
dim(A) # 2 3
A[2, 3] # 6 (row 2, col 3)
diag(3) # 3x3 identity matrix
Python
import numpy as np
x = np.array([1, 2, 3]) # 1-D array
A = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
A.shape # (2, 3)
A[1, 2] # 6 (0-based: row index 1, col index 2)
np.eye(3) # 3x3 identity
Julia
using LinearAlgebra
x = [1, 2, 3] # column vector
A = [1 2 3; 4 5 6] # 2 x 3
size(A) # (2, 3)
A[2, 3] # 6 (1-based indexing)
I(3) # 3x3 identity (a Diagonal{Bool} matrix built from the UniformScaling I)
Why it matters for statistics
Clear notation is the foundation for everything downstream: the design matrix in regression, the covariance matrix, and the Jacobian of a disease model are all matrices. Knowing dimensions and conformability lets you predict whether an expression like $\mathbf{X}^\top\mathbf{X}$ is even defined before you compute anything (it is: a $p \times n$ matrix times an $n \times p$ matrix gives a $p \times p$ matrix).
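As a check on the dimension argument for an $n \times p$ data matrix $\mathbf{X}$ (the cross-product $\mathbf{X}^\top\mathbf{X}$ appears in least-squares regression), the sketch below uses hypothetical random data with $n = 100$ and $p = 3$ and confirms that $\mathbf{X}^\top\mathbf{X}$ is $p \times p$ and symmetric, $\mathbf{X}\mathbf{X}^\top$ is $n \times n$, and $\mathbf{X}\mathbf{X}$ is not defined unless $n = p$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))      # hypothetical n x p data matrix

XtX = X.T @ X                    # (p x n)(n x p) -> p x p
print(XtX.shape)                 # (3, 3)
print(np.allclose(XtX, XtX.T))   # True: X^T X is symmetric

XXt = X @ X.T                    # (n x p)(p x n) -> n x n, also defined
print(XXt.shape)                 # (100, 100)

try:
    X @ X                        # (n x p)(n x p): needs p == n
except ValueError:
    print("X @ X is not defined when n != p")
```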