Covariance: How Variables Move Together
Variance tells you how much a single variable spreads out. But most interesting questions involve relationships between variables. Do tall people tend to weigh more? When one feature is high, is another feature high too?
Covariance measures this: do two variables tend to move in the same direction, opposite directions, or independently?
The idea
Plot $n$ data points on a scatter plot, with variable $X$ on one axis and variable $Y$ on the other. The cloud of points might tilt upward (positive relationship), downward (negative), or show no pattern at all.
Covariance captures the tilt:
$$\text{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
For each data point, we multiply how far $x$ is from its mean by how far $y$ is from its mean. When both are above average (or both below), the product is positive. When one is above and the other below, the product is negative. Averaging these products gives the covariance. (This is the population version; the sample estimate divides by $n - 1$ instead of $n$.)
- Positive covariance: When $X$ is high, $Y$ tends to be high too.
- Negative covariance: When $X$ is high, $Y$ tends to be low.
- Near-zero covariance: No consistent linear relationship between $X$ and $Y$ (though they may still be related non-linearly).
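The formula translates directly into a few lines of NumPy. A minimal sketch, using a small made-up height/weight dataset:

```python
import numpy as np

# Illustrative data (made up): heights (cm) and weights (kg)
x = np.array([160.0, 165.0, 170.0, 175.0, 180.0])
y = np.array([55.0, 60.0, 63.0, 70.0, 74.0])

# Covariance: average product of the deviations from each mean
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # 48.0 — positive: taller points tend to be heavier
```

NumPy's built-in `np.cov(x, y, bias=True)[0, 1]` gives the same value; `bias=True` selects the $1/n$ convention used in the formula above.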
Correlation: standardised covariance
Covariance depends on the scales of $X$ and $Y$, which makes it hard to interpret (“the covariance of height and weight is 120” — is that a lot?). Dividing by both standard deviations gives the correlation coefficient:
$$r = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}$$
Correlation is always between $-1$ and $+1$:
- $r = +1$: Perfect positive linear relationship
- $r = 0$: No linear relationship
- $r = -1$: Perfect negative linear relationship
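Standardising the covariance from the previous sketch gives $r$ directly (same made-up data as before):

```python
import numpy as np

x = np.array([160.0, 165.0, 170.0, 175.0, 180.0])
y = np.array([55.0, 60.0, 63.0, 70.0, 74.0])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
# Population standard deviations, matching the 1/n covariance above
r = cov_xy / (x.std() * y.std())
print(r)  # close to +1: the points lie near an upward-sloping line
```

This matches `np.corrcoef(x, y)[0, 1]`, which computes the same quantity.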
Try it yourself
Drag points to rearrange the scatter plot. Click the chart to add new points. The dashed crosshair marks the mean. Shaded quadrants show which regions contribute positively or negatively to the covariance. Watch the correlation coefficient $r$ change as the cloud tilts.
Try arranging points in a tight upward line ($r \approx +1$), a flat horizontal spread ($r \approx 0$), or a downward slope ($r \approx -1$).
The covariance matrix
With two variables, covariance is a single number. With $p$ variables, you need covariance for every pair. This gives you a $p \times p$ covariance matrix $\Sigma$:
$$\Sigma = \begin{pmatrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) & \cdots & \text{Cov}(X_1, X_p) \\ \text{Cov}(X_2, X_1) & \text{Var}(X_2) & \cdots & \text{Cov}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(X_p, X_1) & \text{Cov}(X_p, X_2) & \cdots & \text{Var}(X_p) \end{pmatrix}$$
The diagonal entries are variances (each variable with itself). The off-diagonal entries are covariances between pairs. The matrix is symmetric: $\text{Cov}(X_i, X_j) = \text{Cov}(X_j, X_i)$.
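These properties are easy to verify numerically. A sketch with random data (100 observations of $p = 3$ variables):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))  # rows = observations, columns = variables

# rowvar=False: columns are variables; bias=True: the 1/n convention from above
sigma = np.cov(data, rowvar=False, bias=True)

print(sigma.shape)                                  # (3, 3)
print(np.allclose(sigma, sigma.T))                  # True: symmetric
print(np.allclose(np.diag(sigma), data.var(axis=0)))  # True: diagonal = variances
```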
Why this matters
The covariance matrix is the central object in:
- PCA: Finds the eigenvectors of the covariance matrix, the directions of maximum variance
- Factor Analysis: Models the covariance matrix as shared factors plus noise
- MDA (Multi-Dimensional Analysis): Biber's method extracts dimensions from a covariance matrix of linguistic features
The covariance matrix captures all the pairwise relationships in your data. The question then becomes: what structure is hiding inside this matrix? That’s where eigenvalues come in.
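As a preview of that eigenvalue story, here is a minimal sketch (with synthetic correlated data) of how PCA reads structure out of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 2-D data: y is roughly 0.8 * x plus noise, so the cloud tilts upward
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)
data = np.column_stack([x, y])

sigma = np.cov(data, rowvar=False)
# eigh is the eigendecomposition for symmetric matrices; it returns
# eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(sigma)

# Each eigenvalue is the variance along its eigenvector; the largest one
# belongs to the direction the point cloud tilts along
print(eigenvalues)
```

For this data the largest eigenvalue dominates, reflecting the strong shared direction of variation; the corresponding eigenvector is the first principal component.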