Covariance: How Variables Move Together
Variance tells you how much a single variable spreads out. But most interesting questions involve relationships between variables. Do tall people tend to weigh more? When one feature is high, is another feature high too?
Covariance measures this: do two variables tend to move in the same direction, opposite directions, or independently?
The idea
Plot $n$ data points on a scatter plot, with variable $X$ on one axis and variable $Y$ on the other. The cloud of points might tilt upward (positive relationship), downward (negative), or show no pattern at all.
Covariance captures the tilt:
$$\text{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
For each data point, we multiply how far $x$ is from its mean by how far $y$ is from its mean. When both are above average (or both below), the product is positive. When one is above and the other below, the product is negative. Averaging these products gives the covariance. (This is the population version; the sample estimate divides by $n - 1$ instead of $n$.)
- Positive covariance: When $X$ is high, $Y$ tends to be high too.
- Negative covariance: When $X$ is high, $Y$ tends to be low.
- Near-zero covariance: No consistent linear relationship between $X$ and $Y$ (though they may still be related non-linearly).
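The formula translates directly into a few lines of NumPy. A minimal sketch, using a small made-up height/weight dataset:

```python
import numpy as np

# Illustrative data (made up): heights (cm) and weights (kg)
x = np.array([160.0, 165.0, 170.0, 175.0, 180.0])
y = np.array([55.0, 60.0, 63.0, 70.0, 74.0])

# Covariance: average product of the deviations from each mean
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # 48.0 — positive: taller points tend to be heavier
```

NumPy's built-in `np.cov(x, y, bias=True)[0, 1]` gives the same value; `bias=True` selects the $1/n$ convention used in the formula above.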
Correlation: standardised covariance
Covariance depends on the scales of $X$ and $Y$, which makes it hard to interpret (“the covariance of height and weight is 120” — is that a lot?). Dividing by both standard deviations gives the correlation coefficient:
$$r = \frac{\text{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}$$
Correlation is always between $-1$ and $+1$:
- $r = +1$: Perfect positive linear relationship
- $r = 0$: No linear relationship
- $r = -1$: Perfect negative linear relationship
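Standardising the covariance from the previous sketch gives $r$ directly (same made-up data as before):

```python
import numpy as np

x = np.array([160.0, 165.0, 170.0, 175.0, 180.0])
y = np.array([55.0, 60.0, 63.0, 70.0, 74.0])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
# Population standard deviations, matching the 1/n covariance above
r = cov_xy / (x.std() * y.std())
print(r)  # close to +1: the points lie near an upward-sloping line
```

This matches `np.corrcoef(x, y)[0, 1]`, which computes the same quantity.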
Try it yourself
Drag points to rearrange the scatter plot. Click the chart to add new points. The dashed crosshair marks the mean. Shaded quadrants show which regions contribute positively or negatively to the covariance. Watch the correlation coefficient $r$ change as the cloud tilts.
Try arranging points in a tight upward line ($r \approx +1$), a flat horizontal spread ($r \approx 0$), or a downward slope ($r \approx -1$).
The covariance matrix
With two variables, covariance is a single number. With $p$ variables, you need covariance for every pair. This gives you a $p \times p$ covariance matrix $\Sigma$:
$$\Sigma = \begin{pmatrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) & \cdots & \text{Cov}(X_1, X_p) \\ \text{Cov}(X_2, X_1) & \text{Var}(X_2) & \cdots & \text{Cov}(X_2, X_p) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(X_p, X_1) & \text{Cov}(X_p, X_2) & \cdots & \text{Var}(X_p) \end{pmatrix}$$
The diagonal entries are variances (each variable with itself). The off-diagonal entries are covariances between pairs. The matrix is symmetric: $\text{Cov}(X_i, X_j) = \text{Cov}(X_j, X_i)$.
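These properties are easy to verify numerically. A sketch with random data (100 observations of $p = 3$ variables):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))  # rows = observations, columns = variables

# rowvar=False: columns are variables; bias=True: the 1/n convention from above
sigma = np.cov(data, rowvar=False, bias=True)

print(sigma.shape)                                  # (3, 3)
print(np.allclose(sigma, sigma.T))                  # True: symmetric
print(np.allclose(np.diag(sigma), data.var(axis=0)))  # True: diagonal = variances
```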
Why this matters
The covariance matrix is the central object in:
- PCA: Finds the eigenvectors of the covariance matrix, the directions of maximum variance
- Factor Analysis: Models the covariance matrix as shared factors plus noise
- MDA (Multi-Dimensional Analysis): Biber's method extracts dimensions from a covariance matrix of linguistic features
The covariance matrix captures all the pairwise relationships in your data. The question then becomes: what structure is hiding inside this matrix? That’s where eigenvalues come in.
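As a preview of that eigenvalue story, here is a minimal sketch (with synthetic correlated data) of how PCA reads structure out of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 2-D data: y is roughly 0.8 * x plus noise, so the cloud tilts upward
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)
data = np.column_stack([x, y])

sigma = np.cov(data, rowvar=False)
# eigh is the eigendecomposition for symmetric matrices; it returns
# eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(sigma)

# Each eigenvalue is the variance along its eigenvector; the largest one
# belongs to the direction the point cloud tilts along
print(eigenvalues)
```

For this data the largest eigenvalue dominates, reflecting the strong shared direction of variation; the corresponding eigenvector is the first principal component.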