Synapse

An interconnected graph of micro-tutorials

Variance: Measuring Spread

This is an early draft. Content may change as it gets reviewed.

You know that a probability distribution is a curve describing how plausible different values are. But two distributions can have the same centre (mean) and look completely different — one tight and narrow, the other wide and flat. Variance measures this difference: how spread out the values are.

The idea

Imagine measuring the heights of 100 people. The average might be 170 cm. But are most people between 168 and 172 (low variance), or are they scattered between 150 and 190 (high variance)?

Variance answers: on average, how far is each value from the mean?

The formula

For $n$ values $x_1, x_2, \ldots, x_n$ with mean $\bar{x}$:

$$\text{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Each term $(x_i - \bar{x})^2$ is the squared distance from the mean. We square it so that values above and below the mean don't cancel each other out. Then we average. (This version divides by $n$ and is called the *population* variance; when estimating from a sample, statisticians usually divide by $n - 1$ instead, but the intuition is the same.)

Standard deviation is just the square root of variance: $\sigma = \sqrt{\text{Var}(X)}$. It has the same units as the original data, which makes it more interpretable (“the standard deviation of heights is 8 cm” is easier to grasp than “the variance is 64 cm²”).
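As a quick sketch, the formula can be checked with Python's standard `statistics` module (the heights here are made up for illustration):

```python
import statistics

# Hypothetical heights in cm, invented for illustration
heights = [168, 170, 171, 169, 172, 170]

mean = statistics.fmean(heights)  # 170.0

# Raw deviations from the mean cancel out to (essentially) zero...
deviations = [x - mean for x in heights]
print(sum(deviations))

# ...so we square them before averaging (population variance, 1/n)
variance = sum(d ** 2 for d in deviations) / len(heights)

# This matches the library's population variance, and the standard
# deviation is its square root
print(variance)                            # ≈ 1.667
print(statistics.pvariance(heights))       # same value
print(statistics.pstdev(heights))          # ≈ 1.291, in cm
```

Note that `pvariance`/`pstdev` are the divide-by-$n$ versions matching the formula above; `statistics.variance` and `statistics.stdev` are the sample ($n - 1$) versions.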

Building intuition

For a normal distribution (the bell curve):

- About 68% of values fall within 1 standard deviation of the mean
- About 95% fall within 2 standard deviations
- About 99.7% fall within 3
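These percentages can be verified exactly for the ideal bell curve using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Probability of landing within k standard deviations of the mean:
# the area under the curve between -k and +k
for k in (1, 2, 3):
    p = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} sd: {p:.1%}")
```

This prints roughly 68.3%, 95.4%, and 99.7%, matching the rule of thumb above.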

Try it yourself

Try It: Variance

Each circle is a data point. The dashed line is the mean. The shaded bars show each point’s deviation from the mean. Drag the points left and right to see how the variance and standard deviation change.

Try clustering all points near the mean — variance drops toward zero. Spread them to the edges — variance climbs.

Why this matters

Variance is the building block for almost all multivariate statistics. Covariance measures how two variables vary together. PCA finds the directions of maximum variance in high-dimensional data. Factor analysis models shared variance across many variables. Understanding variance for one variable is the foundation for all of these.
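As a small illustration of that first step, covariance is computed the same way as variance, just pairing the deviations of two variables (the data here is invented; the divide-by-$n$ convention matches the variance formula above):

```python
import statistics

# Hypothetical paired measurements, invented for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

mx, my = statistics.fmean(x), statistics.fmean(y)

# Covariance: average product of paired deviations (1/n)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Variance is just the special case Cov(X, X)
var_x = sum((a - mx) ** 2 for a in x) / len(x)

print(cov)    # positive: x and y tend to rise together
print(var_x)  # equals the population variance of x
```

A positive covariance means the two variables tend to move in the same direction; variance itself is the covariance of a variable with itself.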

- **In corpus linguistics**, you might measure how often the word *the* appears per 1,000 words across 500 texts. Academic papers might cluster around 65 per thousand (low variance within that register), while the full corpus — mixing conversation, fiction, news, and academic prose — might range from 20 to 90 (high variance). That high variance is exactly what register analysis tries to explain.
- **In clinical research**, you might measure blood pressure across 200 patients. A healthy group might cluster tightly around 120/80 (low variance), while a mixed population including hypertensive patients would spread from 90/60 to 180/110 (high variance). Treatment studies ask: did our intervention *reduce* the variance?
- **In ecology**, you might measure tree heights across a forest. A single-species plantation might have low variance (all trees planted the same year, growing at similar rates). A natural old-growth forest would have high variance — saplings, mature trees, and ancient giants all coexisting.
- **In music theory**, you might measure the pitch intervals between successive notes in a melody. A chant-like melody moving in steps has low variance. A virtuosic passage with large leaps has high variance. The “shape” of a melody is partly captured by the variance of its intervals.
- **In astronomy**, you might measure the brightness of a star over time. A stable main-sequence star has low variance in its light curve. A Cepheid variable pulsates rhythmically — moderate variance with periodic structure. An eclipsing binary shows sudden dips — high variance concentrated in sharp events. The variance of a light curve is one of the first features used to classify variable stars.