What is a Probability Distribution?
A probability distribution is a curve (or a formula, or a table) that tells you how likely each possible outcome is.
That’s the whole idea. Everything else — variance, covariance, PCA, factor analysis, Bayesian updating, topic models — builds on it.
A concrete example
Suppose you measure the heights of 100 people. You don’t get one number — you get a spread. Most people are near the average, a few are very tall or very short. The distribution describes the shape of that spread: where the bulk is, how wide it is, whether it’s symmetric.
The same idea applies everywhere:
- Word frequencies: Some words (the, of) appear constantly; most words appear rarely. The distribution over word frequencies has a very long tail.
- Star brightness: Most stars in a survey are faint; a few are very bright. The distribution tells you how common each brightness level is relative to the others.
- Test scores: Cluster near the mean, thin out in the tails. The familiar bell curve.
- Coin bias: If you don’t know whether a coin is fair, the distribution describes your uncertainty — how plausible each possible bias is, from 0 to 1.
In each case, the distribution replaces a single number with a shape that captures the full picture, and different data produce very different shapes.
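To make "different shapes" concrete, here is a minimal sketch that draws two simulated samples and prints a rough text histogram of each. All the numbers are assumptions chosen for illustration (heights centred on 170 cm with a 10 cm spread, and a Pareto sample standing in for long-tailed data such as word frequencies); the helper names `bin_counts` and `print_histogram` are hypothetical.

```python
import random

def bin_counts(values, lo, hi, bins):
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            # min() guards against a rounding edge case right below `hi`
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def print_histogram(label, values, lo, hi, bins=8, scale=5):
    """Print one row per bin; each '#' stands for `scale` data points."""
    print(label)
    width = (hi - lo) / bins
    for i, count in enumerate(bin_counts(values, lo, hi, bins)):
        print(f"{lo + i * width:7.1f} | {'#' * (count // scale)}")

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed for illustration: roughly symmetric data around 170.
heights = [rng.gauss(170, 10) for _ in range(500)]
# Assumed for illustration: long-tailed data (many small values, a few large).
long_tailed = [rng.paretovariate(1.5) for _ in range(500)]

print_histogram("Symmetric (simulated heights, cm)", heights, 130, 210)
# Values of 11 or more fall outside the plotted range and are not drawn.
print_histogram("Long-tailed (simulated)", long_tailed, 1, 11)
```

The first histogram bulges in the middle; the second piles up at the left edge and trails off to the right.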
What a distribution tells you
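The height of the curve at any point tells you how likely (or how plausible) that value is relative to other values. For a continuous quantity the height is a density rather than a probability: any single exact value has probability zero, and probabilities come from areas. The area under the curve between two points is the probability of falling in that range, and the total area under the curve is 1.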
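The link between area and probability can be checked numerically. The sketch below assumes a standard normal (bell) curve as the example distribution, approximates the area between two points with the trapezoid rule, and compares it with the exact value from the normal cumulative distribution function. The function names are illustrative, not from any particular library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Probability of a value at or below x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def area_under(pdf, a, b, steps=10_000):
    """Trapezoid-rule approximation of the area under `pdf` between a and b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

# Probability of landing within one standard deviation of the mean.
within_one_sd = area_under(normal_pdf, -1, 1)
exact = normal_cdf(1) - normal_cdf(-1)
print(f"area by trapezoid rule: {within_one_sd:.4f}")
print(f"exact from the CDF:     {exact:.4f}")

# A narrow curve can be taller than 1: height is density, not probability.
print(f"peak height with sigma=0.1: {normal_pdf(0, 0, 0.1):.3f}")
```

Both area figures come out near 0.68, the familiar "about two thirds within one standard deviation" rule for a bell curve.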
Key properties you can read off a distribution:
- Centre (mean, median): Where is the bulk?
- Spread (variance, standard deviation): How wide is the distribution?
- Shape: Symmetric? Skewed? Heavy-tailed? Bimodal?
These properties become the building blocks of everything that follows.
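As a rough illustration, the three properties can be computed for a small sample with Python's standard `statistics` module. The data below are made up, and `skewness` is a hypothetical helper (the average cubed z-score, one common definition of skewness) used here as a simple numeric stand-in for "shape".

```python
import statistics

def skewness(values):
    """Population skewness: the average cubed z-score.
    Positive means a longer right tail, negative a longer left tail."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

# Made-up sample: mostly small values plus one large one.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

print("mean:    ", statistics.fmean(data))      # centre, sensitive to the large value
print("median:  ", statistics.median(data))     # centre, robust to the large value
print("variance:", statistics.pvariance(data))  # spread
print("std dev: ", statistics.pstdev(data))     # spread, in the data's own units
print("skewness:", skewness(data))              # shape: sign shows the tail direction
```

The single large value pulls the mean above the median and makes the skewness positive, which is the numeric signature of a long right tail.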
Try it yourself
Click on the number line to add data points. Watch the histogram and density curve build up. Try the presets to see how different data produces different shapes.
Things to try:
- Click to add ~20 points near the centre, then a few at the edges. Watch the bell shape emerge.
- Hit Skewed — notice the long right tail and how the mean is pulled rightward.
- Hit Bimodal — two peaks. The mean sits between them, where almost no data actually is.
- Hit Uniform — flat. Every value equally likely. The distribution says “I don’t know.”
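If the interactive widget is not to hand, the same observations can be reproduced with simulated data. This is only a sketch: the generators below (an exponential sample for "skewed", a two-Gaussian mixture for "bimodal", and a flat sample on [0, 1] for "uniform") are assumptions chosen for illustration and are not necessarily what the presets use.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 10_000

# Assumed stand-ins for the three presets.
skewed = [rng.expovariate(1.0) for _ in range(N)]
bimodal = [rng.gauss(-3, 0.5) if rng.random() < 0.5 else rng.gauss(3, 0.5)
           for _ in range(N)]
uniform = [rng.uniform(0, 1) for _ in range(N)]

def share_near_mean(values, radius):
    """Fraction of the data lying within `radius` of the sample mean."""
    mu = statistics.fmean(values)
    return sum(abs(v - mu) <= radius for v in values) / len(values)

# Skewed: the long right tail pulls the mean above the median.
print("skewed  mean:", round(statistics.fmean(skewed), 3),
      " median:", round(statistics.median(skewed), 3))

# Bimodal: the mean sits between the peaks, where almost no data lies.
print("bimodal mean:", round(statistics.fmean(bimodal), 3),
      " share within 1.0 of the mean:", share_near_mean(bimodal, 1.0))

# Uniform: each quarter of the range holds roughly a quarter of the data.
quarters = [sum(q / 4 <= v < (q + 1) / 4 for v in uniform) / N for q in range(4)]
print("uniform share per quarter:", quarters)
```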
Two ways distributions appear
Distributions play two different roles, and the distinction matters:
- Describing data: The heights of 100 people, the frequencies of words in a corpus, the brightness of stars in a survey. The distribution summarises the variation you observe.
- Encoding uncertainty: You don’t know the true probability of heads for a coin. A distribution over the interval [0, 1] describes what you believe about that unknown quantity. This is the Bayesian reading, and it leads to powerful tools for learning from data.
Both uses are built on the same mathematics. The difference is what the curve represents: observed variation or epistemic uncertainty.
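The "encoding uncertainty" role can be sketched with a simple grid: list candidate values for the coin's bias, give each a weight, and reweight after seeing flips. Everything specific here is an assumption for illustration: the flat starting belief, the 101-point grid, and the made-up data of 7 heads in 10 flips. The function name `grid_posterior` is hypothetical.

```python
def grid_posterior(heads, tails, grid_size=101):
    """Belief over the coin bias p on an evenly spaced grid in [0, 1],
    starting from a flat prior (assumed) and updated with observed flips."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat: every bias equally plausible to begin with
    # Likelihood of the observed flips if the bias were p: p^heads * (1-p)^tails
    unnormalised = [pr * p**heads * (1 - p)**tails for pr, p in zip(prior, grid)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]

# Before any data the belief is flat; after 7 heads and 3 tails it has a peak.
grid, before = grid_posterior(0, 0)
grid, after = grid_posterior(7, 3)

most_plausible = grid[after.index(max(after))]
print("most plausible bias after 7 heads, 3 tails:", most_plausible)
print("weight on biases below 0.5:",
      round(sum(w for p, w in zip(grid, after) if p < 0.5), 3))
```

The weights are a distribution over an unknown quantity rather than over observed data: nothing here describes variation in a sample, only how plausible each candidate bias is.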
Where you’re heading
On this path, the “encoding uncertainty” reading is central. The Beta distribution describes a curve over a single probability (like the coin bias). The Dirichlet distribution extends this to the probabilities of several outcomes at once. These become the priors in Latent Dirichlet Allocation, a widely used model for discovering topics in text.
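As a small preview, both distributions can be sampled with Python's standard `random` module. `random.betavariate` is built in; the standard library has no Dirichlet sampler, so the sketch below uses the usual construction of normalising independent Gamma draws. The parameter values and the helper name `sample_dirichlet` are assumptions for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

# Beta: one draw is a single probability, such as a plausible coin bias.
coin_bias = rng.betavariate(2, 2)
print("one plausible coin bias:", round(coin_bias, 3))

# Dirichlet: one draw is a whole set of probabilities that sum to 1,
# here over three outcomes (for example, three topics).
topic_mix = sample_dirichlet([1.0, 1.0, 1.0], rng)
print("one plausible 3-way mix:", [round(p, 3) for p in topic_mix])

# The average of many Beta(a, b) draws approaches a / (a + b).
a, b = 8, 4
draws = [rng.betavariate(a, b) for _ in range(2000)]
print("average of Beta(8, 4) draws:", round(sum(draws) / len(draws), 3))
```

With two outcomes the Dirichlet reduces to the Beta, which is the sense in which one extends the other.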