Synapse

An interconnected graph of micro-tutorials

What is a Probability Distribution?

A probability distribution is a curve (or a formula, or a table) that tells you how likely each possible outcome is.

That’s the whole idea. Everything else — variance, covariance, PCA, factor analysis, Bayesian updating, topic models — builds on it.

A concrete example

Suppose you measure the heights of 100 people. You don’t get one number — you get a spread. Most people are near the average, a few are very tall or very short. The distribution describes the shape of that spread: where the bulk is, how wide it is, whether it’s symmetric.
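To make the height example tangible, here is a minimal sketch in Python. The numbers are assumptions chosen for illustration (a normal distribution with mean 170 cm and standard deviation 8 cm), not real measurements; the point is only to show how 100 values turn into a shape.

```python
import random
import statistics

# Assumed, illustrative numbers: heights in cm drawn from a normal
# distribution with mean 170 and standard deviation 8.
rng = random.Random(0)  # fixed seed so the run is repeatable
heights = [rng.gauss(170, 8) for _ in range(100)]

mean = statistics.mean(heights)
sd = statistics.stdev(heights)

# Count how many heights fall into each 5 cm bin to see the shape of the spread.
bins = {}
for h in heights:
    left = int(h // 5) * 5
    bins[left] = bins.get(left, 0) + 1

# A text histogram: one '#' per person in the bin.
for left in sorted(bins):
    print(f"{left}-{left + 5} cm | {'#' * bins[left]}")
print(f"mean = {mean:.1f} cm, sd = {sd:.1f} cm")
```

The longest rows should sit near the mean, with shorter rows on either side: that outline is the distribution.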

The same idea applies everywhere: the distribution replaces a single number with a shape that captures the full picture. And different data produces very different shapes:

[Figure: three example shapes. Heights: symmetric. Income: skewed right. Word frequencies: long-tailed.]
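The three shapes can be told apart numerically as well as visually. The sketch below is an illustration under stated assumptions: heights are modelled as normal, incomes as log-normal, and word frequencies by an idealized Zipf law in which the word of rank r has weight 1/r. None of these parameter values come from real data.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 10_000

# Symmetric: mean and median land almost on top of each other.
heights = [rng.gauss(170, 8) for _ in range(n)]
# Skewed right: a few very large values pull the mean above the median.
incomes = [rng.lognormvariate(10, 0.8) for _ in range(n)]

h_mean, h_median = statistics.mean(heights), statistics.median(heights)
i_mean, i_median = statistics.mean(incomes), statistics.median(incomes)

# Long-tailed: with weight 1/r for rank r, a handful of words account
# for a large share of all occurrences.
weights = [1 / r for r in range(1, 1001)]
top10_share = sum(weights[:10]) / sum(weights)

print(f"heights: mean={h_mean:.1f} median={h_median:.1f}")
print(f"incomes: mean={i_mean:.0f} median={i_median:.0f}")
print(f"share of occurrences taken by the top 10 of 1000 words: {top10_share:.0%}")
```

Comparing the mean with the median is a quick check for skew: when the two are close the shape is roughly symmetric, and when the mean sits well above the median the distribution has a long right tail.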

What a distribution tells you

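The height of the curve at any point tells you how likely (or how plausible) values near that point are compared with values elsewhere. For a continuous quantity that height is a density rather than a probability in itself: the area under the curve between two points is what gives the probability of falling in that range.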

[Figure: a density curve annotated with the mean, the spread (±1 SD ≈ 68% of data), and a shaded region whose area is the probability of falling in that range.]
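The "area = probability" reading can be checked directly. The sketch below assumes a normal distribution with mean 170 and standard deviation 8 (illustrative values) and uses `statistics.NormalDist` from the Python standard library. It computes the probability of landing within one standard deviation of the mean in two ways: from the cumulative distribution function, and by adding up thin rectangles under the curve.

```python
from statistics import NormalDist

# Assumed, illustrative distribution: mean 170, standard deviation 8.
dist = NormalDist(mu=170, sigma=8)
lo, hi = 170 - 8, 170 + 8

# Probability of falling between lo and hi, via the cumulative distribution.
p_within_1sd = dist.cdf(hi) - dist.cdf(lo)

# The same probability as an area: sum thin rectangles (midpoint rule).
steps = 1600
dx = (hi - lo) / steps
area = sum(dist.pdf(lo + (i + 0.5) * dx) * dx for i in range(steps))

# The height of the curve at the mean is a density, not a probability.
density_at_mean = dist.pdf(170)

print(f"P(within 1 SD) = {p_within_1sd:.4f}")
print(f"area under curve = {area:.4f}")
print(f"density at the mean = {density_at_mean:.4f}")
```

Both numbers should agree to several decimal places and come out close to the 68% quoted in the figure, while the density at the mean is a much smaller number with units of "per cm".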

Key properties you can read off a distribution include where the bulk sits (the mean), how wide it is (the spread), and whether it is symmetric. These properties become the building blocks of everything that follows.

Try it yourself

Click on the number line to add data points. Watch the histogram and density curve build up. Try the presets to see how different data produces different shapes.
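How a smooth density curve grows out of individual points can also be sketched in code. The following is a minimal illustration, assuming a Gaussian kernel density estimate with a fixed bandwidth. The page does not say how its own curve is smoothed, so the method, the `bandwidth` parameter, and the sample points are all assumptions.

```python
from statistics import NormalDist

def density(x, points, bandwidth=1.0):
    """Gaussian kernel density estimate at x.

    Each data point contributes a small bell curve centred on itself;
    the estimate is the average of those curves.
    """
    return sum(NormalDist(p, bandwidth).pdf(x) for p in points) / len(points)

# Hypothetical data points placed on a 0-10 number line.
points = [2.0, 2.5, 3.0, 3.2, 7.0]

# Coarse text rendering of the curve: longer bars mean higher density.
for x in range(0, 11):
    bar = "#" * round(density(x, points) * 100)
    print(f"{x:2d} | {bar}")
```

Adding more values to `points` and rerunning shows the curve filling in: a cluster of nearby points produces a tall bump, while an isolated point produces a small one.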

Two ways distributions appear

Distributions play two different roles, and the distinction matters:

  1. Describing data: The heights of 100 people, the frequencies of words in a corpus, the brightness of stars in a survey. The distribution summarises the variation you observe.

  2. Encoding uncertainty: You don’t know the true probability of heads for a coin. A distribution over the interval [0, 1] describes what you believe about that unknown quantity. This is the Bayesian reading — and it leads to powerful tools for learning from data.

Both uses are built on the same mathematics. The difference is what the curve represents: observed variation or epistemic uncertainty.
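The "encoding uncertainty" role can be made concrete with the coin. The sketch below is one simple way to do it, not the only one: it represents belief about the heads-probability as a table over 101 candidate values in [0, 1], starts from a uniform belief, and applies Bayes' rule after each flip. The grid resolution and the sequence of flips are assumptions made for illustration.

```python
# Candidate values for the unknown heads-probability: 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]

# Uniform starting belief: every candidate is equally plausible.
belief = [1 / len(grid)] * len(grid)

def observe(belief, heads):
    """Bayes' rule: weight each candidate by how well it predicts the flip,
    then renormalize so the beliefs sum to 1 again."""
    weighted = [b * (p if heads else 1 - p) for b, p in zip(belief, grid)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical data: 6 heads and 2 tails.
flips = [True, True, False, True, True, True, False, True]
for flip in flips:
    belief = observe(belief, flip)

most_plausible = grid[belief.index(max(belief))]
expected = sum(p * b for p, b in zip(grid, belief))

print(f"most plausible heads-probability: {most_plausible:.2f}")
print(f"average of the belief: {expected:.2f}")
```

Here the table of beliefs is the distribution. No heights were measured and no population was sampled; the curve describes what is known about a single unknown number, and it narrows as more flips arrive.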

Where you’re heading

On this path, the “encoding uncertainty” reading is central. The Beta distribution shapes a curve over a single probability (like the coin bias). The Dirichlet distribution extends this to multiple outcomes at once. These become the priors in Latent Dirichlet Allocation — the standard model for discovering topics in text.
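To preview the two distributions named above, here is a small sketch using only the Python standard library. `random.betavariate` draws from a Beta distribution directly. The standard library has no Dirichlet sampler, so `sample_dirichlet` below is a hypothetical helper that uses the standard construction of normalizing independent Gamma draws. All parameter values are illustrative assumptions.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Beta: a distribution over a single probability, such as a coin's bias.
# The parameters 7 and 3 are illustrative.
coin_bias = rng.betavariate(7, 3)

def sample_dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet distribution by
    normalizing independent Gamma(alpha, 1) draws (hypothetical helper)."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

# Dirichlet: a distribution over several probabilities that sum to 1,
# for example the mix of three topics in one document.
topic_mix = sample_dirichlet([2.0, 1.0, 1.0], rng)

print(f"one plausible coin bias: {coin_bias:.3f}")
print("one plausible topic mix:", [round(p, 3) for p in topic_mix])
```

Each call returns a different plausible value: a single number between 0 and 1 for the Beta, and a whole vector of proportions for the Dirichlet.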