What is a Probability Distribution?

This is an early draft. Content may change as it gets reviewed.

Before diving into specific distributions, let’s make sure the core idea is clear.

A single number isn’t enough

Suppose you flip a coin 10 times and get 7 heads. What’s the probability of heads?

One answer: 70%. But you know that’s a rough estimate. If you flipped it 10 more times, you might get 5 heads or 9 heads. The true probability could be 0.6, or 0.8, or anywhere in between. You’re uncertain.
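To get a feel for how much a 10-flip count moves around, here is a minimal simulation sketch in Python. It assumes, purely for illustration, a coin whose true probability of heads is 0.7. The function `repeat_experiment` and its parameters are hypothetical names chosen for this example.

```python
import random

def repeat_experiment(true_p, n_flips, n_repeats, seed=0):
    """Run n_repeats independent experiments of n_flips coin flips each
    and return the number of heads seen in each experiment."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = []
    for _ in range(n_repeats):
        # A flip is heads when a uniform draw in [0, 1) falls below true_p.
        heads = sum(1 for _ in range(n_flips) if rng.random() < true_p)
        counts.append(heads)
    return counts

# Assumption: the coin's true probability of heads is 0.7.
counts = repeat_experiment(true_p=0.7, n_flips=10, n_repeats=8)
print(counts)  # eight head counts, one per batch of 10 flips
```

Each entry is the number of heads in one batch of 10 flips. Even with the true probability fixed at 0.7, the counts typically scatter across several values, which is why a single batch gives only a rough estimate.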

The trouble with a single number like “70%” is that it doesn’t tell you how uncertain you are. Did you flip 10 times (pretty uncertain) or 10,000 times (very confident)?
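One quick way to put a number on that uncertainty, borrowed from classical statistics rather than the Bayesian approach developed here, is the standard error of the observed fraction, √(p̂(1 − p̂)/n), where p̂ is the fraction of heads and n is the number of flips. This sketch compares the two scenarios, assuming both end up at exactly 70% heads; `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(heads, flips):
    """Classical standard error of the observed fraction of heads:
    sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = heads / flips
    return math.sqrt(p_hat * (1 - p_hat) / flips)

# Same 70% estimate, very different amounts of data.
for heads, flips in [(7, 10), (7_000, 10_000)]:
    print(f"{heads}/{flips}: estimate {heads / flips:.2f}, "
          f"standard error {standard_error(heads, flips):.4f}")
```

Both cases give the same 0.70 estimate, but the standard error drops from about 0.145 to about 0.0046, a factor of √1000 ≈ 31.6. The single number “70%” hides exactly this difference.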

A curve instead of a number

A probability distribution replaces a single number with a curve — a shape that says how plausible each possible value is.

For the coin, the curve lives on the interval from 0 to 1 (because the probability of heads must be between 0 and 1). The curve might be:

Wide and flat — “I have no idea. Any probability is equally plausible.”
A bump near 0.7 — “I think heads is about 70%, but I’m not sure.”
A narrow spike at 0.7 — “I’m very confident it’s close to 70%.”
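The three shapes above can be drawn concretely. This sketch borrows the Beta distribution (covered in a later node) only as a convenient way to produce them. The parameter pairs (1, 1), (8, 4) and (701, 301) are assumptions, picked so that the last two peak exactly at 0.7, and `beta_pdf` is a hypothetical helper written from the standard Beta density formula.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1.
    Computed in log space so large a and b do not overflow."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

# Illustrative parameter choices (assumptions, not from the text):
shapes = {
    "wide and flat": (1, 1),      # constant height 1 on (0, 1)
    "bump near 0.7": (8, 4),      # peak at (8-1)/(8+4-2) = 0.7
    "spike at 0.7": (701, 301),   # peak at 700/1000 = 0.7, very narrow
}

points = [0.3, 0.5, 0.7, 0.9]
for name, (a, b) in shapes.items():
    heights = ", ".join(f"{beta_pdf(x, a, b):6.2f}" for x in points)
    print(f"{name:14s} heights at {points}: {heights}")
```

Reading across each row: the flat curve has the same height everywhere, the bump is tallest at 0.7 but still has noticeable height at 0.5 and 0.9, and the spike is very tall at 0.7 and essentially zero elsewhere.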

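The height of the curve at any point tells you how plausible that value is relative to other values. (For a continuous quantity like this one, the height is a probability density rather than a probability, so it can exceed 1.) The area under the curve between any two points is the probability that the true value lies in that range, and the total area under the whole curve is always 1.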
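The difference between height and area is easy to check numerically. This sketch assumes a Beta(8, 4) curve (a bump peaking at 0.7, chosen only for illustration) and approximates areas with the midpoint rule. `beta_pdf`, `bump` and `area` are hypothetical helpers, not a library API.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def bump(x):
    """Assumed example curve: a Beta(8, 4) bump peaking at 0.7."""
    return beta_pdf(x, 8, 4)

def area(pdf, lo, hi, steps=10_000):
    """Approximate the area under pdf between lo and hi with the midpoint rule.
    Midpoints never touch lo or hi, so the endpoints 0 and 1 are safe."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

print(f"height at 0.7:        {bump(0.7):.3f}")            # a density, can exceed 1
print(f"area over [0, 1]:     {area(bump, 0.0, 1.0):.3f}")  # total probability
print(f"area over [0.6, 0.8]: {area(bump, 0.6, 0.8):.3f}")  # P(0.6 < p < 0.8)
```

The height at 0.7 comes out around 2.94, which is more than 1 and perfectly fine for a density. The total area should print as 1.000 up to the small error of the approximation, and the area over [0.6, 0.8] is the probability, under this assumed curve, that the true value lies between 0.6 and 0.8.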

Why this matters

This idea — encoding uncertainty as a curve rather than a point — is the foundation of Bayesian statistics. Instead of asking “what is the probability?”, we ask “what do I believe about the probability, and how confident am I?”

Every distribution in the nodes that follow is a specific way of shaping this curve:

The Beta distribution shapes a curve over a single probability (like the coin)
The Dirichlet distribution extends this to multiple probabilities at once
The Dirichlet process extends it to infinitely many
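As a rough preview of the first two, here is what a single draw from each looks like, using only Python’s standard `random` module. A Beta draw is one probability; a Dirichlet draw is a list of probabilities that sums to 1, produced here by the standard construction of normalizing independent Gamma draws. The parameters (8, 4) and [2, 3, 5] are arbitrary assumptions, `dirichlet_sample` is a hypothetical helper, and the Dirichlet process is not covered by this sketch.

```python
import random

def dirichlet_sample(alphas, rng):
    """One Dirichlet(alphas) draw: independent Gamma(alpha_k, 1) values,
    normalized so they sum to 1."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(42)  # seeded so results are reproducible

# Beta: one draw is a single probability in [0, 1], e.g. P(heads) for the coin.
p_heads = rng.betavariate(8, 4)

# Dirichlet: one draw is several probabilities at once (e.g. a 3-sided die).
probs = dirichlet_sample([2.0, 3.0, 5.0], rng)

print(f"Beta draw:      {p_heads:.3f}")
print(f"Dirichlet draw: {[round(p, 3) for p in probs]} (sum = {sum(probs):.3f})")
```

With two categories the Dirichlet reduces to the Beta: the first component of a Dirichlet(a, b) draw follows a Beta(a, b) distribution, which is the sense in which the Dirichlet extends the Beta.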