The Beta Distribution
The Beta distribution is a probability distribution over a single probability — a number between 0 and 1. It’s the natural tool for expressing uncertainty about things like coin biases, success rates, or proportions.
What it looks like
It has two parameters, traditionally called $a$ and $b$, that control its shape:
- Both below 1 (like 0.5, 0.5): The curve piles up near 0 and 1. The coin is probably very biased one way or the other, but you don’t know which.
- Both equal to 1 (a = 1, b = 1): A flat line. Every probability from 0 to 1 is equally plausible. You know nothing about the coin.
- Both moderate and equal (like 5, 5): A smooth bump centred at 0.5. You think the coin is roughly fair.
- Unequal (like 2, 8): The bump shifts. Here it’s shifted toward small values — you think the coin tends toward tails.
- Both large and equal (like 50, 50): A very narrow spike at 0.5. You’re quite confident the coin is fair.
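The mean of the distribution is always at $a / (a + b)$. When $a$ and $b$ are equal, this is also where the bump peaks; when they are unequal, the peak sits a little to one side of the mean. The width of the bump depends on how large $a + b$ is: for a fixed mean, a bigger $a + b$ gives a narrower curve, which means more confidence.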
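To make the link between the parameters and the shape concrete, here is a minimal sketch that computes the mean and the standard deviation for the example pairs listed above. The function names `beta_mean` and `beta_std` are illustrative, and the variance formula $ab / ((a + b)^2 (a + b + 1))$ is the standard one for the Beta distribution rather than something derived in this text.

```python
def beta_mean(a, b):
    """Mean of Beta(a, b): a / (a + b)."""
    return a / (a + b)


def beta_std(a, b):
    """Standard deviation of Beta(a, b), from the standard variance formula."""
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return variance ** 0.5


# The example parameter pairs from the list above.
examples = [(0.5, 0.5), (1, 1), (5, 5), (2, 8), (50, 50)]

for a, b in examples:
    print(f"Beta({a}, {b}): mean = {beta_mean(a, b):.3f}, std = {beta_std(a, b):.3f}")
```

Comparing the rows for (5, 5) and (50, 50) shows the same mean of 0.5 with a much smaller standard deviation in the second case, which is the "narrower curve" described above.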
Explore it
Play with the sliders. Try setting both to 1 (flat). Then both to 10 (narrow bump at 0.5). Then $a = 1$ and $b = 10$ (piled up near zero). The dashed line shows the mean.
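If the sliders are not to hand, the same three settings can be checked numerically. The sketch below evaluates the Beta density at a few points using only the Python standard library. The helper name `beta_pdf` and the grid of sample points are assumptions made for illustration; the density formula itself is the standard one, $x^{a-1}(1-x)^{b-1}$ divided by a normalising constant.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1.

    The normalising constant is computed in log space with lgamma,
    which keeps the calculation stable for large a and b.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_density = log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
    return math.exp(log_density)


# A coarse grid of probabilities at which to sample the curve.
xs = [0.05, 0.25, 0.5, 0.75, 0.95]

# The three slider settings suggested above.
for a, b in [(1, 1), (10, 10), (1, 10)]:
    heights = ", ".join(f"{beta_pdf(x, a, b):.3f}" for x in xs)
    print(f"Beta({a}, {b}) at {xs}: {heights}")
```

For (1, 1) every height is the same (flat), for (10, 10) the largest height is at 0.5, and for (1, 10) the heights fall off quickly as you move away from zero.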
A useful trick: pseudo-counts
There’s a nice way to think about $a$ and $b$: pretend you’ve already run an experiment before collecting any data. The parameter $a$ is like “imaginary heads you’ve already seen” and $b$ is “imaginary tails.” So Beta(3, 7) means: “Before I even start flipping, I’m going to act as though I’ve already seen 3 heads and 7 tails.”
This “pseudo-count” interpretation will keep coming back. It’s the key intuition for everything that follows.
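The pseudo-count picture can be sketched in a few lines. The example below assumes the standard updating rule for coin flips, in which real heads and tails are simply added to the imaginary ones; that rule is a well-known property of the Beta distribution, not something derived in this section. The function name `update` and the observed counts (6 heads, 4 tails) are hypothetical.

```python
def update(a, b, heads, tails):
    """Add observed counts to the pseudo-counts of a Beta(a, b) belief."""
    return a + heads, b + tails


def beta_mean(a, b):
    """Mean of Beta(a, b): the fraction of (pseudo + real) flips that were heads."""
    return a / (a + b)


# Beta(3, 7): act as though 3 heads and 7 tails have already been seen.
prior = (3, 7)
print("mean before any data:", beta_mean(*prior))

# Hypothetical data: 6 heads and 4 tails in 10 real flips.
posterior = update(*prior, heads=6, tails=4)
print("updated parameters:", posterior)
print("mean after the data:", beta_mean(*posterior))
```

The updated mean lands between the prior's 0.3 and the data's 0.6, because the ten imaginary flips and the ten real flips carry equal weight here.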
Why this matters
The Beta distribution is the building block for what comes next:
- The Dirichlet distribution is what you get when you move from a coin (2 outcomes) to a die (many outcomes).
- The stick-breaking construction generates the Dirichlet process by making a sequence of Beta-distributed random choices.
- The concentration parameter $\theta$ that controls how spread out or concentrated a distribution is shows up directly as a Beta parameter.