Prerequisites: What is a Probability Distribution?

The Beta Distribution

This is an early draft. Content may change as it gets reviewed.

The Beta distribution is a probability distribution over a single probability — a number between 0 and 1. It’s the natural tool for expressing uncertainty about things like coin biases, success rates, or proportions.

What it looks like

It has two parameters, traditionally called $a$ and $b$, that control its shape:

Both less than 1 (like 0.5, 0.5): The curve piles up near 0 and 1. The coin is probably very biased one way or the other, but you don’t know which.
Both equal to 1 (a = 1, b = 1): A flat line. Every probability from 0 to 1 is equally plausible. You know nothing about the coin.
Both moderate and equal (like 5, 5): A smooth bump centred at 0.5. You think the coin is roughly fair.
Unequal (like 2, 8): The bump shifts. Here $a$ is smaller than $b$, so it’s shifted toward small values: taking the number to be the probability of heads, you think the coin tends toward tails.
Both large and equal (like 50, 50): A very narrow spike at 0.5. You’re quite confident the coin is fair.
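
The shapes listed above can be checked numerically. What follows is a minimal sketch, assuming the standard Beta density $x^{a-1}(1-x)^{b-1} / B(a, b)$, which this section does not write out. The function name `beta_pdf` and the grid of evaluation points are illustrative choices.

```python
import math

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at a point x strictly between 0 and 1."""
    # Log of the normalising constant B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b).
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    # Work in log space so that large parameters (like 50, 50) do not overflow.
    log_density = (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_norm
    return math.exp(log_density)

# Evaluate each example shape at a few points across the interval.
grid = [0.05, 0.25, 0.5, 0.75, 0.95]
for a, b in [(0.5, 0.5), (1, 1), (5, 5), (2, 8), (50, 50)]:
    row = "  ".join(f"{beta_pdf(x, a, b):7.3f}" for x in grid)
    print(f"Beta({a}, {b}): {row}")
```

Reading each printed row left to right shows the shape: higher at the ends for (0.5, 0.5), constant for (1, 1), and highest in the middle for (5, 5) and (50, 50).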

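The mean (the balance point of the curve) is always at $a / (a + b)$. For a symmetric bump, where $a = b$ and both are larger than 1, this is also where the curve peaks. The width depends on how large $a + b$ is: for a fixed mean, a bigger $a + b$ gives a narrower curve, which expresses more confidence.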
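
To put numbers on "narrower", here is a small sketch that computes the mean and the standard deviation for a few parameter pairs. It assumes the standard variance formula $ab / ((a + b)^2 (a + b + 1))$, which is not stated in this section. The helper names `beta_mean` and `beta_std` are made up for illustration.

```python
import math

def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_std(a: float, b: float) -> float:
    """Standard deviation of Beta(a, b), from the standard variance formula."""
    total = a + b
    variance = (a * b) / (total ** 2 * (total + 1))
    return math.sqrt(variance)

# The first three pairs share a mean of 0.5; only a + b changes.
for a, b in [(1, 1), (5, 5), (50, 50), (2, 8)]:
    print(f"Beta({a}, {b}): mean = {beta_mean(a, b):.3f}, std = {beta_std(a, b):.3f}")
```

For the three symmetric pairs the mean stays put while the standard deviation shrinks as $a + b$ grows.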

Explore it

Try It: Beta Distribution

[Interactive plot of the Beta distribution with sliders for $a$ and $b$; initial values $a = 2.0$, $b = 5.0$]

Play with the sliders. Try setting both to 1 (flat). Then both to 10 (narrow bump at 0.5). Then $a = 1$ and $b = 10$ (piled up near zero). The dashed line shows the mean.
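
The same three settings can also be explored offline. The sketch below draws samples with Python's standard-library `random.betavariate` and prints a rough text histogram, along with the sample mean next to $a / (a + b)$. The function name `histogram_and_mean`, the sample size, the seed, and the bin count are arbitrary choices for illustration.

```python
import random

def histogram_and_mean(a, b, n=20000, bins=10, seed=0):
    """Sample n values from Beta(a, b); return (bin counts, sample mean)."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    counts = [0] * bins
    total = 0.0
    for _ in range(n):
        x = rng.betavariate(a, b)
        total += x
        # Clamp so that a draw of exactly 1.0 lands in the last bin.
        counts[min(int(x * bins), bins - 1)] += 1
    return counts, total / n

for a, b in [(1, 1), (10, 10), (1, 10)]:
    counts, mean = histogram_and_mean(a, b)
    print(f"Beta({a}, {b}): sample mean {mean:.3f}, formula {a / (a + b):.3f}")
    for i, c in enumerate(counts):
        bar = "#" * (50 * c // max(counts))
        print(f"  {i / 10:.1f}-{(i + 1) / 10:.1f} {bar}")
```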

A useful trick: pseudo-counts

There’s a nice way to think about $a$ and $b$: pretend you’ve already run an experiment before collecting any data. The parameter $a$ is like “imaginary heads you’ve already seen” and $b$ is “imaginary tails.” So Beta(3, 7) means: “Before I even start flipping, I’m going to act as though I’ve already seen 3 heads and 7 tails.”

This “pseudo-count” interpretation will keep coming back. It’s the key intuition for everything that follows.
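
A short sketch of how pseudo-counts behave once real data arrives. It relies on the standard Beta-Binomial updating rule, in which observed heads and tails are simply added to $a$ and $b$. That rule is not derived in this section, so treat it as an assumption here. The function name `add_flips` and the observed counts are hypothetical.

```python
def add_flips(a, b, heads, tails):
    """Combine pseudo-counts (a, b) with observed flips by adding the counts."""
    return a + heads, b + tails

# Start from Beta(3, 7): 3 imaginary heads and 7 imaginary tails.
a, b = 3, 7
print(f"Before any flips: Beta({a}, {b}), mean = {a / (a + b):.2f}")

# Hypothetical data: 6 real heads and 4 real tails.
a_new, b_new = add_flips(a, b, heads=6, tails=4)
print(f"After the flips:  Beta({a_new}, {b_new}), mean = {a_new / (a_new + b_new):.2f}")
```

The mean moves from the imaginary proportion of heads toward the observed one, and $a + b$ grows, so the curve also narrows.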

Why this matters

The Beta distribution is the building block for what comes next:

- The Dirichlet distribution is what you get when you move from a coin (2 outcomes) to a die (many outcomes).
- The stick-breaking construction generates the Dirichlet process by making a sequence of Beta-distributed random choices.
- The concentration parameter $\theta$, which controls how spread out or concentrated a distribution is, shows up directly as a Beta parameter.

Struggling with something?

These optional nodes cover specific concepts in more detail:

🔍 Pseudo-Counts: Why Imaginary Data Helps