Theory

From intuition to implementation

You already solve bandit problems every day. You just don't call them that.

This series builds from everyday decision-making to the exact algorithm buildlog uses to select rules. No prerequisites beyond curiosity. By the end, the math won't feel like math — it'll feel like common sense you finally have notation for.

The arc

Page	What you'll learn	The intuition
The Restaurant Problem	Exploration vs. exploitation	Trying new restaurants vs. going to your favorite
The Price of Learning	Regret and why it matters	Every bad meal is a missed good one
Keeping Score	Beta distributions	How to represent "I think this is good but I'm not sure"
Making Decisions	Thompson Sampling	Let uncertainty guide exploration
Context Changes Everything	Contextual bandits	You wouldn't pick the same restaurant for a date and a quick lunch

Where this connects

buildlog uses a Thompson Sampling contextual bandit to decide which engineering rules to surface in your editor. The "restaurants" are rules. The "meals" are coding sessions. The "reviews" are whether you made the same mistake again.

Everything in this series maps directly to src/buildlog/core/bandit.py. The theory isn't academic — it's the code running in your terminal.
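To make the restaurant-to-rule mapping concrete, here is a minimal sketch of Thompson Sampling with Beta posteriors. It is an illustration only, not the contents of src/buildlog/core/bandit.py: the names `RuleArm`, `select_rule`, and `simulate` are hypothetical, the per-rule "true" help rates are made up for the simulation, and the context part of the contextual bandit is left out to keep the idea visible.

```python
import random


class RuleArm:
    """Belief about one rule: a Beta(alpha, beta) posterior over how often it helps."""

    def __init__(self):
        # Beta(1, 1) is the uniform prior: no opinion yet.
        self.alpha = 1.0
        self.beta = 1.0

    def sample(self, rng):
        # Draw one plausible "help rate" from the current belief.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, helped):
        # A session where the mistake did not recur counts as a success.
        if helped:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def select_rule(arms, rng):
    """Thompson Sampling step: sample every arm once, surface the highest draw."""
    return max(arms, key=lambda name: arms[name].sample(rng))


def simulate(true_rates, sessions=2000, seed=0):
    """Run simulated coding sessions against hypothetical per-rule help rates."""
    rng = random.Random(seed)
    arms = {name: RuleArm() for name in true_rates}
    picks = {name: 0 for name in true_rates}
    for _ in range(sessions):
        name = select_rule(arms, rng)
        picks[name] += 1
        helped = rng.random() < true_rates[name]  # simulated "review"
        arms[name].update(helped)
    return arms, picks


if __name__ == "__main__":
    # Hypothetical rules and rates, used only to drive the simulation.
    arms, picks = simulate({"rule_a": 0.2, "rule_b": 0.8})
    print(picks)
```

Uncertain rules produce widely spread samples, so they still get surfaced occasionally; rules with a lot of evidence produce tight samples, so the better one wins most of the time. That is the exploration and exploitation trade-off from the table above, with no separate exploration parameter to tune.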

Who this is for

Engineers who want to understand why Thompson Sampling works, not just that it works
The curious who want intuition before formalism
Skeptics who want to verify the math themselves

No probability background required. If you can follow a restaurant analogy, you can follow the math.