Theory

From intuition to implementation

You already solve bandit problems every day. You just don't call them that.

This series builds from everyday decision-making to the exact algorithm buildlog uses to select rules. No prerequisites beyond curiosity. By the end, the math won't feel like math — it'll feel like common sense you finally have notation for.

The arc

| Page | What you'll learn | The intuition |
| --- | --- | --- |
| The Restaurant Problem | Exploration vs. exploitation | Trying new restaurants vs. going to your favorite |
| The Price of Learning | Regret and why it matters | Every bad meal is a missed good one |
| Keeping Score | Beta distributions | How to represent "I think this is good but I'm not sure" |
| Making Decisions | Thompson Sampling | Let uncertainty guide exploration |
| Context Changes Everything | Contextual bandits | You wouldn't pick the same restaurant for a date and a quick lunch |

Where this connects

buildlog uses a Thompson Sampling contextual bandit to decide which engineering rules to surface in your editor. The "restaurants" are rules. The "meals" are coding sessions. The "reviews" are whether you made the same mistake again.

Everything in this series maps directly to src/buildlog/core/bandit.py. The theory isn't academic — it's the code running in your terminal.
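To make the mapping concrete before the series starts, here is a minimal sketch of Thompson Sampling over rules, with one Beta distribution per (context, rule) pair. It is an illustration only, not the contents of bandit.py: the class name `ThompsonRuleSelector`, its methods, the context strings, and the rule names are all hypothetical, and the uniform Beta(1, 1) starting point is an assumption.

```python
import random
from collections import defaultdict


class ThompsonRuleSelector:
    """Illustrative Thompson Sampling bandit (hypothetical, not buildlog's API)."""

    def __init__(self, rules, seed=None):
        self.rules = list(rules)
        self.rng = random.Random(seed)
        # (context, rule) -> [alpha, beta]. Beta(1, 1) is uniform:
        # "no idea yet whether this rule helps" (an assumed prior).
        self.params = defaultdict(lambda: [1.0, 1.0])

    def select(self, context):
        """Draw one plausible success rate per rule and surface the best draw."""
        samples = {}
        for rule in self.rules:
            alpha, beta = self.params[(context, rule)]
            samples[rule] = self.rng.betavariate(alpha, beta)
        return max(samples, key=samples.get)

    def update(self, context, rule, helped):
        """Record a 'review': did the mistake stay away after the rule was shown?"""
        counts = self.params[(context, rule)]
        if helped:
            counts[0] += 1.0  # one more success
        else:
            counts[1] += 1.0  # one more failure


# Hypothetical usage: two rules, learned separately per context.
selector = ThompsonRuleSelector(["validate-input", "close-file-handles"], seed=0)
for _ in range(50):
    selector.update("python", "validate-input", helped=True)
    selector.update("python", "close-file-handles", helped=False)

picks = [selector.select("python") for _ in range(100)]
```

Rules with little evidence have wide distributions, so they still win a draw now and then; that is the exploration. Rules with a strong track record win most draws; that is the exploitation. Keying the counts by context is what lets the same rule score well in one setting and poorly in another.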

Who this is for

  • Engineers who want to understand why Thompson Sampling works, not just that it does
  • The curious who want intuition before formalism
  • Skeptics who want to verify the math themselves

No probability background required. If you can follow a restaurant analogy, you can follow the math.