qortex-learning¶
Bandit-based adaptive learning for qortex: Thompson Sampling with Beta-Bernoulli posteriors, persistent state via SQLite, and pluggable reward models.
Install¶
As a standalone package:

```shell
pip install qortex-learning
```

Or as part of qortex:

```shell
pip install qortex  # qortex-learning is a core dependency
```
Quick Start¶
```python
from qortex.learning import Learner, LearnerConfig, Arm, ArmOutcome

# Create a learner with SQLite persistence (default)
learner = await Learner.create(LearnerConfig(name="prompts"))

# Define candidates
candidates = [
    Arm(id="concise-v1", token_cost=10),
    Arm(id="detailed-v2", token_cost=15),
    Arm(id="structured-v3", token_cost=20),
]

# Select the best arm via Thompson Sampling
result = await learner.select(candidates, context={"task": "type-check"}, k=1)
print(f"Selected: {result.selected[0].id}")

# Observe the outcome
await learner.observe(ArmOutcome(
    arm_id="detailed-v2",
    outcome="accepted",
    reward=1.0,
))
```
What It Does¶
qortex-learning provides a multi-armed bandit framework for adaptive selection. It powers the feedback loop in qortex: when users accept or reject retrieval results, the learning layer updates posterior distributions so future selections improve.
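Under the hood this is classic Beta-Bernoulli Thompson Sampling: each arm keeps a Beta posterior over its accept rate, selection ranks arms by a single posterior draw, and each observation does a Bernoulli count update. A self-contained sketch of the mechanism (plain Python, not the library's classes):

```python
import random

class BetaArm:
    """Beta-Bernoulli posterior for one candidate (illustrative only)."""
    def __init__(self, arm_id, alpha=1.0, beta=1.0):
        self.arm_id = arm_id
        self.alpha = alpha  # prior + observed accepts
        self.beta = beta    # prior + observed rejects

    def sample(self):
        # One draw from the current Beta posterior
        return random.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Bernoulli update: reward is 1.0 (accept) or 0.0 (reject)
        self.alpha += reward
        self.beta += 1.0 - reward

def thompson_select(arms, k=1):
    # Rank arms by one posterior draw each; return the top k
    return sorted(arms, key=lambda a: a.sample(), reverse=True)[:k]

arms = [BetaArm("concise-v1"), BetaArm("detailed-v2")]
arms[1].update(1.0)  # one accept observed for detailed-v2
chosen = thompson_select(arms, k=1)
```

An arm that accumulates accepts shifts its posterior mass toward 1, so its draws (and hence its selection probability) rise over time.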
Learner¶
The main interface. Manages arm selection, observation, and posterior tracking.
| Method | Purpose |
|---|---|
| `select(arms, context, k)` | Choose k arms via Thompson Sampling |
| `observe(outcome)` | Record an accept/reject signal, update posteriors |
| `batch_observe(outcomes)` | Bulk observation for batch feedback |
| `metrics()` | Selection counts, reward rates, posterior summaries |
| `top_arms(k)` | Top-k arms ranked by posterior mean |
| `decay_arm(arm_id, factor)` | Shrink learned signal toward prior |
| `posteriors()` | Raw posterior parameters for all arms |
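The math behind `top_arms` (posterior-mean ranking) and `decay_arm` is easy to show in isolation. The decay formula below, shrinking learned counts linearly toward the prior, is an assumption for illustration, not necessarily the library's exact rule:

```python
def posterior_mean(alpha, beta):
    # Mean of Beta(alpha, beta); used to rank arms
    return alpha / (alpha + beta)

def decay(alpha, beta, factor, prior_alpha=1.0, prior_beta=1.0):
    # Assumed decay rule: shrink learned counts toward the prior
    # by `factor` in [0, 1] (0 = full reset, 1 = no change)
    return (prior_alpha + (alpha - prior_alpha) * factor,
            prior_beta + (beta - prior_beta) * factor)

# An arm with 8 accepts and 2 rejects on a Beta(1, 1) prior
alpha, beta = 9.0, 3.0
print(posterior_mean(alpha, beta))   # 0.75
alpha, beta = decay(alpha, beta, 0.5)
print(posterior_mean(alpha, beta))   # pulled back toward the prior mean 0.5
```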
Strategies¶
Pluggable selection strategies via the `LearningStrategy` protocol:

| Strategy | Description |
|---|---|
| `ThompsonSampling` | Beta-Bernoulli Thompson Sampling (default) |
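Because strategies are pluggable, an alternative such as epsilon-greedy could slot in. The sketch below assumes a simple `select(posteriors, k)` shape for the protocol; the real `LearningStrategy` signature may differ:

```python
import random

class EpsilonGreedy:
    """Illustrative alternative strategy (hypothetical, not shipped):
    explore with probability epsilon, otherwise exploit the highest
    posterior mean."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon

    def select(self, posteriors, k=1):
        # posteriors: dict mapping arm_id -> (alpha, beta)
        if random.random() < self.epsilon:
            return random.sample(list(posteriors), k)  # explore
        ranked = sorted(posteriors,
                        key=lambda a: posteriors[a][0] / sum(posteriors[a]),
                        reverse=True)
        return ranked[:k]  # exploit
```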
Reward Models¶
Convert raw outcomes to numeric rewards:
| Model | Mapping |
|---|---|
| `BinaryReward` | accepted=1, everything else=0 |
| `TernaryReward` | accepted=1, partial=0.5, rejected=0 |
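These mappings are simple lookups. A plain-Python sketch of the ternary mapping from the table above (not the library's actual class):

```python
def ternary_reward(outcome: str) -> float:
    # Mirrors the TernaryReward row: unknown outcomes fall through to 0
    return {"accepted": 1.0, "partial": 0.5, "rejected": 0.0}.get(outcome, 0.0)
```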
Persistence¶
State survives restarts via pluggable stores:
| Store | Description |
|---|---|
| `SqliteLearningStore` | SQLite backend (default). Async via aiosqlite. |
| `PostgresLearningStore` | PostgreSQL backend. Async via asyncpg. Uses a shared connection pool. |
| `JsonLearningStore` | JSON file backend. Good for debugging. |
All three implement the async `LearningStore` protocol, so custom backends are straightforward.

To use the PostgreSQL store, set `QORTEX_STORE=postgres` and `DATABASE_URL`:
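As an illustration of how small a custom backend can be, here is an in-memory store. The method names (`initialize`, `save_arm`, `load_arm`) are assumptions about the `LearningStore` protocol's surface, not its actual definition:

```python
import asyncio

class InMemoryLearningStore:
    """Sketch of a custom backend keyed by (learner name, arm id).
    Method names are hypothetical stand-ins for the real protocol."""
    def __init__(self):
        self._state = {}

    async def initialize(self):
        pass  # nothing to set up for an in-memory dict

    async def save_arm(self, name, arm_id, alpha, beta):
        self._state[(name, arm_id)] = (alpha, beta)

    async def load_arm(self, name, arm_id):
        # Unknown arms fall back to the uniform Beta(1, 1) prior
        return self._state.get((name, arm_id), (1.0, 1.0))

async def demo():
    store = InMemoryLearningStore()
    await store.initialize()
    await store.save_arm("prompts", "detailed-v2", 2.0, 1.0)
    return await store.load_arm("prompts", "detailed-v2")

print(asyncio.run(demo()))
```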
```python
from qortex.learning.postgres import PostgresLearningStore

store = PostgresLearningStore(pool=shared_pool)
await store.initialize()  # creates tables if needed
learner = await Learner.create(LearnerConfig(name="prompts"), store=store)
```
How It Fits¶
qortex-learning is the adaptive layer beneath `qortex_feedback`. When a user accepts or rejects a retrieval result:

- The MCP server calls `learner.observe()` with the outcome
- The reward model converts the outcome to a numeric signal
- Thompson Sampling updates the Beta posterior for that arm
- The next `learner.select()` samples from the updated posteriors
- The store persists state to SQLite so learning survives restarts
This creates the learning loop shown on the homepage: accepted results rise in rank on subsequent queries, and rejected results drop.
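This loop can be simulated end-to-end in plain Python. In the self-contained sketch below (not the library), one arm is accepted 80% of the time and the other 20%; after a few hundred rounds the better arm's posterior mean dominates:

```python
import random

random.seed(0)

# Two candidate arms with different (simulated) user accept rates
posteriors = {"good": [1.0, 1.0], "bad": [1.0, 1.0]}  # Beta(alpha, beta) per arm
accept_rate = {"good": 0.8, "bad": 0.2}

for _ in range(500):
    # Select: one Thompson draw per arm, pick the argmax
    arm = max(posteriors, key=lambda a: random.betavariate(*posteriors[a]))
    # Observe: simulate the user's accept/reject, then the Bernoulli update
    reward = 1.0 if random.random() < accept_rate[arm] else 0.0
    posteriors[arm][0] += reward
    posteriors[arm][1] += 1.0 - reward

def post_mean(arm):
    alpha, beta = posteriors[arm]
    return alpha / (alpha + beta)
```

The accepted arm is selected more and more often, which is exactly the rank movement described above.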
Async API¶
As of v0.8.0, all Learner methods are async. The constructor has been replaced with an async factory:
```python
# Old (pre-0.8.0)
learner = Learner(LearnerConfig(name="prompts"))

# New (0.8.0+)
learner = await Learner.create(LearnerConfig(name="prompts"))
```
All methods (`select`, `observe`, `batch_observe`, `top_arms`, `posteriors`, `metrics`, `decay_arm`, `reset`) are now `async def`.
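The async-factory pattern itself is standard asyncio and can be reproduced in isolation; from synchronous code, drive it with `asyncio.run`. The class below is a minimal stand-in, not the real qortex `Learner`:

```python
import asyncio

class Learner:
    """Stand-in showing the async-factory pattern (illustrative only)."""
    def __init__(self, name):
        self.name = name  # __init__ stays synchronous and cheap

    @classmethod
    async def create(cls, name):
        learner = cls(name)
        # Async setup (e.g. opening the backing store) would happen here
        return learner

# A synchronous entry point drives the factory with asyncio.run
learner = asyncio.run(Learner.create("prompts"))
```

Moving I/O out of `__init__` into an awaitable `create()` is the usual reason for this pattern: constructors cannot `await`, factory coroutines can.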
Requirements¶
- Python 3.11+
- aiosqlite (async SQLite, default store)
- asyncpg (optional, for PostgresLearningStore)
- qortex-observe (event emission for metrics/traces)
License¶
MIT