qortex-learning¶
Bandit-based adaptive learning for qortex: Thompson Sampling with Beta-Bernoulli posteriors, persistent state via SQLite, and pluggable reward models.
Install¶
As a standalone package:

```shell
pip install qortex-learning
```

Or as part of qortex:

```shell
pip install qortex  # qortex-learning is a core dependency
```
Quick Start¶
```python
from qortex.learning import Learner, LearnerConfig, Arm, ArmOutcome

# Create a learner with SQLite persistence (default)
learner = await Learner.create(LearnerConfig(name="prompts"))

# Define candidates
candidates = [
    Arm(id="concise-v1", token_cost=10),
    Arm(id="detailed-v2", token_cost=15),
    Arm(id="structured-v3", token_cost=20),
]

# Select the best arm via Thompson Sampling
result = await learner.select(candidates, context={"task": "type-check"}, k=1)
print(f"Selected: {result.selected[0].id}")

# Observe the outcome
await learner.observe(ArmOutcome(
    arm_id="detailed-v2",
    outcome="accepted",
    reward=1.0,
))
```
What It Does¶
qortex-learning provides a multi-armed bandit framework for adaptive selection. It powers the feedback loop in qortex: when users accept or reject retrieval results, the learning layer updates posterior distributions so future selections improve.
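Under the hood this is classic Beta-Bernoulli Thompson Sampling: each arm keeps a Beta posterior over its accept rate, selection ranks arms by a single posterior draw, and each observation does a Bernoulli count update. A self-contained sketch of the mechanism (plain Python, not the library's classes):

```python
import random

class BetaArm:
    """Beta-Bernoulli posterior for one candidate (illustrative only)."""
    def __init__(self, arm_id, alpha=1.0, beta=1.0):
        self.arm_id = arm_id
        self.alpha = alpha  # prior + observed accepts
        self.beta = beta    # prior + observed rejects

    def sample(self):
        # One draw from the current Beta posterior
        return random.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Bernoulli update: reward is 1.0 (accept) or 0.0 (reject)
        self.alpha += reward
        self.beta += 1.0 - reward

def thompson_select(arms, k=1):
    # Rank arms by one posterior draw each; return the top k
    return sorted(arms, key=lambda a: a.sample(), reverse=True)[:k]

arms = [BetaArm("concise-v1"), BetaArm("detailed-v2")]
arms[1].update(1.0)  # one accept observed for detailed-v2
chosen = thompson_select(arms, k=1)
```

An arm that accumulates accepts shifts its posterior mass toward 1, so its draws (and hence its selection probability) rise over time.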
Learner¶
The main interface. Manages arm selection, observation, and posterior tracking.
| Method | Purpose |
|---|---|
| `select(arms, context, k)` | Choose k arms via Thompson Sampling |
| `observe(outcome)` | Record an accept/reject signal, update posteriors |
| `batch_observe(outcomes)` | Bulk observation for batch feedback |
| `metrics()` | Selection counts, reward rates, posterior summaries |
| `top_arms(k)` | Top-k arms ranked by posterior mean |
| `decay_arm(arm_id, factor)` | Shrink learned signal toward prior |
| `posteriors()` | Raw posterior parameters for all arms |
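The math behind `top_arms` (posterior-mean ranking) and `decay_arm` is easy to show in isolation. The decay formula below, shrinking learned counts linearly toward the prior, is an assumption for illustration, not necessarily the library's exact rule:

```python
def posterior_mean(alpha, beta):
    # Mean of Beta(alpha, beta); used to rank arms
    return alpha / (alpha + beta)

def decay(alpha, beta, factor, prior_alpha=1.0, prior_beta=1.0):
    # Assumed decay rule: shrink learned counts toward the prior
    # by `factor` in [0, 1] (0 = full reset, 1 = no change)
    return (prior_alpha + (alpha - prior_alpha) * factor,
            prior_beta + (beta - prior_beta) * factor)

# An arm with 8 accepts and 2 rejects on a Beta(1, 1) prior
alpha, beta = 9.0, 3.0
print(posterior_mean(alpha, beta))   # 0.75
alpha, beta = decay(alpha, beta, 0.5)
print(posterior_mean(alpha, beta))   # pulled back toward the prior mean 0.5
```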
Strategies¶
Pluggable selection strategies via the `LearningStrategy` protocol:

| Strategy | Description |
|---|---|
| `ThompsonSampling` | Beta-Bernoulli Thompson Sampling (default) |
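Because strategies are pluggable, an alternative such as epsilon-greedy could slot in. The sketch below assumes a simple `select(posteriors, k)` shape for the protocol; the real `LearningStrategy` signature may differ:

```python
import random

class EpsilonGreedy:
    """Illustrative alternative strategy (hypothetical, not shipped):
    explore with probability epsilon, otherwise exploit the highest
    posterior mean."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon

    def select(self, posteriors, k=1):
        # posteriors: dict mapping arm_id -> (alpha, beta)
        if random.random() < self.epsilon:
            return random.sample(list(posteriors), k)  # explore
        ranked = sorted(posteriors,
                        key=lambda a: posteriors[a][0] / sum(posteriors[a]),
                        reverse=True)
        return ranked[:k]  # exploit
```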
Reward Models¶
Convert raw outcomes to numeric rewards:
| Model | Mapping |
|---|---|
| `BinaryReward` | accepted=1, everything else=0 |
| `TernaryReward` | accepted=1, partial=0.5, rejected=0 |
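These mappings are simple lookups. A plain-Python sketch of the ternary mapping from the table above (not the library's actual class):

```python
def ternary_reward(outcome: str) -> float:
    # Mirrors the TernaryReward row: unknown outcomes fall through to 0
    return {"accepted": 1.0, "partial": 0.5, "rejected": 0.0}.get(outcome, 0.0)
```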
Persistence¶
State survives restarts via pluggable stores:
| Store | Description |
|---|---|
| `SqliteLearningStore` | SQLite backend (default). Async via aiosqlite. |
| `PostgresLearningStore` | PostgreSQL backend. Async via asyncpg. Uses a shared connection pool. |
| `JsonLearningStore` | JSON file backend. Good for debugging. |
All three implement the async `LearningStore` protocol, so custom backends are straightforward.

To use the PostgreSQL store, set `QORTEX_STORE=postgres` and `DATABASE_URL`:
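As an illustration of how small a custom backend can be, here is an in-memory store. The method names (`initialize`, `save_arm`, `load_arm`) are assumptions about the `LearningStore` protocol's surface, not its actual definition:

```python
import asyncio

class InMemoryLearningStore:
    """Sketch of a custom backend keyed by (learner name, arm id).
    Method names are hypothetical stand-ins for the real protocol."""
    def __init__(self):
        self._state = {}

    async def initialize(self):
        pass  # nothing to set up for an in-memory dict

    async def save_arm(self, name, arm_id, alpha, beta):
        self._state[(name, arm_id)] = (alpha, beta)

    async def load_arm(self, name, arm_id):
        # Unknown arms fall back to the uniform Beta(1, 1) prior
        return self._state.get((name, arm_id), (1.0, 1.0))

async def demo():
    store = InMemoryLearningStore()
    await store.initialize()
    await store.save_arm("prompts", "detailed-v2", 2.0, 1.0)
    return await store.load_arm("prompts", "detailed-v2")

print(asyncio.run(demo()))
```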
```python
from qortex.learning.postgres import PostgresLearningStore

store = PostgresLearningStore(pool=shared_pool)
await store.initialize()  # creates tables if needed
learner = await Learner.create(LearnerConfig(name="prompts"), store=store)
```
How It Fits¶
qortex-learning is the adaptive layer beneath `qortex_feedback`. When a user accepts or rejects a retrieval result:

- The MCP server calls `learner.observe()` with the outcome
- The reward model converts the outcome to a numeric signal
- Thompson Sampling updates the Beta posterior for that arm
- The next `learner.select()` samples from the updated posteriors
- The store persists state to SQLite so learning survives restarts
This creates the learning loop shown on the homepage: accepted results rise in rank on subsequent queries, and rejected results drop.
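This loop can be simulated end-to-end in plain Python. In the self-contained sketch below (not the library), one arm is accepted 80% of the time and the other 20%; after a few hundred rounds the better arm's posterior mean dominates:

```python
import random

random.seed(0)

# Two candidate arms with different (simulated) user accept rates
posteriors = {"good": [1.0, 1.0], "bad": [1.0, 1.0]}  # Beta(alpha, beta) per arm
accept_rate = {"good": 0.8, "bad": 0.2}

for _ in range(500):
    # Select: one Thompson draw per arm, pick the argmax
    arm = max(posteriors, key=lambda a: random.betavariate(*posteriors[a]))
    # Observe: simulate the user's accept/reject, then the Bernoulli update
    reward = 1.0 if random.random() < accept_rate[arm] else 0.0
    posteriors[arm][0] += reward
    posteriors[arm][1] += 1.0 - reward

def post_mean(arm):
    alpha, beta = posteriors[arm]
    return alpha / (alpha + beta)
```

The accepted arm is selected more and more often, which is exactly the rank movement described above.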
Async API¶
As of v0.8.0, all Learner methods are async. The constructor has been replaced with an async factory:
```python
# Old (pre-0.8.0)
learner = Learner(LearnerConfig(name="prompts"))

# New (0.8.0+)
learner = await Learner.create(LearnerConfig(name="prompts"))
```
All methods (`select`, `observe`, `batch_observe`, `top_arms`, `posteriors`, `metrics`, `decay_arm`, `reset`) are now `async def`.
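The async-factory pattern itself is standard asyncio and can be reproduced in isolation; from synchronous code, drive it with `asyncio.run`. The class below is a minimal stand-in, not the real qortex `Learner`:

```python
import asyncio

class Learner:
    """Stand-in showing the async-factory pattern (illustrative only)."""
    def __init__(self, name):
        self.name = name  # __init__ stays synchronous and cheap

    @classmethod
    async def create(cls, name):
        learner = cls(name)
        # Async setup (e.g. opening the backing store) would happen here
        return learner

# A synchronous entry point drives the factory with asyncio.run
learner = asyncio.run(Learner.create("prompts"))
```

Moving I/O out of `__init__` into an awaitable `create()` is the usual reason for this pattern: constructors cannot `await`, factory coroutines can.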
Requirements¶
- Python 3.11+
- aiosqlite (async SQLite, default store)
- asyncpg (optional, for PostgresLearningStore)
- qortex-observe (event emission for metrics/traces)
License¶
MIT