qortex-learning

Bandit-based adaptive learning for qortex: Thompson Sampling with Beta-Bernoulli posteriors, persistent state via SQLite, and pluggable reward models.

Pipeline: Arms (prompt variants, configs) → select (Thompson Sampling over Beta-Bernoulli posteriors, context-aware, token-cost-weighted) → observe (reward model: binary / ternary) → persist (SQLite / JSON store for posteriors).

Install

As a standalone package:

pip install qortex-learning

Or as part of qortex:

pip install qortex  # qortex-learning is a core dependency

Quick Start

from qortex.learning import Learner, LearnerConfig, Arm, ArmOutcome

# Create a learner with SQLite persistence (default)
learner = await Learner.create(LearnerConfig(name="prompts"))

# Define candidates
candidates = [
    Arm(id="concise-v1", token_cost=10),
    Arm(id="detailed-v2", token_cost=15),
    Arm(id="structured-v3", token_cost=20),
]

# Select the best arm via Thompson Sampling
result = await learner.select(candidates, context={"task": "type-check"}, k=1)
print(f"Selected: {result.selected[0].id}")

# Observe the outcome
await learner.observe(ArmOutcome(
    arm_id="detailed-v2",
    outcome="accepted",
    reward=1.0,
))

What It Does

qortex-learning provides a multi-armed bandit framework for adaptive selection. It powers the feedback loop in qortex: when users accept or reject retrieval results, the learning layer updates posterior distributions so future selections improve.
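To make the mechanics concrete, here is a minimal, self-contained sketch of Beta-Bernoulli Thompson Sampling. It is illustrative only, not qortex's actual implementation: each arm keeps a Beta(α, β) posterior, selection draws one sample per arm and picks the argmax, and an accept/reject updates the counts.

```python
import random

class BetaArm:
    """Beta-Bernoulli posterior for one arm: Beta(alpha, beta)."""
    def __init__(self, arm_id: str):
        self.arm_id = arm_id
        self.alpha = 1.0  # uniform prior: one pseudo-success
        self.beta = 1.0   # uniform prior: one pseudo-failure

    def sample(self) -> float:
        # Thompson Sampling: draw from the posterior, not the mean
        return random.betavariate(self.alpha, self.beta)

    def update(self, reward: float) -> None:
        # Bernoulli update: reward in [0, 1]
        self.alpha += reward
        self.beta += 1.0 - reward

def select(arms: list[BetaArm]) -> BetaArm:
    # Pick the arm whose posterior sample is highest
    return max(arms, key=lambda a: a.sample())

arms = [BetaArm("concise-v1"), BetaArm("detailed-v2")]
for _ in range(100):
    arm = select(arms)
    # Simulated feedback: "detailed-v2" is accepted more often
    p_accept = 0.8 if arm.arm_id == "detailed-v2" else 0.3
    arm.update(1.0 if random.random() < p_accept else 0.0)
```

Because sampling (rather than taking the posterior mean) drives selection, uncertain arms still get explored while strong arms are exploited more often over time.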

Learner

The main interface. Manages arm selection, observation, and posterior tracking.

| Method | Purpose |
| --- | --- |
| select(arms, context, k) | Choose k arms via Thompson Sampling |
| observe(outcome) | Record an accept/reject signal, update posteriors |
| batch_observe(outcomes) | Bulk observation for batch feedback |
| metrics() | Selection counts, reward rates, posterior summaries |
| top_arms(k) | Top-k arms ranked by posterior mean |
| decay_arm(arm_id, factor) | Shrink learned signal toward prior |
| posteriors() | Raw posterior parameters for all arms |
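One way to read "shrink learned signal toward prior": a common formulation, and an assumption about qortex's internals rather than its documented behavior, interpolates the Beta counts back toward the prior so stale arms can recover exploration pressure.

```python
def decay_toward_prior(alpha: float, beta: float, factor: float,
                       prior_alpha: float = 1.0, prior_beta: float = 1.0) -> tuple[float, float]:
    """Shrink learned counts toward the prior.

    factor=1.0 is a no-op, factor=0.0 resets the arm to its prior.
    (Hypothetical formulation, for illustration only.)
    """
    new_alpha = prior_alpha + (alpha - prior_alpha) * factor
    new_beta = prior_beta + (beta - prior_beta) * factor
    return new_alpha, new_beta

# An arm with 9 accepts / 1 reject, decayed by half:
print(decay_toward_prior(10.0, 2.0, 0.5))  # → (5.5, 1.5)
```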

Strategies

Pluggable selection strategies via the LearningStrategy protocol:

| Strategy | Description |
| --- | --- |
| ThompsonSampling | Beta-Bernoulli Thompson Sampling (default) |
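Because selection is pluggable, a custom strategy can be swapped in. The sketch below assumes a hypothetical protocol shape — the method name choose and the posteriors-dict parameter are illustrative, not qortex's real LearningStrategy signature — and implements epsilon-greedy as a contrast to Thompson Sampling:

```python
import random
from typing import Protocol

class LearningStrategy(Protocol):
    # Hypothetical shape; check qortex.learning for the real protocol.
    def choose(self, posteriors: dict[str, tuple[float, float]], k: int) -> list[str]: ...

class EpsilonGreedy:
    """Explore uniformly with probability epsilon, else exploit the top posterior means."""
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon

    def choose(self, posteriors: dict[str, tuple[float, float]], k: int) -> list[str]:
        ids = list(posteriors)
        if random.random() < self.epsilon:
            return random.sample(ids, k)  # explore: uniform random pick
        # Exploit: rank by posterior mean alpha / (alpha + beta)
        ranked = sorted(ids, key=lambda i: posteriors[i][0] / sum(posteriors[i]), reverse=True)
        return ranked[:k]
```

Unlike Thompson Sampling, epsilon-greedy explores at a fixed rate regardless of posterior uncertainty, which is usually why the sampled approach is the default.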

Reward Models

Convert raw outcomes to numeric rewards:

| Model | Mapping |
| --- | --- |
| BinaryReward | accepted=1, everything else=0 |
| TernaryReward | accepted=1, partial=0.5, rejected=0 |
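Both mappings are simple enough to express as plain functions. This sketch mirrors the table above; the fallback to 0.0 for unknown outcomes in the ternary case is an assumption, not documented behavior:

```python
def binary_reward(outcome: str) -> float:
    """BinaryReward mapping: accepted is 1, everything else 0."""
    return 1.0 if outcome == "accepted" else 0.0

def ternary_reward(outcome: str) -> float:
    """TernaryReward mapping; unknown outcomes fall back to 0.0 (an assumption)."""
    return {"accepted": 1.0, "partial": 0.5, "rejected": 0.0}.get(outcome, 0.0)
```

The ternary model gives partial acceptances half credit, so a "partial" outcome nudges the Beta posterior up without counting as a full success.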

Persistence

State survives restarts via pluggable stores:

| Store | Description |
| --- | --- |
| SqliteLearningStore | SQLite backend (default). Async via aiosqlite. |
| PostgresLearningStore | PostgreSQL backend. Async via asyncpg. Uses shared connection pool. |
| JsonLearningStore | JSON file backend. Good for debugging. |

All three implement the async LearningStore protocol, so custom backends are straightforward.
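As an illustration of how small such a backend can be, here is an in-memory store suitable for unit tests. The protocol shape shown (initialize/load/save and a posteriors dict) is hypothetical, not the real LearningStore signature:

```python
from typing import Protocol

class LearningStore(Protocol):
    # Hypothetical shape; method names are illustrative only.
    async def initialize(self) -> None: ...
    async def load(self, name: str) -> dict[str, tuple[float, float]]: ...
    async def save(self, name: str, posteriors: dict[str, tuple[float, float]]) -> None: ...

class InMemoryLearningStore:
    """Non-persistent store: learning state vanishes on restart. Useful in tests."""
    def __init__(self):
        self._data: dict[str, dict[str, tuple[float, float]]] = {}

    async def initialize(self) -> None:
        pass  # nothing to set up

    async def load(self, name: str) -> dict[str, tuple[float, float]]:
        return dict(self._data.get(name, {}))

    async def save(self, name: str, posteriors: dict[str, tuple[float, float]]) -> None:
        self._data[name] = dict(posteriors)
```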

To use the PostgreSQL store, set QORTEX_STORE=postgres and DATABASE_URL, or construct the store directly and pass it to the learner:

from qortex.learning.postgres import PostgresLearningStore

store = PostgresLearningStore(pool=shared_pool)
await store.initialize()  # creates tables if needed
learner = await Learner.create(LearnerConfig(name="prompts"), store=store)

How It Fits

qortex-learning is the adaptive layer beneath qortex_feedback. When a user accepts or rejects a retrieval result:

  1. The MCP server calls learner.observe() with the outcome
  2. The reward model converts the outcome to a numeric signal
  3. Thompson Sampling updates the Beta posterior for that arm
  4. Next learner.select() samples from updated posteriors
  5. The store persists state to SQLite so learning survives restarts

This creates the learning loop shown on the homepage: accepted results rise in rank on subsequent queries, and rejected results drop.
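A hand-computed example of that ranking effect on posterior means (standalone, using the same uniform Beta(1, 1) prior as the sketch earlier):

```python
def mean(alpha: float, beta: float) -> float:
    """Posterior mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

a = (1.0, 1.0)  # arm A: uniform prior, mean 0.5
b = (1.0, 1.0)  # arm B: uniform prior, mean 0.5
a = (a[0] + 3, a[1])  # three accepts → Beta(4, 1), mean 0.8
b = (b[0], b[1] + 2)  # two rejects  → Beta(1, 3), mean 0.25
assert mean(*a) > mean(*b)  # A now outranks B on subsequent selections
```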

Async API

As of v0.8.0, all Learner methods are async. The constructor has been replaced with an async factory:

# Old (pre-0.8.0)
learner = Learner(LearnerConfig(name="prompts"))

# New (0.8.0+)
learner = await Learner.create(LearnerConfig(name="prompts"))

All methods (select, observe, batch_observe, top_arms, posteriors, metrics, decay_arm, reset) are now async def.

Requirements

  • Python 3.11+
  • aiosqlite (async SQLite, default store)
  • asyncpg (optional, for PostgresLearningStore)
  • qortex-observe (event emission for metrics/traces)

License

MIT