Configuration

Basic Configuration

Add a learning section to openclaw.json:

{
  "learning": {
    "enabled": true,
    "phase": "passive"
  }
}

Full Configuration

{
  "learning": {
    "enabled": true,
    "phase": "passive",
    "strategy": "thompson",
    "tokenBudget": 8000,
    "baselineRate": 0.10,
    "minPulls": 5
  }
}
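
As a rough illustration of how a partial `learning` section relates to the full one, the sketch below merges user-supplied values over the defaults listed in the option reference. The function name `resolve_learning_config` and its validation rules are assumptions made for this example; they are not part of OpenClaw.

```python
import json

# Defaults copied from the option reference on this page.
DEFAULTS = {
    "enabled": True,
    "phase": "passive",
    "strategy": "thompson",
    "tokenBudget": 8000,
    "baselineRate": 0.10,
    "minPulls": 5,
}


def resolve_learning_config(raw_json):
    """Merge the "learning" section of a config document over the defaults.

    Hypothetical helper: the name and the checks below are illustrative only.
    """
    user = json.loads(raw_json).get("learning", {})
    config = {**DEFAULTS, **user}
    if config["phase"] not in ("passive", "active"):
        raise ValueError("phase must be 'passive' or 'active'")
    if not 0.0 <= config["baselineRate"] <= 1.0:
        raise ValueError("baselineRate must be between 0.0 and 1.0")
    return config


basic = resolve_learning_config('{"learning": {"enabled": true, "phase": "passive"}}')
print(basic["tokenBudget"], basic["minPulls"])  # prints: 8000 5
```

Under these assumptions, the basic configuration and the full configuration shown above resolve to the same values.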

Option Reference

`enabled`


Type	`boolean`
Default	`true`

Enable or disable the learning layer entirely. When disabled, no traces are recorded and no posteriors are updated.

`phase`


Type	`"passive" | "active"`
Default	`"passive"`

Controls whether the learning layer actively optimizes prompts.

passive — Traces are recorded and posteriors are maintained, but all arms are included in every run. Safe to leave on indefinitely.
active — Thompson Sampling selects which arms to include based on learned posteriors. Low-value arms may be excluded to save tokens.
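
The difference between the two phases can be sketched as follows, assuming each arm keeps a Beta(alpha, beta) posterior over how useful it is. The function names and the fixed 0.5 inclusion threshold are assumptions for illustration; the real selection rule also takes `tokenBudget` and `minPulls` into account and may differ in detail.

```python
import random


def sample_scores(posteriors, rng):
    """Draw one Thompson sample per arm from its Beta(alpha, beta) posterior."""
    return {arm: rng.betavariate(a, b) for arm, (a, b) in posteriors.items()}


def arms_for_run(posteriors, phase, rng, threshold=0.5):
    """Return the arms to include in one run.

    threshold is a hypothetical cut-off used only in this sketch.
    """
    if phase == "passive":
        # Passive: posteriors are maintained elsewhere, nothing is excluded.
        return sorted(posteriors)
    scores = sample_scores(posteriors, rng)
    return sorted(arm for arm, score in scores.items() if score >= threshold)


# Illustrative posteriors: one arm that almost always helped, one that almost never did.
posteriors = {"good_tool": (1000.0, 1.0), "bad_tool": (1.0, 1000.0)}
rng = random.Random(0)
print(arms_for_run(posteriors, "passive", rng))
print(arms_for_run(posteriors, "active", rng))
```

Because each run draws fresh samples, an arm with an uncertain posterior is sometimes included and sometimes not, which is what lets the bandit keep learning about it.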

`strategy`


Type	`"thompson"`
Default	`"thompson"`

The selection strategy. Only Thompson Sampling is currently supported; the option exists so that future strategies (e.g., LinUCB with contextual features) can be added.

`tokenBudget`


Type	`number`
Default	`8000`

Maximum tokens allocated for prompt components (tools, skills, files). Arms are selected greedily within this budget, prioritizing seeds and underexplored arms.

Reference values:

Budget	Use Case
`4000`	Minimal prompt, aggressive optimization
`8000`	Default balance
`16000`	Large tool inventories
`32000`	Very permissive, minimal exclusion
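
The greedy fill described above might look roughly like this. The `Arm` fields, the priority ordering (seeds first, then underexplored arms, then highest sampled score), and the token costs are assumptions for the example. The sketch models prioritization only, not the guaranteed inclusion described under `minPulls`.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    name: str
    tokens: int          # prompt tokens this component costs
    score: float         # e.g. a Thompson sample in [0, 1]
    pulls: int = 0       # observations recorded so far
    seed: bool = False   # hypothetical flag marking a seed arm


def select_within_budget(arms, token_budget=8000, min_pulls=5):
    """Greedily add arms in priority order while they still fit the budget."""

    def priority(arm):
        # False sorts before True, so seeds and underexplored arms come first;
        # ties are broken by descending score.
        return (not arm.seed, arm.pulls >= min_pulls, -arm.score)

    chosen, used = [], 0
    for arm in sorted(arms, key=priority):
        if used + arm.tokens <= token_budget:
            chosen.append(arm.name)
            used += arm.tokens
    return chosen, used


arms = [
    Arm("search", 3000, 0.9, pulls=40),
    Arm("calendar", 3000, 0.2, pulls=40),
    Arm("core", 2000, 0.5, pulls=40, seed=True),
    Arm("new_skill", 2500, 0.1, pulls=1),
]
print(select_within_budget(arms, token_budget=8000))
# (['core', 'new_skill', 'search'], 7500)
```

With the same arms and a budget of 4000, only `core` fits, which shows how a tight budget trades coverage for token savings.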

`baselineRate`


Type	`number` (0.0 - 1.0)
Default	`0.10`

Fraction of runs that use the full prompt (all arms included) for counterfactual evaluation. Higher rates provide better comparison data but reduce optimization gains.

Recommended rates by inventory size:

Arm Count	Recommended Rate
1-10	`0.20` (20%)
11-50	`0.10` (10%)
51+	`0.05` (5%)
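
A minimal sketch of how a baseline run could be chosen and how the table above maps to a rate. Both function names are hypothetical, and the per-run coin flip is an assumption about how the fraction is realized.

```python
import random


def recommended_baseline_rate(arm_count):
    """Map an arm count to the rate suggested in the table above."""
    if arm_count <= 10:
        return 0.20
    if arm_count <= 50:
        return 0.10
    return 0.05


def is_baseline_run(baseline_rate, rng):
    """Decide whether this run uses the full prompt (all arms included)."""
    return rng.random() < baseline_rate


rng = random.Random(42)
rate = recommended_baseline_rate(30)
baseline_runs = sum(is_baseline_run(rate, rng) for _ in range(10_000))
print(rate, baseline_runs / 10_000)  # the observed fraction should be close to 0.10
```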

`minPulls`


Type	`number`
Default	`5`

Arms with fewer than this many observations are always included (never excluded by Thompson Sampling). This ensures every arm gets enough data before the bandit can decide to drop it.

This value is forwarded to the qortex backend as an exploration floor. Even when the token budget is tight, arms below the minPulls threshold are guaranteed inclusion.
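
The exploration floor can be pictured as a union step applied after the bandit has made its choice. This is a sketch under assumed names (`apply_exploration_floor`, `pull_counts`); how the qortex backend enforces the floor internally is not described here.

```python
def apply_exploration_floor(bandit_choice, pull_counts, min_pulls=5):
    """Add every arm with fewer than min_pulls observations to the selection."""
    underexplored = {arm for arm, pulls in pull_counts.items() if pulls < min_pulls}
    return set(bandit_choice) | underexplored


pull_counts = {"search": 42, "calendar": 4, "notes": 5}
selected = apply_exploration_floor({"search"}, pull_counts, min_pulls=5)
print(sorted(selected))  # ['calendar', 'search']
```

Here `calendar` is forced in because it has only 4 observations, while `notes` has reached the threshold of 5 and is left to the bandit.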

`decayHalfLifeDays` (future)


Type	`number`
Default	—

Reserved for v0.0.2 temporal decay. Will allow posteriors to gradually forget old observations, adapting to changing tool relevance over time.
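
Since this option is not implemented yet, the following is only one plausible reading of half-life decay: each observation is weighted by 0.5 raised to its age divided by the half-life before it updates a Beta posterior. The function names, the Beta(1, 1) prior, and the weighting scheme are all assumptions; the eventual implementation may work differently.

```python
def decay_weight(age_days, half_life_days):
    """Weight of an observation that is age_days old under exponential decay."""
    return 0.5 ** (age_days / half_life_days)


def decayed_posterior(observations, half_life_days, prior=(1.0, 1.0)):
    """Build Beta(alpha, beta) parameters from (age_days, success) pairs.

    Older observations contribute less than one full count.
    """
    alpha, beta = prior
    for age_days, success in observations:
        weight = decay_weight(age_days, half_life_days)
        if success:
            alpha += weight
        else:
            beta += weight
    return alpha, beta


history = [(0, True), (30, True), (60, False)]
print(decayed_posterior(history, half_life_days=30))  # (2.5, 1.25)
```

With a 30-day half-life, the success from 30 days ago counts as half an observation and the failure from 60 days ago as a quarter.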

Config Precedence

Configuration is resolved in this order (highest priority first):

1. CLI flags (e.g., `--host`, `--port`)
2. Environment variables (e.g., `OPENCLAW_GATEWAY_HOST`)
3. `openclaw.json` in the project root
4. Built-in defaults
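
The lookup order above behaves like a layered mapping in which the first layer that defines a key wins. The sketch below models that with `collections.ChainMap`; the key names and all values are placeholders, not OpenClaw's real defaults, and the way environment variables map to keys is assumed.

```python
from collections import ChainMap


def resolve_settings(cli_flags, env_vars, file_config, defaults):
    """Return a view where earlier layers shadow later ones."""
    return ChainMap(cli_flags, env_vars, file_config, defaults)


# Placeholder layers for illustration only.
cli_flags = {"port": 9000}                          # e.g. from --port
env_vars = {"host": "env-host", "port": 7000}       # e.g. from OPENCLAW_GATEWAY_HOST
file_config = {"host": "file-host"}                 # e.g. from openclaw.json
defaults = {"host": "default-host", "port": 1234}

settings = resolve_settings(cli_flags, env_vars, file_config, defaults)
print(settings["host"], settings["port"])  # prints: env-host 9000
```

The port comes from the CLI layer because it has the highest priority, while the host falls through to the environment layer since no CLI flag set it.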

Scenario Examples

Minimal (Observe Only)

{
  "learning": {
    "enabled": true,
    "phase": "passive"
  }
}

Leave all other options at defaults. Data accumulates silently.

Passive with Conservative Budget

{
  "learning": {
    "enabled": true,
    "phase": "passive",
    "tokenBudget": 4000,
    "minPulls": 10
  }
}

Useful for analysis: see what would be excluded with a tight budget, without actually excluding anything yet.

Active with High Exploration

{
  "learning": {
    "enabled": true,
    "phase": "active",
    "tokenBudget": 8000,
    "baselineRate": 0.20,
    "minPulls": 10
  }
}

More baseline runs and a higher minPulls mean the bandit explores longer before committing to exclusions.

Active with Aggressive Optimization

{
  "learning": {
    "enabled": true,
    "phase": "active",
    "tokenBudget": 4000,
    "baselineRate": 0.05,
    "minPulls": 3
  }
}

Tight budget, few baselines, quick convergence. Use when you have a large tool inventory and want fast token savings.

Next Steps

CLI Reference — All commands documented
Thompson Sampling — How the selection strategy works