Confidence Scoring Methodology

How data quality is assessed and communicated.

Overview

Every emission estimate includes a confidence score (0.0–1.0) indicating data quality. This enables:

  • Transparent uncertainty communication
  • Prioritization of data improvement efforts
  • Compliance with reporting standards

Confidence Scale

Score     Label       Meaning
0.8–1.0   Very High   Primary data from supplier
0.6–0.8   High        Published secondary data
0.4–0.6   Medium      Research-based estimates
0.2–0.4   Low         Model extrapolation
0.0–0.2   Very Low    Fallback estimates

Factor Sources

Confidence depends on how factors were derived:

Source      Typical Confidence   Description
measured    0.8+                 Direct measurement by provider
research    0.5–0.7              Academic research on similar models
estimated   0.3–0.5              Extrapolation from model characteristics
fallback    0.1–0.2              Generic estimate for unknown models

Calculation

Per-Trace Confidence

Each trace inherits the factor's confidence:

trace.confidence = factor.confidence

Aggregated Confidence

Summary confidence is the token-weighted average of per-trace confidence:

avg_confidence = Σ(trace.confidence × trace.totalTokens) / Σ(trace.totalTokens)

Weighting by tokens ensures high-volume models dominate the average.
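A minimal sketch of the aggregation, assuming a Trace record that carries the confidence and totalTokens fields used in the formula above (the exact record shape is an assumption):

// Token-weighted average confidence across traces.
// The Trace interface is illustrative; only the two fields
// from the formula above are assumed.
interface Trace {
  confidence: number;
  totalTokens: number;
}

function aggregateConfidence(traces: Trace[]): number {
  const totalTokens = traces.reduce((sum, t) => sum + t.totalTokens, 0);
  if (totalTokens === 0) return 0;
  const weighted = traces.reduce(
    (sum, t) => sum + t.confidence * t.totalTokens,
    0
  );
  return weighted / totalTokens;
}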

GHG Protocol Mapping

Confidence maps to GHG Protocol Data Quality Score (DQS):

Confidence   DQS   GHG Protocol Description
≥0.8         1     Primary data from suppliers
≥0.6         2     Published secondary data
≥0.4         3     Average secondary data
≥0.2         4     Estimated data
<0.2         5     Highly uncertain

function confidenceToDataQuality(confidence: number): 1 | 2 | 3 | 4 | 5 {
  if (confidence >= 0.8) return 1;
  if (confidence >= 0.6) return 2;
  if (confidence >= 0.4) return 3;
  if (confidence >= 0.2) return 4;
  return 5;
}

Uncertainty Conversion

For ISO 14064 reporting, confidence converts to uncertainty bounds:

Confidence   Uncertainty   Range
≥0.7         ±15%          85%–115%
≥0.5         ±30%          70%–130%
≥0.3         ±50%          50%–150%
<0.3         ±100%         0%–200%

function confidenceToUncertainty(confidence: number): { lower: number; upper: number } {
  if (confidence >= 0.7) return { lower: 0.85, upper: 1.15 };
  if (confidence >= 0.5) return { lower: 0.70, upper: 1.30 };
  if (confidence >= 0.3) return { lower: 0.50, upper: 1.50 };
  return { lower: 0.00, upper: 2.00 };
}
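As a usage sketch, the bounds multiply a central estimate to produce the reporting range (the emissionsG variable and the example values are illustrative):

// Illustrative: derive an ISO 14064 reporting range from a
// central estimate. Both numbers below are example values.
const emissionsG = 12.4;
const { lower, upper } = confidenceToUncertainty(0.32); // ≥0.3 → ±50%
const range = {
  low: emissionsG * lower,   // 12.4 × 0.50 = 6.2 g
  high: emissionsG * upper,  // 12.4 × 1.50 = 18.6 g
};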

Display

CLI Output

Environmental Impact [PASSIVE]
  Grid carbon: 400 gCO₂/kWh (default)  |  Confidence: low (32%)

Dashboard

Confidence is shown as a colored badge:

  • 🟢 High (≥60%)
  • 🟡 Medium (≥40%)
  • 🔴 Low (<40%)
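A sketch of the badge thresholds as a function (confidenceToBadge is a hypothetical helper name, not part of a public API):

// Maps a confidence value to the dashboard badge tiers above.
function confidenceToBadge(confidence: number): "🟢 High" | "🟡 Medium" | "🔴 Low" {
  if (confidence >= 0.6) return "🟢 High";
  if (confidence >= 0.4) return "🟡 Medium";
  return "🔴 Low";
}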

Export

All exports include confidence/DQS fields:

{
  "confidence": 0.32,
  "dataQualityScore": 4,
  "uncertainty_percent": 50
}
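A sketch of how these fields can be derived from a single confidence value using the two mapping functions defined above (buildExportFields is a hypothetical helper; uncertainty_percent is taken here as the upper bound's deviation from 100%):

// Hypothetical helper combining the mappings defined earlier.
function buildExportFields(confidence: number) {
  const { upper } = confidenceToUncertainty(confidence);
  return {
    confidence,
    dataQualityScore: confidenceToDataQuality(confidence),
    uncertainty_percent: Math.round((upper - 1) * 100), // e.g. 1.50 → 50
  };
}

// buildExportFields(0.32)
// → { confidence: 0.32, dataQualityScore: 4, uncertainty_percent: 50 }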

Improving Confidence

1. Provider Data

If a provider publishes emission data:

  • Update the factor with the new values
  • Set source: "measured"
  • Increase confidence to 0.8+
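For example, an updated factor record might look like the following (the EmissionFactor shape, field names, and values are assumptions for illustration only):

// Illustrative factor update after a provider publishes measurements.
// The interface and all values are placeholders, not real data.
interface EmissionFactor {
  model: string;
  gCO2PerMTok: number;  // grams CO₂e per million tokens (assumed unit)
  source: "measured" | "research" | "estimated" | "fallback";
  confidence: number;
}

const updatedFactor: EmissionFactor = {
  model: "example-model",
  gCO2PerMTok: 850,     // placeholder value
  source: "measured",
  confidence: 0.85,     // 0.8+ for measured provider data
};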

2. Academic Research

When new research becomes available:

  • Validate it against existing factors
  • Update factors if the difference is significant
  • Document the source

3. Direct Measurement

For dedicated deployments:

  • Measure actual power consumption
  • Apply real grid carbon intensity
  • Set confidence to 0.9+
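A sketch of the measured-path calculation, assuming metered energy in kWh and a real regional grid intensity in gCO₂/kWh (all variable names and values are illustrative):

// Illustrative: emissions from measured power and real grid carbon.
// All values below are placeholders.
const measuredEnergyKWh = 0.0021; // metered energy for the workload
const gridGCO2PerKWh = 120;       // actual grid intensity for the region
const emissionsG = measuredEnergyKWh * gridGCO2PerKWh; // 0.252 gCO₂e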

Current Status

Most AI providers don't publish per-request emissions:

Provider    Data Available   Typical Confidence
Anthropic   No               0.25–0.35
OpenAI      No               0.25–0.35
Google      Partial          0.35–0.45
Others      No               0.15–0.25

Confidence Philosophy

Conservative by Default

When uncertain, we overestimate emissions:

  • Larger model size assumptions
  • Higher energy per token
  • Average (not clean) grid carbon

This ensures reported emissions are upper bounds.

Transparent Uncertainty

Users always know the data quality:

  • Confidence displayed prominently
  • Uncertainty ranges in exports
  • Methodology documentation

Continuous Improvement

Confidence is tracked and improved over time through:

  • Annual factor review
  • Provider engagement
  • Research monitoring