Internal · Engineering · March 2026

Data Intelligence & Observability Report

Turning CO₂Router carbon routing data into a competitive moat. 35+ Prisma models, 11 migrations, 1,345 lines of schema definition.

1. Executive Summary

CO₂Router generates rich, structured data across 35+ Prisma models covering carbon routing decisions, grid signals, provider performance, water impact, and organizational billing. The system currently captures decision-grade data at every layer: each routing request generates a CarbonCommand record with a full scoring trace, a CarbonLedgerEntry for audit-grade carbon accounting, ProviderSnapshot records preserving provider signal state, and GridSignalSnapshot records documenting real-time grid conditions. Raw EIA-930 data is ingested every 15 minutes from the balance, interchange, and subregion feeds.

As decision volume grows, this data becomes the foundation for ML models (carbon intensity forecasting, curtailment prediction), customer ROI dashboards, regulatory compliance reporting (EU CSRD, SEC climate disclosure), and marketing case studies backed by verified carbon savings figures.

2. Data Inventory

2.1 Decision Data (Highest Analytics Value)

Every routing decision generates interconnected records capturing the full decision lifecycle from request through outcome measurement:

Model	Purpose	Key Fields
`DecisionLog`	Core audit table — append-only	Full CiResponseV2 output, proofHash, orgId, leaseExpiry
`CarbonCommand`	Per-request record with scoring trace	22 required fields, grid signal enrichment (nullable)
`CarbonLedgerEntry`	Audit-grade carbon accounting	Fields from baselineRegion through qualityTier, all populated at write time
`DecisionTraceEnvelope`	Full signal state at decision time	Provider health, cache status, lineage, disagreement flags
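To make the ledger's savings figure concrete, here is a minimal sketch of the arithmetic, assuming each entry carries an energy estimate in kWh and the carbon intensity (gCO₂/kWh) of both the baseline region and the region actually selected. The type `LedgerEntrySketch` and its field names are hypothetical, not the actual CarbonLedgerEntry schema, and the real accounting may apply further adjustments.

```typescript
// Hypothetical entry shape: field names are illustrative, not the real Prisma model.
interface LedgerEntrySketch {
  orgId: string;
  energyKwh: number;         // estimated energy of the routed workload
  baselineIntensity: number; // gCO2/kWh in the baseline region
  selectedIntensity: number; // gCO2/kWh in the region actually chosen
}

// Carbon avoided by one decision, in kg. Negative values mean the chosen
// region was dirtier than the baseline; they are kept rather than clamped.
function carbonSavedKg(e: LedgerEntrySketch): number {
  const gramsSaved = e.energyKwh * (e.baselineIntensity - e.selectedIntensity);
  return gramsSaved / 1000;
}

// Per-organization totals over a batch of entries.
function savingsByOrg(entries: LedgerEntrySketch[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of entries) {
    totals.set(e.orgId, (totals.get(e.orgId) ?? 0) + carbonSavedKg(e));
  }
  return totals;
}
```

For example, a 120 kWh job moved from a 450 g/kWh baseline to a 200 g/kWh region avoids 120 × 250 = 30,000 g, or 30 kg. Keeping negative values lets totals reflect decisions that turned out worse than the baseline instead of silently overstating savings.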

2.2 Grid Signal Data

Model	Source	Frequency
`GridSignalSnapshot`	WattTime / Electricity Maps / Ember	Per decision, plus background refresh every 15 min
`ProviderSnapshot`	All active providers	Per decision
`EIA930RawRecord`	EIA.gov BALANCE/INTERCHANGE/SUBREGION	Every 15 minutes
`RegionStructuralProfile`	Ember — validation only	Monthly refresh
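Because EIA930RawRecord rows arrive on a fixed 15-minute cadence, missed polls show up as empty quarter-hour slots. A minimal gap-detection sketch, assuming record timestamps are available as UTC epoch milliseconds for a single feed and balancing authority; `slotStart` and `findMissingSlots` are hypothetical helpers, not existing code.

```typescript
// EIA-930 ingestion cadence from the table above.
const INTERVAL_MS = 15 * 60 * 1000;

// Floor a UTC epoch-millisecond timestamp to the start of its 15-minute slot.
function slotStart(tsMs: number): number {
  return Math.floor(tsMs / INTERVAL_MS) * INTERVAL_MS;
}

// Start of every 15-minute slot in [fromMs, toMs) that contains no record.
function findMissingSlots(recordTimesMs: number[], fromMs: number, toMs: number): number[] {
  const seen = new Set(recordTimesMs.map(slotStart));
  const missing: number[] = [];
  for (let t = slotStart(fromMs); t < toMs; t += INTERVAL_MS) {
    if (!seen.has(t)) missing.push(t);
  }
  return missing;
}
```

Bucketing by slot rather than comparing exact timestamps tolerates small jitter in when each poll lands within its window.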

2.3 Water Impact Data

Water data captures facility-level water consumption and water stress metrics across multiple climate scenarios. This is a novel dataset not available from any competitor. Sources: Aqueduct 2.1, AWARE 2.0, WWF Water Risk Filter. Current status: degraded (data is 27 hours stale). Water fields are NULL for approximately 40% of regions because facility-level water telemetry is not published for them.
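With roughly 40% of regions lacking water data, a water constraint needs to distinguish "no data" from "passes". A minimal sketch, assuming a normalized 0 to 1 water stress score per region that is `null` when no facility data exists; `checkWaterConstraint`, `unknownShare`, and the score itself are hypothetical.

```typescript
type WaterCheck = "PASS" | "FAIL" | "UNKNOWN";

// Hypothetical: waterStressScore is a normalized 0-1 index, null when no data exists.
function checkWaterConstraint(waterStressScore: number | null, maxAllowed: number): WaterCheck {
  if (waterStressScore === null || Number.isNaN(waterStressScore)) return "UNKNOWN";
  return waterStressScore <= maxAllowed ? "PASS" : "FAIL";
}

// Share of candidate regions whose water constraint cannot be evaluated at all.
function unknownShare(scores: Array<number | null>, maxAllowed: number): number {
  if (scores.length === 0) return 0;
  const unknown = scores.filter((s) => checkWaterConstraint(s, maxAllowed) === "UNKNOWN").length;
  return unknown / scores.length;
}
```

Returning `UNKNOWN` instead of defaulting to `PASS` keeps the coverage gap visible in decision records rather than treating missing data as compliant.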

2.4 Intelligence + Adaptive Learning

Model	Purpose	Threshold
`AdaptiveProfile`	Per-region learned carbon patterns	500+ decisions to reach statistical significance
`WorkloadEmbedding`	Similarity matching (max 1 per command)	Influences confidence only — never carbon score
`ForecastRefresh`	Forecast accuracy tracking	Requires outcome recording active
`CarbonCommandOutcome`	Realized vs predicted accuracy	Advisory-mode commands record no outcome, creating a blind spot
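The 500-decision threshold, and the early wide confidence intervals, both follow from how interval width scales with sample size. A minimal sketch, assuming a region's learned pattern can be summarized as a mean carbon intensity with a normal-approximation 95% interval; the actual AdaptiveProfile statistics are not described here, and `meanWithCi` and `profileIsUsable` are hypothetical.

```typescript
// Threshold from the AdaptiveProfile row above.
const MIN_DECISIONS = 500;

// Hypothetical gate: a learned profile may influence routing only past the threshold.
function profileIsUsable(decisionCount: number): boolean {
  return decisionCount >= MIN_DECISIONS;
}

// Sample mean and 95% confidence-interval half-width (normal approximation, z = 1.96).
function meanWithCi(values: number[]): { mean: number; halfWidth: number } {
  const n = values.length;
  if (n < 2) throw new Error("need at least two observations");
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  return { mean, halfWidth: 1.96 * Math.sqrt(variance / n) };
}
```

Because the half-width scales with 1/√n, quadrupling the number of decisions roughly halves the interval, which is why confidence improves gradually rather than all at once.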

3. Data Quality Assessment

3.1 Completeness

Every CarbonCommand record includes 22 required fields populated at creation time. Grid signal fields (demandRampPct, carbonSpikeProbability, curtailmentProbability, importCarbonLeakageScore) are nullable by design — NULL when the cache hierarchy misses and no recent EIA-930 data is available for the balancing authority.

Expected NULL rates in production:

US regions: <15% (strong EIA-930 coverage)
EU/APAC regions: 30–50% (EIA-930 covers only the US, so these regions rely on Electricity Maps enrichment)
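The expected NULL rates can be checked directly against stored commands. A minimal sketch, assuming the four nullable grid signal fields are read into plain objects; `CommandRow`, `nullRates`, and `fieldsOverCeiling` are hypothetical names, and the ceilings a caller would pass (0.15 for US regions, 0.50 for EU/APAC) come from the expectations above.

```typescript
// The four nullable grid signal fields named in section 3.1.
const GRID_FIELDS = [
  "demandRampPct",
  "carbonSpikeProbability",
  "curtailmentProbability",
  "importCarbonLeakageScore",
] as const;
type GridField = (typeof GRID_FIELDS)[number];

// Hypothetical row shape: region plus the nullable grid fields.
type CommandRow = { region: string } & Record<GridField, number | null>;

// Fraction of rows in which each grid field is NULL.
function nullRates(rows: CommandRow[]): Record<GridField, number> {
  const rates = {} as Record<GridField, number>;
  for (const field of GRID_FIELDS) {
    const nulls = rows.filter((r) => r[field] === null).length;
    rates[field] = rows.length === 0 ? 0 : nulls / rows.length;
  }
  return rates;
}

// Fields whose NULL rate exceeds the expected ceiling for this region group.
function fieldsOverCeiling(rows: CommandRow[], ceiling: number): GridField[] {
  const rates = nullRates(rows);
  return GRID_FIELDS.filter((field) => rates[field] > ceiling);
}
```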

3.2 Provider Disagreement Rate

Provider disagreement is a first-class metric. CO₂Router flags a disagreement when WattTime and Electricity Maps values for the same zone and timeframe differ by more than 20%. Target detection rate: ≥95% of such divergences. Observed disagreement rates are 8–15% of decisions in US regions and 15–25% in international regions, reflecting the methodology difference between WattTime's marginal signal and Electricity Maps' flow-traced signal.
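The text does not state which value the 20% is measured against. A minimal sketch, assuming a symmetric relative difference |a − b| / mean(a, b) so the result does not depend on which provider is listed first; `ProviderPair`, `relativeDifference`, and `disagreementRate` are hypothetical names.

```typescript
// 20% threshold from the text above.
const DISAGREEMENT_THRESHOLD = 0.2;

// Symmetric relative difference |a - b| / mean(a, b); the denominator is an assumption.
function relativeDifference(a: number, b: number): number {
  const mean = (a + b) / 2;
  if (mean === 0) return a === b ? 0 : Infinity;
  return Math.abs(a - b) / mean;
}

// Hypothetical pair of readings (gCO2/kWh) for the same zone and timeframe.
interface ProviderPair {
  zone: string;
  wattTime: number;
  electricityMaps: number;
}

function isDisagreement(p: ProviderPair): boolean {
  return relativeDifference(p.wattTime, p.electricityMaps) > DISAGREEMENT_THRESHOLD;
}

// Fraction of pairs flagged, comparable to co2router.provider.disagreement.rate.
function disagreementRate(pairs: ProviderPair[]): number {
  if (pairs.length === 0) return 0;
  return pairs.filter(isDisagreement).length / pairs.length;
}
```

Measuring against the larger or smaller value instead would shift which pairs cross the threshold, so whichever denominator the engine uses should be documented alongside the metric.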

3.3 Quality Tier Distribution (Healthy System)

HIGH tier:    ~70%   (live signal, confidence ≥ 0.80)
MEDIUM tier:  ~20%   (warm cache, 0.50 ≤ confidence < 0.80)
LOW tier:     ~10%   (fallback conditions, confidence < 0.50)

// Alert trigger: LOW > 20% indicates provider degradation or cache warming failure
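A minimal sketch of tier assignment and the LOW-share alert, assuming tiers are derived from the confidence score alone (the table also associates tiers with signal source, which this sketch ignores). Boundaries are treated as half-open intervals so every confidence value maps to exactly one tier; `tierFor`, `tierDistribution`, and `lowTierAlert` are hypothetical names.

```typescript
type QualityTier = "HIGH" | "MEDIUM" | "LOW";

// Confidence cut-offs from the tier table, as half-open intervals.
function tierFor(confidence: number): QualityTier {
  if (confidence >= 0.8) return "HIGH";
  if (confidence >= 0.5) return "MEDIUM";
  return "LOW";
}

// Fraction of decisions in each tier.
function tierDistribution(confidences: number[]): Record<QualityTier, number> {
  const counts: Record<QualityTier, number> = { HIGH: 0, MEDIUM: 0, LOW: 0 };
  for (const c of confidences) counts[tierFor(c)] += 1;
  const n = confidences.length || 1; // avoid dividing by zero on an empty window
  return { HIGH: counts.HIGH / n, MEDIUM: counts.MEDIUM / n, LOW: counts.LOW / n };
}

// Alert trigger from the text: LOW share strictly above 20%.
function lowTierAlert(confidences: number[]): boolean {
  return tierDistribution(confidences).LOW > 0.2;
}
```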

4. Observability Architecture

4.1 Core Metrics to Implement

co2router.routing.decision.latency.ms
  // P50, P95, P99 end-to-end routing decision time
  // Alert: P99 > 100ms (quality gate), P99 > 200ms (SLA breach)

co2router.cache.hit.rate
  // Cache hit % per region per 5-min window
  // Target: > 95% during normal operation

co2router.grid.signal.freshness.seconds
  // Age of most recent GridSignalSnapshot per region
  // Alert: any region > 30 minutes (ingestion failure)

co2router.provider.disagreement.rate
  // % of decisions with disagreement flag
  // Sudden spikes = provider data quality issue

co2router.carbon.saved.kg
  // Cumulative carbon savings; monotonically increasing within each month
  // Resets monthly; drives the hero KPI on the dashboard

co2router.quality.tier.distribution
  // % decisions per tier (HIGH/MEDIUM/LOW)
  // Alert: LOW > 20%

co2router.forecast.accuracy.pct
  // Rolling accuracy: predicted vs realized carbon intensity
  // Per-provider and per-region

co2router.circuit.breaker.state
  // Current CB state per provider (0=CLOSED, 1=HALF_OPEN, 2=OPEN)
  // Alert on any OPEN state
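The alert thresholds above can be pinned down as one rule evaluation over a single metrics snapshot. A minimal sketch, assuming latency samples for the window are held in memory and P99 is computed by the nearest-rank method; `MetricsSnapshot`, `evaluateAlerts`, and the region and provider keys are hypothetical. In production these rules would more likely live in the monitoring backend's alerting configuration.

```typescript
// Nearest-rank percentile, p in (0, 100], over a non-empty sample.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("empty sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Numeric encoding used by co2router.circuit.breaker.state.
const BreakerState = { CLOSED: 0, HALF_OPEN: 1, OPEN: 2 } as const;
type BreakerState = (typeof BreakerState)[keyof typeof BreakerState];

// Hypothetical snapshot of the section 4.1 metrics for one evaluation window.
interface MetricsSnapshot {
  latenciesMs: number[];
  signalAgeSecondsByRegion: Record<string, number>;
  lowTierShare: number; // fraction of decisions in the LOW tier, 0..1
  breakerStateByProvider: Record<string, BreakerState>;
}

function evaluateAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = [];
  const p99 = percentile(m.latenciesMs, 99);
  if (p99 > 200) alerts.push(`SLA breach: P99 ${p99}ms`);
  else if (p99 > 100) alerts.push(`Quality gate: P99 ${p99}ms`);
  for (const [region, ageSeconds] of Object.entries(m.signalAgeSecondsByRegion)) {
    if (ageSeconds > 30 * 60) alerts.push(`Stale grid signal: ${region}`);
  }
  if (m.lowTierShare > 0.2) alerts.push("LOW tier share above 20%");
  for (const [provider, state] of Object.entries(m.breakerStateByProvider)) {
    if (state === BreakerState.OPEN) alerts.push(`Circuit breaker OPEN: ${provider}`);
  }
  return alerts;
}
```

The SLA breach check runs before the quality-gate check so a single window raises only the more severe latency alert.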

4.2 Log Correlation Model

Every log entry should include a correlation ID: decisionFrameId for routing logs, requestId for HTTP logs. This enables full request tracing through log search. The engine uses structured JSON logging via src/lib/logging/logger.ts.
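One way to guarantee the correlation ID appears on every line is to bind it once into a child logger. A minimal sketch, assuming nothing about the real interface of src/lib/logging/logger.ts; `createLogger`, `child`, and the `Logger` interface here are hypothetical.

```typescript
type LogFields = Record<string, unknown>;
type Sink = (line: string) => void;

interface Logger {
  info(msg: string, fields?: LogFields): void;
  error(msg: string, fields?: LogFields): void;
  child(bindings: LogFields): Logger;
}

// Hypothetical minimal structured logger; NOT the API of src/lib/logging/logger.ts.
// Bindings (e.g. decisionFrameId or requestId) are attached to every JSON line written.
function createLogger(bindings: LogFields = {}, sink: Sink = (line) => console.log(line)): Logger {
  const write = (level: string, msg: string, fields: LogFields = {}): void => {
    // Bindings are spread last so a per-call field cannot overwrite the correlation ID.
    sink(JSON.stringify({ ts: new Date().toISOString(), level, msg, ...fields, ...bindings }));
  };
  return {
    info: (msg, fields) => write("info", msg, fields),
    error: (msg, fields) => write("error", msg, fields),
    child: (extra) => createLogger({ ...bindings, ...extra }, sink),
  };
}
```

In a routing handler this would look like `const log = createLogger().child({ decisionFrameId });`, after which every `log.info(...)` call carries the ID without the caller repeating it.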

5. The Proprietary Moat

CO₂Router's decision log is a dataset that does not exist anywhere else: structured, cryptographically signed records of real workload authorization decisions with full signal provenance, provider health state, and carbon accounting. Each provider (WattTime, Electricity Maps, Ember) publishes its own signal. CO₂Router owns the decisions made on those signals, and their verified outcomes.
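The report does not specify how DecisionLog's proofHash or the record signature are produced. A minimal sketch, assuming a SHA-256 hash over a canonical (key-sorted) JSON serialization that is then signed with Ed25519 through Node's built-in `node:crypto` module; `canonicalJson`, `signDecision`, and `verifyDecision` are hypothetical names.

```typescript
import { Buffer } from "node:buffer";
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Serialize with recursively sorted object keys so equal records hash identically.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: Json } = value;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

interface SignedDecision {
  record: Json;
  proofHash: string; // hex SHA-256 of the canonical record
  signature: string; // base64 Ed25519 signature over the proof-hash bytes
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

function signDecision(record: Json, privateKey: KeyObject): SignedDecision {
  const proofHash = sha256Hex(canonicalJson(record));
  // Ed25519 has no separate digest choice, so the algorithm argument is null.
  const signature = sign(null, Buffer.from(proofHash, "hex"), privateKey).toString("base64");
  return { record, proofHash, signature };
}

function verifyDecision(d: SignedDecision, publicKey: KeyObject): boolean {
  // Recompute the hash first: a record edited after signing no longer matches.
  if (sha256Hex(canonicalJson(d.record)) !== d.proofHash) return false;
  return verify(null, Buffer.from(d.proofHash, "hex"), publicKey, Buffer.from(d.signature, "base64"));
}
```

Keys would come from `generateKeyPairSync("ed25519")` in this sketch; a real deployment would load the private key from a managed secret store rather than generating it per process.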

As decision volume grows, this dataset becomes the foundation for:

Carbon intensity forecasting models (training on proprietary decision outcomes)
Curtailment prediction that goes beyond EIA-930 by adding decision-level ground truth
Customer ROI reporting with auditable, replayed verification
Regulatory compliance evidence (EU CSRD, SEC climate disclosure) with cryptographic proof

6. Known Data Limitations

Limitation	Impact	Planned Fix
No real-time streaming	Signal age floor is 5–15 minutes (poll interval)	WebSocket streaming — Q3 2026
Water data ~40% NULL	Water constraint unenforced for many regions	Expand facility data partnerships
Adaptive learning requires 500+ decisions	Wide confidence intervals early	Intervals expected to narrow from month 3 onward as decisions accumulate
Advisory mode = no outcome recording	Forecast accuracy blind spot for advisory commands	Add outcome estimation model