Data Intelligence & Observability Report
Turning CO₂Router carbon routing data into a competitive moat. 35+ Prisma models, 11 migrations, 1,345 lines of schema definition.
1. Executive Summary
CO₂Router generates rich, structured data across 35+ Prisma models covering carbon routing decisions, grid signals, provider performance, water impact, and organizational billing. The system currently captures decision-grade data at every layer: every routing request generates a CarbonCommand record with full scoring trace, a CarbonLedgerEntry for audit-grade carbon accounting, ProviderSnapshot records preserving signal state, and GridSignalSnapshot records documenting real-time grid conditions. EIA-930 raw data is ingested every 15 minutes across balance, interchange, and subregion feeds.
Over time, as decision volume grows, this data becomes the foundation for ML models (carbon intensity forecasting, curtailment prediction), customer ROI dashboards, regulatory compliance reporting (EU CSRD, SEC climate disclosure), and marketing case studies with verified carbon savings figures.
2. Data Inventory
2.1 Decision Data (Highest Analytics Value)
Every routing decision generates interconnected records capturing the full decision lifecycle from request through outcome measurement:
| Model | Purpose | Key Fields |
|---|---|---|
| DecisionLog | Core audit table, append-only | Full CiResponseV2 output, proofHash, orgId, leaseExpiry |
| CarbonCommand | Per-request record with scoring trace | 22 required fields, grid signal enrichment (nullable) |
| CarbonLedgerEntry | Audit-grade carbon accounting | baselineRegion through qualityTier, all populated at write time |
| DecisionTraceEnvelope | Full signal state at decision time | Provider health, cache status, lineage, disagreement flags |
2.2 Grid Signal Data
| Model | Source | Frequency |
|---|---|---|
| GridSignalSnapshot | WattTime / Electricity Maps / Ember | Per decision, plus background refresh every 15 minutes |
| ProviderSnapshot | All active providers | Per decision |
| EIA930RawRecord | EIA.gov BALANCE/INTERCHANGE/SUBREGION | Every 15 minutes |
| RegionStructuralProfile | Ember (validation only) | Monthly refresh |
2.3 Water Impact Data
Water data captures facility-level water consumption and stress metrics across multiple climate scenarios. This is a novel dataset not available from any competitor. Sources: Aqueduct 2.1, AWARE 2.0, WWF Water Risk Filter. Current status: degraded (data is 27 hours stale). Water fields are NULL for approximately 40% of regions because few facilities publish telemetry.
2.4 Intelligence + Adaptive Learning
| Model | Purpose | Threshold |
|---|---|---|
| AdaptiveProfile | Per-region learned carbon patterns | 500+ decisions to reach statistical significance |
| WorkloadEmbedding | Similarity matching (max 1 per command) | Influences confidence only, never the carbon score |
| ForecastRefresh | Forecast accuracy tracking | Requires outcome recording active |
| CarbonCommandOutcome | Realized vs predicted accuracy | Advisory-mode commands create blind spot |
3. Data Quality Assessment
3.1 Completeness
Every CarbonCommand record includes 22 required fields populated at creation time. Grid signal fields (demandRampPct, carbonSpikeProbability, curtailmentProbability, importCarbonLeakageScore) are nullable by design — NULL when the cache hierarchy misses and no recent EIA-930 data is available for the balancing authority.
Expected NULL rates in production:
- US regions: <15% (strong EIA-930 coverage)
- EU/APAC regions: 30–50% (EIA coverage is US-only; relies on Electricity Maps enrichment)
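The NULL rates above could be measured with a check along the following lines. This is a minimal sketch, not the engine's implementation: the `GridSignalFields` interface, its `region` field, and the `gridSignalNullRate` helper are hypothetical, and it assumes a record counts as NULL when any of the four grid signal fields is missing.

```typescript
// Hypothetical subset of CarbonCommand: the four nullable grid signal fields
// named in section 3.1, plus an assumed region label.
interface GridSignalFields {
  region: string;
  demandRampPct: number | null;
  carbonSpikeProbability: number | null;
  curtailmentProbability: number | null;
  importCarbonLeakageScore: number | null;
}

// Fraction of records in which at least one grid signal field is NULL.
// Assumption: the report does not say whether the rate is per record or per field.
function gridSignalNullRate(records: GridSignalFields[]): number {
  if (records.length === 0) return 0;
  const missing = records.filter(
    (r) =>
      r.demandRampPct === null ||
      r.carbonSpikeProbability === null ||
      r.curtailmentProbability === null ||
      r.importCarbonLeakageScore === null
  ).length;
  return missing / records.length;
}
```

Grouping records by region before calling the helper would give the per-region rates to compare against the expected ranges.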
3.2 Provider Disagreement Rate
Provider disagreement is a first-class metric. CO₂Router flags a disagreement when WattTime and Electricity Maps diverge by more than 20% for the same zone and timeframe. Target detection rate: ≥95% of such divergences. Observed disagreement rates: 8–15% for US regions and 15–25% for international regions (attributed to the difference between marginal and flow-traced methodologies).
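The 20% divergence rule can be sketched as below. The report does not say which baseline the percentage is measured against, so this sketch assumes divergence relative to the mean of the two readings. The function names and the constant are illustrative, not taken from the codebase.

```typescript
// Threshold from section 3.2: more than 20% divergence counts as disagreement.
const DISAGREEMENT_THRESHOLD = 0.2;

// Relative divergence of two carbon intensity readings, measured against
// their mean (assumption: the source does not specify the baseline).
function relativeDivergence(a: number, b: number): number {
  const mean = (a + b) / 2;
  if (mean === 0) return 0; // both readings are zero, so there is no divergence
  return Math.abs(a - b) / mean;
}

// True when the two providers diverge by more than the threshold for the
// same zone and timeframe (matching zone and timeframe is left to the caller).
function providersDisagree(wattTime: number, electricityMaps: number): boolean {
  return relativeDivergence(wattTime, electricityMaps) > DISAGREEMENT_THRESHOLD;
}
```

For example, readings of 400 and 500 gCO₂/kWh diverge by about 22% of their mean and would be flagged, while 400 and 450 (about 12%) would not.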
3.3 Quality Tier Distribution (Healthy System)
- HIGH tier: ~70% (live signal, confidence ≥ 0.80)
- MEDIUM tier: ~20% (warm cache, confidence 0.50–0.79)
- LOW tier: ~10% (fallback conditions, confidence < 0.50)

Alert trigger: LOW > 20% indicates provider degradation or cache warming failure.
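The tier boundaries and the alert trigger can be expressed as a small classifier. This is a sketch under two assumptions: tiers are derived from the confidence value alone (the report also ties them to signal source, which is ignored here), and any confidence below 0.80 but at least 0.50 is treated as MEDIUM. The names `tierForConfidence` and `lowTierAlert` are hypothetical.

```typescript
type QualityTier = "HIGH" | "MEDIUM" | "LOW";

// Map a confidence score in [0, 1] to a quality tier using the
// boundaries from section 3.3.
function tierForConfidence(confidence: number): QualityTier {
  if (confidence >= 0.8) return "HIGH";
  if (confidence >= 0.5) return "MEDIUM";
  return "LOW";
}

// True when the share of LOW-tier decisions in the window exceeds the
// alert threshold (20% by default).
function lowTierAlert(confidences: number[], maxLowShare = 0.2): boolean {
  if (confidences.length === 0) return false;
  const low = confidences.filter((c) => tierForConfidence(c) === "LOW").length;
  return low / confidences.length > maxLowShare;
}
```

The comparison is strict, so a window in which exactly 20% of decisions are LOW does not raise the alert.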
4. Observability Architecture
4.1 Core Metrics to Implement
| Metric | Description | Alert / Notes |
|---|---|---|
| co2router.routing.decision.latency.ms | P50, P95, P99 end-to-end routing decision time | Alert: P99 > 100 ms (quality gate), P99 > 200 ms (SLA breach) |
| co2router.cache.hit.rate | Cache hit % per region per 5-minute window | Target: > 95% during normal operation |
| co2router.grid.signal.freshness.seconds | Age of most recent GridSignalSnapshot per region | Alert: any region > 30 minutes (ingestion failure) |
| co2router.provider.disagreement.rate | % of decisions with disagreement flag | Sudden spikes indicate a provider data quality issue |
| co2router.carbon.saved.kg | Cumulative carbon savings, monotonically increasing within each month | Resets monthly; drives hero KPI on dashboard |
| co2router.quality.tier.distribution | % of decisions per tier (HIGH/MEDIUM/LOW) | Alert: LOW > 20% |
| co2router.forecast.accuracy.pct | Rolling accuracy: predicted vs realized carbon intensity | Per-provider and per-region |
| co2router.circuit.breaker.state | Current circuit breaker state per provider (0=CLOSED, 1=HALF_OPEN, 2=OPEN) | Alert on any OPEN state |
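As an illustration of the latency alert thresholds, the sketch below computes a nearest-rank percentile over a window of samples and maps P99 to an alert status. The nearest-rank method, the `LatencyStatus` labels, and both function names are assumptions. A production metrics backend would typically compute percentiles from histograms instead.

```typescript
// Nearest-rank percentile over a window of latency samples (milliseconds).
// p is a percentage in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical status labels for the two P99 thresholds in section 4.1.
type LatencyStatus = "OK" | "QUALITY_GATE" | "SLA_BREACH";

function latencyStatus(samples: number[]): LatencyStatus {
  const p99 = percentile(samples, 99);
  if (p99 > 200) return "SLA_BREACH"; // P99 > 200 ms
  if (p99 > 100) return "QUALITY_GATE"; // P99 > 100 ms
  return "OK";
}
```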
4.2 Log Correlation Model
Every log entry should include a correlation ID: decisionFrameId for routing logs, requestId for HTTP logs. This enables full request tracing through log search. The engine uses structured JSON logging via src/lib/logging/logger.ts.
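A structured entry carrying the correlation ID might look like the sketch below. It does not reproduce the API of src/lib/logging/logger.ts, which this report does not describe. The `LogEntry` shape, the `CorrelationId` union, and `buildLogEntry` are hypothetical, and only the `decisionFrameId` and `requestId` field names come from the text above.

```typescript
// Exactly one correlation ID per entry: decisionFrameId for routing logs,
// requestId for HTTP logs.
type CorrelationId =
  | { kind: "routing"; decisionFrameId: string }
  | { kind: "http"; requestId: string };

// Hypothetical entry shape; the real logger may carry more fields.
interface LogEntry {
  timestamp: string;
  level: "info" | "warn" | "error";
  message: string;
  decisionFrameId?: string;
  requestId?: string;
}

// Serialize one log line as JSON with the correlation ID attached.
function buildLogEntry(
  level: LogEntry["level"],
  message: string,
  correlation: CorrelationId,
  now: Date = new Date()
): string {
  const entry: LogEntry = { timestamp: now.toISOString(), level, message };
  if (correlation.kind === "routing") {
    entry.decisionFrameId = correlation.decisionFrameId;
  } else {
    entry.requestId = correlation.requestId;
  }
  return JSON.stringify(entry);
}
```

Modeling the correlation ID as a discriminated union makes it a compile-time error to emit an entry without one, which keeps every line searchable by ID.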
5. The Proprietary Moat
CO₂Router's decision log is a dataset that does not exist anywhere else: structured, cryptographically signed records of real workload authorization decisions with full signal provenance, provider health state, and carbon accounting. Every provider (WattTime, Electricity Maps, Ember) publishes their signal. CO₂Router owns the decisions made on those signals — and the verified outcomes.
As decision volume grows, this dataset becomes the foundation for:
- Carbon intensity forecasting models (training on proprietary decision outcomes)
- Curtailment prediction — beyond EIA-930 with decision-level ground truth
- Customer ROI reporting with auditable, replayed verification
- Regulatory compliance evidence (EU CSRD, SEC climate disclosure) with cryptographic proof
6. Known Data Limitations
| Limitation | Impact | Planned Fix |
|---|---|---|
| No real-time streaming | Signal age floor is 5–15 minutes (poll interval) | WebSocket streaming — Q3 2026 |
| Water data 40% NULL | Water constraint unenforced for many regions | Expand facility data partnerships |
| Adaptive learning requires 500+ decisions | Wide confidence intervals early | Expected to narrow at month 3+ |
| Advisory mode = no outcome recording | Forecast accuracy blind spot for advisory commands | Add outcome estimation model |