Layer 1 · Entry
Every call lands here. Per-call attribution from the first call. Complexity-aware routing picks the cheapest model that clears the quality bar. Policy-as-code is evaluated on every request, before the call leaves the runtime.
What the Gateway does
Route
Cheapest sufficient model, every call.
Meter
Per-agent, per-model, per-team attribution.
Govern
Policy-as-code on every request.
Observe
Telemetry the CFO can read.
Verify
Cross-model fidelity per transfer.
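As a rough illustration of the Govern step, the sketch below treats each policy as a plain function checked before a call leaves the runtime. The `Request` fields, the policy names, and `evaluate` are hypothetical and are not the Gateway's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape; the real Gateway schema is not shown in the text.
    agent: str
    team: str
    model: str
    prompt: str


# A policy returns a denial reason, or None to allow the request.
Policy = Callable[[Request], Optional[str]]


def allowed_models(models: set[str]) -> Policy:
    """Deny any request that targets a model outside the approved set."""
    def check(req: Request) -> Optional[str]:
        return None if req.model in models else f"model {req.model!r} not allowed"
    return check


def max_prompt_chars(limit: int) -> Policy:
    """Deny any request whose prompt is longer than `limit` characters."""
    def check(req: Request) -> Optional[str]:
        return None if len(req.prompt) <= limit else f"prompt exceeds {limit} chars"
    return check


def evaluate(req: Request, policies: list[Policy]) -> list[str]:
    """Run every policy; an empty list means the call may proceed."""
    return [reason for p in policies if (reason := p(req)) is not None]
```

Returning every denial reason, rather than stopping at the first, keeps the result usable as an audit record.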
The Wedge
AI is the fastest-growing unattributed line on the cloud bill. No decomposition by agent, team, model, or call. Every query hits frontier pricing regardless of complexity. Provider logs locked to one vendor.
Per-call cost attribution at the wedge. The Gateway turns the AI cost center into a line item that decomposes by agent, team, model, and call.
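A minimal sketch of that decomposition, assuming each call is logged with its agent, team, model, and cost. The `CallRecord` fields and the `decompose` helper are illustrative names, not the Gateway's schema.

```python
from collections import defaultdict
from dataclasses import dataclass

DIMENSIONS = ("agent", "team", "model")


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical per-call record; field names are assumptions.
    agent: str
    team: str
    model: str
    cost_usd: float


def decompose(records: list[CallRecord], dimension: str) -> dict[str, float]:
    """Sum per-call cost along one attribution dimension."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(totals)
```

Because every record carries all three tags, the same log can be sliced by agent, team, or model without re-instrumenting the callers.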
Unit Economics
ARX cascade routing delivers a 61.1 percent cost reduction on its own. Semantic cache adds 19.8 percent through a different class of mechanism. The combined gateway reduction is 80.6 percent on a 1M-call workload at a median of 1,200 input and 400 output tokens per call.
| Mechanism | What it does | % saved |
|---|---|---|
| Semantic cache | Repeat or near-identical calls served from cache, not the vendor. | 19.8% |
| Cost-quality routing | Each query goes to the cheapest model that clears the quality bar. | 61.1% |
| Total | Total gateway reduction. | 80.6% |
Cache hit rate is workload-dependent: FAQ-style workloads see 30 to 40 percent, research-assistant workloads 5 to 10 percent. The 20 percent assumption used here is mid-range. Cascade distribution is calibrated per customer inside the pilot.
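To show how a cache hit rate and a cascade distribution feed into a blended reduction, here is a simplified sketch. The per-token prices, the tier mix, and the assumption that a cached call costs nothing are all placeholders. It is not the calibrated model behind the figures above and is not expected to reproduce them.

```python
IN_TOKENS, OUT_TOKENS = 1_200, 400  # median tokens per call, from the text


def cost_per_call(in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one median call, given prices per million tokens (placeholder prices)."""
    return (IN_TOKENS * in_price_per_m + OUT_TOKENS * out_price_per_m) / 1_000_000


def gateway_reduction(hit_rate: float,
                      tiers: list[tuple[float, float]],
                      baseline_cost: float) -> float:
    """Fractional saving versus sending every call to the baseline model.

    tiers: (share of uncached calls, cost per call) for each model tier.
    Assumes a cache hit costs zero, which ignores cache infrastructure cost.
    """
    if abs(sum(share for share, _ in tiers) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    routed_cost = sum(share * cost for share, cost in tiers)
    return 1.0 - (1.0 - hit_rate) * routed_cost / baseline_cost


# Placeholder prices in USD per million tokens; not any vendor's actual rates.
frontier = cost_per_call(10.0, 30.0)
small = cost_per_call(1.0, 3.0)
example = gateway_reduction(0.20, [(0.75, small), (0.25, frontier)], frontier)
```

In this simplified model the two mechanisms compound rather than add: routing only acts on the calls that miss the cache.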
Cost-Quality Routing
Roughly 80 percent of enterprise calls hit a frontier model at an order of magnitude more than the cost of a purpose-built small model that could handle them. The Gateway sends each query to the right model at the right cost, with the quality floor enforced per call.
Every query is scored against a complexity bar before it leaves the gateway. Routine calls go to purpose-built small models. Frontier models earn their cost only when complexity demands it.
Downcasting is gated by an explicit quality floor. The router never trades quality for cost without auditable policy. The floor is set per customer, per workload, per regulator.
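The routing rule described above can be sketched as a filter followed by a minimum. This is a minimal illustration that assumes a complexity score is already available for the query. The `Model` fields, the 0 to 1 score ranges, and `route` are hypothetical and do not describe the ARX router itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # Hypothetical catalog entry; all fields are assumptions for illustration.
    name: str
    cost_per_call: float
    max_complexity: float    # highest complexity score this model is trusted with
    expected_quality: float  # measured quality on the customer's workload


def route(complexity: float, quality_floor: float, models: list[Model]) -> Model:
    """Pick the cheapest model that handles the complexity and clears the floor."""
    eligible = [
        m for m in models
        if m.max_complexity >= complexity and m.expected_quality >= quality_floor
    ]
    if not eligible:
        # Never downcast below the floor: fail loudly instead of trading quality for cost.
        raise LookupError("no model clears the quality floor")
    return min(eligible, key=lambda m: m.cost_per_call)
```

Keeping the floor as an explicit argument means it can be set per customer or per workload and logged alongside each routing decision.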
Anthropic, OpenAI, Google Vertex, Mistral, Groq, and customer-hosted small models run on the same routing fabric. Switch models, keep the policy.
Pilot Path
01
Drop the SDK and gateway endpoint into the existing stack. Per-call attribution turns on. The first cost decomposition surfaces in the first sessions of traffic.
02
Tag schema mapped to your finance dimensions. The CFO dashboard goes live. The first cost-attribution report runs.
03
Cost-quality routing turns on. Same telemetry feeds the audit chain. The pilot becomes the platform.
We come with the workload envelope and the calibration plan. You come with the call mix and the regulatory posture.
Layer 1.
The entry to the runtime.