ARX

The control plane for enterprise AI.

330 E Liberty, Lower Level
Ann Arbor, MI 48104
(302) 450-5664
Cloudflare Startup Program
Lambda Cloud Startup Program
NVIDIA Inception Program Member
Ann Arbor SPARK

© 2026 ARX QM Holdings, Inc. All rights reserved.

Patent Pending
The AI Gateway for Multi-Agent Enterprise.
Observe, orient, decide, act, on one runtime.
Per-call attribution. Cost-quality routing. Audit by execution.
Audit evidence as a byproduct of the runtime executing.
One gateway. Every model. Every agent.

Layer 1 · Entry

The front door for every agent call.

Every call lands here. Per-call attribution from the first call. Complexity-aware routing picks the cheapest model that clears the quality bar. Policy-as-code is evaluated on every request, before the call leaves the runtime.
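
As an illustration of policy-as-code evaluated before a call leaves the runtime, here is a minimal Python sketch. It is not ARX's implementation: the `Request` fields, the two rules, and the model names are all hypothetical, chosen only to show the shape of a pre-flight check that runs on every request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    agent: str
    team: str
    model: str
    prompt: str


# Each policy is a plain function: it returns None to allow the request,
# or a human-readable reason string to deny it. All rules are hypothetical.
def deny_unapproved_models(req):
    approved = {"small-model-a", "frontier-model-x"}  # placeholder names
    return None if req.model in approved else f"model {req.model!r} is not approved"


def deny_oversized_prompts(req, max_chars=8000):
    return None if len(req.prompt) <= max_chars else "prompt exceeds size limit"


POLICIES = [deny_unapproved_models, deny_oversized_prompts]


def evaluate(req):
    """Run every policy before the call is forwarded; collect all denial reasons."""
    reasons = []
    for policy in POLICIES:
        reason = policy(req)
        if reason is not None:
            reasons.append(reason)
    return (not reasons, reasons)
```

A caller would forward the request only when `evaluate` returns `True`, and could log the reasons list otherwise, so the denial itself becomes part of the audit record.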


What the Gateway does

Five functions, one entry layer.

Route

Cheapest sufficient model, every call.

Meter

Per-agent, per-model, per-team attribution.

Govern

Policy-as-code on every request.

Observe

Telemetry the CFO can read.

Verify

Cross-model fidelity per transfer.

The Wedge

The CFO already knows the line item is broken.

AI is the fastest-growing unattributed line on the cloud bill. No decomposition by agent, team, model, or call. Every query hits frontier pricing regardless of complexity. Provider logs locked to one vendor.

Per-call cost attribution at the wedge. The Gateway turns the AI cost center into a line item that decomposes by agent, team, model, and call.
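
To make the decomposition concrete, the sketch below rolls per-call records up by any one dimension. The record fields, model names, and per-token prices are assumptions made for illustration; they are not ARX's schema or any vendor's actual pricing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    agent: str
    team: str
    model: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1M tokens: (input, output).
PRICES = {
    "frontier-model-x": (10.0, 30.0),
    "small-model-a": (0.5, 1.5),
}


def call_cost(rec):
    """Cost of a single call in USD under the placeholder price table."""
    price_in, price_out = PRICES[rec.model]
    return (rec.input_tokens * price_in + rec.output_tokens * price_out) / 1_000_000


def decompose(records, dimension):
    """Sum call costs grouped by one dimension: 'agent', 'team', or 'model'."""
    totals = defaultdict(float)
    for rec in records:
        totals[getattr(rec, dimension)] += call_cost(rec)
    return dict(totals)


records = [
    CallRecord("triage", "support", "small-model-a", 1200, 400),
    CallRecord("research", "legal", "frontier-model-x", 1200, 400),
    CallRecord("triage", "support", "frontier-model-x", 1200, 400),
]
by_team = decompose(records, "team")
by_model = decompose(records, "model")
```

Because every record carries agent, team, and model, the same list answers each question without a separate pipeline per dimension.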

Unit Economics

How the gateway cuts inference.

ARX cascade routing delivers a 61.1 percent inference cost reduction on its own. The semantic cache adds 19.8 percent through a different class of mechanism. The combined gateway reduction is 80.6 percent on a 1M-call workload at a median of 1,200 input and 400 output tokens per call.

Mechanism             What it does                                                         % saved
Semantic cache        Repeat or near-identical calls served from cache, not the vendor.    19.8%
Cost-quality routing  Each query goes to the cheapest model that clears the quality bar.   61.1%
Total                 Total gateway reduction.                                             80.6%

Cache hit rate is workload-dependent: FAQ-style 30 to 40 percent, research-assistant 5 to 10 percent. The 20 percent assumption used here is mid-range. Cascade distribution is calibrated per customer inside the pilot.
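
For readers unfamiliar with the mechanism, a semantic cache can be sketched as follows. This is a toy under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the 0.8 similarity threshold is arbitrary, and none of it describes how ARX's cache is built.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts. A real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical similarity cutoff
        self.entries = []  # list of (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((embed(query), response))

    def get(self, query):
        """Return the cached response for the most similar past query, or None."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None


cache = SemanticCache()
cache.put("how do I reset my password", "Use the reset link on the sign-in page.")
hit = cache.get("how do I reset my password please")
miss = cache.get("what is the refund policy")
```

A near-identical question is served from the cache, while an unrelated one falls through to the model. The threshold is the lever behind the workload dependence noted above: repetitive FAQ traffic clears it often, open-ended research queries rarely do.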

Cost-Quality Routing

The cheapest sufficient model, every time.

Roughly 80 percent of enterprise calls hit a frontier model that costs an order of magnitude more than a purpose-built small model that could handle the same call. The Gateway sends each query to the right model at the right cost, with the quality floor enforced per call.

Complexity scoring

Every query is scored against a complexity bar before it leaves the gateway. Routine calls go to purpose-built small models. Frontier models earn their cost only when complexity demands it.
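
The page does not say how the complexity score is computed. Purely to illustrate scoring a query against a bar, the sketch below uses an invented heuristic (query length plus a few reasoning cue phrases) and an invented bar of 0.4; a production scorer would more plausibly be a trained classifier.

```python
# Hypothetical cue phrases that suggest multi-step reasoning.
REASONING_MARKERS = ("prove", "derive", "compare", "step by step", "trade-off")


def complexity_score(query):
    """Illustrative score in [0, 1]: longer queries and reasoning cues score higher."""
    length_part = min(len(query.split()) / 200, 1.0)
    marker_hits = sum(marker in query.lower() for marker in REASONING_MARKERS)
    marker_part = min(marker_hits / 2, 1.0)
    return 0.5 * length_part + 0.5 * marker_part


def pick_tier(query, bar=0.4):
    """Send the query to the frontier tier only when its score clears the bar."""
    return "frontier" if complexity_score(query) >= bar else "small"


routine = pick_tier("What are your opening hours?")
hard = pick_tier("Compare the two contracts and derive the net exposure step by step")
```

Here the routine question stays on the small tier and the multi-step request goes to the frontier tier.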

Quality floor

Downcasting is gated by an explicit quality floor. The router never trades quality for cost without auditable policy. The floor is set per customer, per workload, per regulator.
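
One way to picture a quality floor gating the downcast is the selection rule below: among models whose measured quality meets the workload's floor, pick the cheapest. The model names, costs, quality scores, and floors are all hypothetical placeholders, not ARX data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float  # USD, placeholder
    quality: float  # score in [0, 1] from an offline evaluation, placeholder


MODELS = [
    Model("small-model-a", 0.001, 0.82),
    Model("mid-model-b", 0.006, 0.90),
    Model("frontier-model-x", 0.024, 0.97),
]

# Hypothetical per-workload floors; a real deployment would load these from policy.
QUALITY_FLOORS = {"faq": 0.80, "contract-review": 0.95}


def route(workload):
    """Return the cheapest model whose quality meets the workload's floor."""
    floor = QUALITY_FLOORS[workload]
    eligible = [m for m in MODELS if m.quality >= floor]
    if not eligible:
        raise LookupError(f"no model meets the quality floor for {workload!r}")
    return min(eligible, key=lambda m: m.cost_per_call)


faq_model = route("faq")
review_model = route("contract-review")
```

The floor is applied before cost is considered, so a cheaper model can never win by falling below it; changing a floor is a policy change rather than a code change.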

Provider-agnostic

Anthropic, OpenAI, Google Vertex, Mistral, Groq, and customer-hosted small models on the same routing fabric. Switch models, keep the policy.

Pilot Path

Three steps from drop-in to platform.

01

Instrument

Drop the SDK and gateway endpoint into the existing stack. Per-call attribution turns on. The first cost decomposition surfaces in the first sessions of traffic.

02

Calibrate

Tag schema mapped to your finance dimensions. The CFO dashboard goes live. The first cost-attribution report runs.

03

Expand

Cost-quality routing turns on. Same telemetry feeds the audit chain. The pilot becomes the platform.

The Gateway is the front door. Walk us through your stack.

We come with the workload envelope and the calibration plan. You come with the call mix and the regulatory posture.


Layer 1.
The entry to the runtime.