v2.4 · Adaptive routing is live · Read changelog

One endpoint.
Every model.
Zero rewrites.

TokenAIRouter is the enterprise router for LLMs — one OpenAI-compatible API that routes intelligently across 320+ models from 60+ providers, with built-in failover, cost ceilings, and BYOK on every call.

No card required · 500k tokens of credit
TypeScript · OpenAI SDK
import OpenAI from "openai";

const ai = new OpenAI({
  baseURL: "https://api.tokenairouter.com/v1",
  apiKey: process.env.TAR_KEY,
});

// placeholder input for illustration
const prompt = "Hello";

// route by intent: TAR picks the cheapest model
// that meets your latency & quality SLO.
const res = await ai.chat.completions.create({
  model: "router/balanced",  // or "claude-sonnet-4.5"
  messages: [{ role: "user", content: prompt }],
  extra: { max_cost_usd: 0.02, failover: "auto" },
});
routed in 184ms · cache hit · 41% cheaper than direct claude-sonnet-4.5
42.1B tokens routed / 24h (↑ 12.4%)
328 models available
62 upstream providers
99.98% routed-call uptime · 90d
Routes traffic to the providers your stack already trusts
OpenAI · Anthropic · Google Vertex · AWS Bedrock · Azure OpenAI · Mistral · Groq · Fireworks · Together · DeepSeek · Qwen · xAI
01 — Featured models

The frontier, unified behind one key.

Live pricing, capability tags and median latency, refreshed every five minutes from every upstream.

Browse all 328 models
★ Most routed
Claude Sonnet 4.5
Anthropic · sonnet-4.5

Best-in-class reasoning at production cost. Excels at coding, agentic tool use and long-context analysis.

Context 200K · In $3.00/M · Out $15.00/M
vision · tools · caching · JSON
p50 642ms · 99.99%
GPT-5
OpenAI · gpt-5

Flagship general-purpose model. Strong on multimodal reasoning, structured outputs, real-time voice.

Context 400K · In $2.50/M · Out $10.00/M
vision · audio · tools · realtime
p50 488ms · 99.97%
↑ Trending
Gemini 2.5 Pro
Google · gemini-2.5-pro

Two-million-token context with native video. Best price per token in the frontier tier.

Context 2M · In $1.25/M · Out $5.00/M
vision · video · tools · 2M ctx
p50 712ms · 99.95%
DeepSeek V3.2
DeepSeek · deepseek-v3.2

Open-weight MoE; near-frontier quality on code & math at 1/20th the cost. Great for batch workloads.

Context 128K · In $0.14/M · Out $0.28/M
low-cost · tools · open
p50 904ms · 99.94%
Qwen3 235B Instruct
Alibaba · qwen3-235b

Open-weight flagship. Native bilingual zh/en, strong tool use, deployable on-prem if you bring your weights.

Context 131K · In $0.45/M · Out $1.80/M
bilingual · tools · open · on-prem
p50 798ms · 99.92%
⚡ Fastest
Grok 4 · Fast
xAI · grok-4-fast

Latency-optimised SKU on Groq silicon. Sub-second p99 for agentic loops and voice.

Context 256K · In $0.80/M · Out $2.40/M
fast · tools · streaming
p50 142ms · 99.98%
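
To see how the per-million prices on the cards above translate into a per-request figure, here is a minimal sketch. The `estimateCostUsd` helper and the `PRICES` table are hypothetical names for illustration, not part of any TAR SDK. The numbers are copied from the cards, and the estimate ignores cache discounts and any router fees.

```typescript
// Hypothetical helper. Prices are USD per million tokens, copied from the cards above.
type Price = { inPerM: number; outPerM: number };

const PRICES: Record<string, Price> = {
  "sonnet-4.5": { inPerM: 3.0, outPerM: 15.0 },
  "gpt-5": { inPerM: 2.5, outPerM: 10.0 },
  "gemini-2.5-pro": { inPerM: 1.25, outPerM: 5.0 },
  "deepseek-v3.2": { inPerM: 0.14, outPerM: 0.28 },
  "qwen3-235b": { inPerM: 0.45, outPerM: 1.8 },
  "grok-4-fast": { inPerM: 0.8, outPerM: 2.4 },
};

// cost = (input tokens x input price + output tokens x output price) / 1,000,000
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`unknown model: ${model}`);
  return (inputTokens * price.inPerM + outputTokens * price.outPerM) / 1_000_000;
}

// 2,000 input + 500 output tokens on sonnet-4.5:
// (2000 * 3.00 + 500 * 15.00) / 1e6 = 0.0135 USD
console.log(estimateCostUsd("sonnet-4.5", 2_000, 500).toFixed(4));
```

At these list prices the same request on deepseek-v3.2 comes to (2000 × 0.14 + 500 × 0.28) / 1e6 = 0.00042 USD, which is the kind of gap a cost-aware route can exploit.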
02 — How routing works

A control plane in front of every model you use.

One request, scored against your policy in <5ms, then dispatched to the upstream that wins on price × latency × quality, with hot-spare failover if that upstream degrades.
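
The scoring step can be pictured roughly as follows. This is a sketch under stated assumptions, not TAR's actual algorithm: the `Upstream` and `Slo` shapes, the `score` formula (cost × latency ÷ quality, lower is better) and the upstream names are all hypothetical.

```typescript
// Hypothetical shapes for illustration; TAR's actual scoring is not documented here.
interface Upstream {
  name: string;
  costUsd: number;  // estimated cost of this request
  p50Ms: number;    // median latency in milliseconds
  quality: number;  // 0..1, higher is better
  healthy: boolean;
}

interface Slo {
  maxCostUsd: number;
  maxP50Ms: number;
  minQuality: number;
}

// Lower cost, lower latency and higher quality all lower the score.
function score(u: Upstream): number {
  return (u.costUsd * u.p50Ms) / u.quality;
}

// Drop upstreams that break the policy, then rank the rest, best first.
function rankUpstreams(candidates: Upstream[], slo: Slo): Upstream[] {
  return candidates
    .filter(
      (u) =>
        u.healthy &&
        u.quality > 0 &&
        u.costUsd <= slo.maxCostUsd &&
        u.p50Ms <= slo.maxP50Ms &&
        u.quality >= slo.minQuality,
    )
    .sort((a, b) => score(a) - score(b));
}

const ranked = rankUpstreams(
  [
    { name: "upstream-a", costUsd: 0.0135, p50Ms: 642, quality: 0.95, healthy: true },
    { name: "upstream-b", costUsd: 0.001, p50Ms: 904, quality: 0.85, healthy: true },
    { name: "upstream-c", costUsd: 0.03, p50Ms: 488, quality: 0.95, healthy: true },  // over budget
    { name: "upstream-d", costUsd: 0.002, p50Ms: 142, quality: 0.9, healthy: false }, // degraded
  ],
  { maxCostUsd: 0.02, maxP50Ms: 1000, minQuality: 0.8 },
);

// ranked[0] is the primary; any remaining entries can serve as hot spares.
console.log(ranked.map((u) => u.name)); // [ 'upstream-b', 'upstream-a' ]
```

Filtering before ranking keeps hard limits (budget, latency, quality floor) separate from the soft trade-off between the upstreams that remain.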

01
Policy & budget aware
Set hard ceilings per call, per user or per workspace. Block PII egress, block specific regions, block non-FIPS models.
02
Real-time failover
If an upstream degrades, we re-route mid-stream to a hot spare without your client knowing.
03
Semantic + prefix cache
Edge-cached completions and prompt-prefix reuse — average savings of 38% on recurring traffic.
04
BYOK on every provider
Pass through your own provider keys for sovereign billing — TAR only sees the routing decision, not your spend.
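
As a rough illustration of the "Policy & budget aware" item above, a pre-dispatch gate might look like the sketch below. Everything here is assumed for illustration: the `RoutePolicy` and `RouteRequest` field names are not TAR's configuration schema, and PII detection is left out entirely.

```typescript
// Hypothetical policy shape; field names are illustrative, not TAR's config schema.
interface RoutePolicy {
  perCallUsd: number;       // hard ceiling for a single call
  perUserUsd: number;       // ceiling on a user's spend in the current period
  perWorkspaceUsd: number;  // ceiling on the workspace's spend in the current period
  blockedRegions: string[];
  requireFips: boolean;
}

interface RouteRequest {
  estimatedCostUsd: number;
  userSpendUsd: number;       // spent so far, before this call
  workspaceSpendUsd: number;  // spent so far, before this call
  targetRegion: string;
  targetIsFips: boolean;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Check every hard limit in turn and report the first one that fails.
function checkPolicy(req: RouteRequest, policy: RoutePolicy): Decision {
  if (req.estimatedCostUsd > policy.perCallUsd) {
    return { allowed: false, reason: "per-call ceiling" };
  }
  if (req.userSpendUsd + req.estimatedCostUsd > policy.perUserUsd) {
    return { allowed: false, reason: "per-user ceiling" };
  }
  if (req.workspaceSpendUsd + req.estimatedCostUsd > policy.perWorkspaceUsd) {
    return { allowed: false, reason: "per-workspace ceiling" };
  }
  if (policy.blockedRegions.indexOf(req.targetRegion) !== -1) {
    return { allowed: false, reason: "blocked region" };
  }
  if (policy.requireFips && !req.targetIsFips) {
    return { allowed: false, reason: "non-FIPS model" };
  }
  return { allowed: true };
}

const policy: RoutePolicy = {
  perCallUsd: 0.02,
  perUserUsd: 5,
  perWorkspaceUsd: 500,
  blockedRegions: ["apac"],
  requireFips: false,
};

// Denied: 4.99 + 0.0135 would push this user past the 5 USD ceiling.
const decision = checkPolicy(
  { estimatedCostUsd: 0.0135, userSpendUsd: 4.99, workspaceSpendUsd: 120, targetRegion: "eu", targetIsFips: false },
  policy,
);
console.log(decision);
```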
[Diagram: CLIENT (your.app, OpenAI SDK) → POST → ROUTER (TokenAIRouter: policy · cache · failover) → upstreams: OpenAI, Anthropic, Bedrock ✓, Vertex, Groq, DeepSeek. Routing decision · 4ms: cost −41% · p50 184ms · quality A+ · cache HIT]
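
The prompt-prefix reuse mentioned under "Semantic + prefix cache" depends on detecting how much of a new prompt matches one already seen. A minimal sketch at message granularity follows. Production caches typically work at the token level, and `sharedPrefixLength` is a hypothetical helper, not a TAR API.

```typescript
interface Message {
  role: string;
  content: string;
}

// Count how many leading messages are identical in both conversations.
// Everything inside that shared prefix is a candidate for reuse.
function sharedPrefixLength(cached: Message[], incoming: Message[]): number {
  const limit = Math.min(cached.length, incoming.length);
  let i = 0;
  while (
    i < limit &&
    cached[i].role === incoming[i].role &&
    cached[i].content === incoming[i].content
  ) {
    i++;
  }
  return i;
}

const system: Message = { role: "system", content: "You are a support copilot." };
const cached: Message[] = [system, { role: "user", content: "Reset my password" }];
const incoming: Message[] = [system, { role: "user", content: "Change my email" }];

// Only the system message is shared, so one message can be reused.
console.log(sharedPrefixLength(cached, incoming)); // 1
```

This is why stable content, such as system prompts and tool definitions, is usually placed first: the longer the unchanged prefix, the more of the request can be served from cache.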
03 — Top apps

Built on the router this week.

Ranked by tokens routed in the last 7 days across all our public-tier customers.

Full leaderboard
01
Lattice — agent IDE
Autonomous coding agent for monorepos · 1,240 orgs
8.42B tokens / 7d
02
Fieldbook AI — sales copilot
Real-time call summarisation & CRM enrichment
5.18B tokens / 7d
03
Penumbra — LLM observability
Trace, replay and benchmark across every model
3.96B tokens / 7d
04
Halcyon — note-to-knowledge
Long-context personal knowledge synthesis
2.71B tokens / 7d
05
Brio — support copilot
L1 ticket triage & deflection for SaaS
2.40B tokens / 7d
06
Meridian — research agent
Multi-step deep research with citations
1.92B tokens / 7d
04 — Enterprise

White-glove routing for regulated teams.

Dedicated capacity in your region, SAML & SCIM, signed BAAs, audit-log exports, model allowlists by policy, and a named engineer on Slack. We meet your compliance team where they are.

SOC 2 Type II
Type II report under audit; bridge letter on request
HIPAA
BAA signed by default for healthcare workspaces
EU · US · APAC
Data-residency routing with no cross-region egress
<5ms
Median overhead added by the router itself