Pricing

Two ways to pay. Both honest.

Premium speed — 1000+ tok/s on reserved capacity, priced per workload. Standard — public per-token rates on every model in the catalog, 70 tok/s baseline, self-serve. No minimums, no commits, no overage charges either way. $5 in starter credits to try Standard, no card.

Premium speed

1000+ tok/s. Same models.

The 14× throughput over the 70 tok/s Standard baseline that earns the homepage. Reserved capacity on AWS Trainium and NVIDIA Blackwell, dedicated auto-scaling, contractual TTFT. Priced per workload, not per token: we right-size the cluster to what your traffic actually looks like.

Throughput: 1000+ tok/s
TTFT (time to first token): < 250 ms
Capacity: Reserved
Cluster: Single-model
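
A rough sketch of what the gap means for a single response, assuming throughput stays steady at the figures quoted on this page. The function name and the 2,000-token response size are illustrative assumptions. No TTFT figure is published for Standard, so it is left at zero there.

```python
STANDARD_TOK_S = 70.0     # Standard baseline from this page
PREMIUM_TOK_S = 1000.0    # Premium floor from this page ("1000+")
PREMIUM_TTFT_S = 0.25     # upper bound implied by "TTFT: < 250ms"


def generation_seconds(output_tokens: int,
                       tokens_per_second: float,
                       ttft_seconds: float = 0.0) -> float:
    """Estimate wall-clock time for one response at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


# Hypothetical 2,000-token completion on each tier.
standard_s = generation_seconds(2000, STANDARD_TOK_S)
premium_s = generation_seconds(2000, PREMIUM_TOK_S, PREMIUM_TTFT_S)
speedup = PREMIUM_TOK_S / STANDARD_TOK_S

print(f"Standard: {standard_s:.1f} s, Premium: {premium_s:.2f} s, "
      f"throughput ratio: {speedup:.1f}x")
```

The throughput ratio works out to roughly 14.3, which is where the 14× figure comes from.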

Talk to sales

Standard tier · 70 tok/s

Per-token rates.

Every model in the catalog, published rate per million tokens. Same API as Premium — just without the reserved capacity. Self-serve, no contract.

Full model registry

Model	Provider · architecture	$/1M input	$/1M cached	$/1M output
GPT-OSS 120B	OpenAI · 120B (MoE, ~5B active)	Contact sales	Contact sales	Contact sales
DeepSeek V4 Pro	DeepSeek · Frontier MoE	Contact sales	Contact sales	Contact sales
DeepSeek V4 Flash	DeepSeek · Mid-tier MoE	Contact sales	Contact sales	Contact sales
Kimi K2.6	Moonshot · 1T MoE (~32B active)	$0.68	$0.144	$3.41
Qwen3 235B	Alibaba · 235B MoE (~22B active)	Contact sales	Contact sales	Contact sales
Kimi K2.6 Fast	Moonshot · 1T MoE (~32B active)	Contact sales	Contact sales	Contact sales

Calculator

Math we don't hide.

Plug in monthly volume to see what your bill actually looks like. Type any number — the slider scales exponentially up to 100B tokens.
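
One way an exponential slider like this can work, as a minimal sketch. It assumes the track runs from 1M to 100B tokens and that position maps to volume on a pure log scale. The function names are hypothetical and the real widget may differ.

```python
import math

MIN_TOKENS = 1_000_000          # 1M, assumed left end of the slider
MAX_TOKENS = 100_000_000_000    # 100B, right end stated on this page


def slider_to_tokens(position: float) -> int:
    """Map a slider position in [0, 1] to a monthly token count (log scale)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    decades = math.log10(MAX_TOKENS / MIN_TOKENS)  # 5 decades from 1M to 100B
    return round(MIN_TOKENS * 10 ** (position * decades))


def tokens_to_slider(tokens: int) -> float:
    """Inverse mapping, for when a number is typed instead of dragged."""
    clamped = min(max(tokens, MIN_TOKENS), MAX_TOKENS)
    return math.log10(clamped / MIN_TOKENS) / math.log10(MAX_TOKENS / MIN_TOKENS)


for pos in (0.0, 0.4, 0.8, 1.0):
    print(pos, slider_to_tokens(pos))
```

Under these assumptions, each 20% of the track multiplies the volume by ten, so small and very large workloads both get usable resolution.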

Model

Input tokens / month

1M · 100M · 10B · 100B

Output tokens / month

1M · 100M · 10B · 100B

Cached input share: 0% of input tokens repeat (billed at $0.144 / 1M)
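
The arithmetic behind the calculator can be sketched as follows, using the published Kimi K2.6 rates from the table. The `Rates` class and `monthly_cost` function are illustrative names, and the sketch assumes cached tokens are a plain fraction of input tokens with no rounding or minimum charge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


# Published Standard rates for Kimi K2.6 from the table on this page.
KIMI_K2_6 = Rates(input_per_m=0.68, cached_per_m=0.144, output_per_m=3.41)


def monthly_cost(input_tokens: float, output_tokens: float,
                 cached_share: float, rates: Rates) -> float:
    """Monthly bill in USD; cached_share is the fraction of input that hits cache."""
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    cached = input_tokens * cached_share
    fresh = input_tokens - cached
    total = (fresh * rates.input_per_m
             + cached * rates.cached_per_m
             + output_tokens * rates.output_per_m)
    return total / 1_000_000


# Hypothetical workload: 100M input and 10M output tokens per month.
no_cache = monthly_cost(100e6, 10e6, 0.0, KIMI_K2_6)
half_cache = monthly_cost(100e6, 10e6, 0.5, KIMI_K2_6)
print(f"0% cached: ${no_cache:.2f}, 50% cached: ${half_cache:.2f}")
```

For that workload the bill is $102.10 with no cache hits and $75.30 when half the input repeats, so the cached share only moves the input portion of the bill.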

Enterprise deployment

Your VPC. Our cluster. One bill.

Single-tenant deployment on AWS Trainium and NVIDIA Blackwell. Zero retention. SOC 2 Type II via Decart AI. VPC peering and contractual P99 SLAs are included; a BAA is on the roadmap. The full pitch lives at /enterprise.

▸ Single-tenant on Trainium / Blackwell
▸ SOC 2 Type II (via Decart AI)
▸ GDPR + EU data residency
▸ VPC peering, zero public egress
▸ P99 latency SLA with credits

Talk to sales

How billing works

Math we don't hide.

Each rule is tagged with the tier it applies to. Standard is per-token. Premium is per workload. Anything tagged Both holds either way.

Standard: tokens, not minutes.
We bill prompt + completion tokens reported by the upstream. No charge for capacity sitting idle between requests.

Premium: priced per workload.
Reserved capacity, sized to your traffic. One predictable monthly figure: no per-request surprises, but also not per-token economics.

Standard: cached input is cheaper.
When the upstream KV cache hits, those input tokens bill at the per-model cache rate (visible in the table above).

Both: fine-tunes price the same.
Your weights, our serving. No surcharge for the privilege: same per-token rate (Standard) or same cluster shape (Premium).

Both: hard spend caps. Per org, per key.
Set a ceiling once. We stop accepting requests when you hit it and email you. Standard caps are denominated in dollars; Premium caps are denominated in requests per minute (RPM).

Both: credit doesn't expire.
$1 of credit is $1 of usage, for the life of the account. No usage tiers, no bonus credit that vanishes.
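
The dollar-denominated Standard cap described above can be sketched like this. It is an illustration of the rule, not the platform's implementation: the `SpendCap` class and `request_allowed` helper are hypothetical names, and the sketch assumes a request is refused once either the org cap or the key cap has been reached. The RPM-denominated Premium caps and the email notification are not modeled.

```python
class SpendCap:
    """Tracks spend against a hard dollar ceiling (hypothetical helper)."""

    def __init__(self, cap_usd: float) -> None:
        if cap_usd <= 0:
            raise ValueError("cap_usd must be positive")
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        """Add the cost of a completed request to the running total."""
        if cost_usd < 0:
            raise ValueError("cost_usd must not be negative")
        self.spent_usd += cost_usd

    def accepts_requests(self) -> bool:
        """False once spend has reached the ceiling."""
        return self.spent_usd < self.cap_usd


def request_allowed(org_cap: SpendCap, key_cap: SpendCap) -> bool:
    """Assumed rule: both the org cap and the key cap must have headroom."""
    return org_cap.accepts_requests() and key_cap.accepts_requests()


org = SpendCap(100.0)
key = SpendCap(10.0)
for cost in (6.0, 4.0):          # two hypothetical requests on the same key
    org.record(cost)
    key.record(cost)
print(request_allowed(org, key))  # the key has hit its $10 ceiling
```

Checking the cap before each request, rather than after, is what makes it a hard ceiling: a request that is already in flight can finish, but the next one is refused.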