Pricing
Two ways to pay. Both honest.
Premium speed — 1000+ tok/s on reserved capacity, priced per workload. Standard — public per-token rates on every model in the catalog, 70 tok/s baseline, self-serve. No minimums, no commits, no overage charges either way. $5 in starter credits to try Standard, no card.
Premium speed
1000+ tok/s. Same models.
The 14× throughput that earns the homepage. Reserved capacity on AWS Trainium and NVIDIA Blackwell, dedicated auto-scaling, contractual TTFT. Priced per workload, not per token — we right-size the cluster to what your traffic actually looks like.
- Throughput: 1000+ tok/s
- TTFT: < 250ms
- Capacity: Reserved
- Cluster: Single-model
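To put the two throughput figures side by side, here is a rough back-of-envelope sketch. It assumes a steady decode rate and ignores TTFT; the helper name `generation_seconds` and the 2,000-token response size are illustrative, not part of the product.

```python
# Throughput figures quoted on this page, in tokens per second.
STANDARD_TOK_S = 70    # Standard baseline
PREMIUM_TOK_S = 1000   # Premium floor ("1000+")


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Time to stream `output_tokens` at a steady decode rate (TTFT ignored)."""
    return output_tokens / tok_per_s


speedup = PREMIUM_TOK_S / STANDARD_TOK_S
print(f"speedup: {speedup:.1f}x")                                     # speedup: 14.3x
print(f"Standard: {generation_seconds(2000, STANDARD_TOK_S):.1f} s")  # Standard: 28.6 s
print(f"Premium:  {generation_seconds(2000, PREMIUM_TOK_S):.1f} s")   # Premium:  2.0 s
```

Real latency also depends on prompt length and load, so treat these numbers as an order-of-magnitude comparison only.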
Standard tier · 70 tok/s
Per-token rates.
Every model in the catalog, published rate per million tokens. Same API as Premium — just without the reserved capacity. Self-serve, no contract.
| Model | $/1M input | $/1M cached | $/1M output |
|---|---|---|---|
| GPT-OSS 120B, OpenAI · 120B (MoE, ~5B active) | Contact sales | | |
| DeepSeek V4 Pro, DeepSeek · Frontier MoE | Contact sales | | |
| DeepSeek V4 Flash, DeepSeek · Mid-tier MoE | Contact sales | | |
| Kimi K2.6, Moonshot · 1T MoE (~32B active) | $0.68 | $0.144 | $3.41 |
| Qwen3 235B, Alibaba · 235B MoE (~22B active) | Contact sales | | |
| Kimi K2.6 Fast, Moonshot · 1T MoE (~32B active) | Contact sales | | |
Calculator
Math we don't hide.
Plug in monthly volume to see what your bill actually looks like. Type any number — the slider scales exponentially up to 100B tokens.
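If you would rather run the numbers yourself, the sketch below approximates a Standard bill from the published Kimi K2.6 rates. It is not the calculator's actual code: the function names, the split into input, cached, and output tokens, and the slider's 1M-token lower bound are all assumptions made for illustration.

```python
# Published Standard rates for Kimi K2.6, in USD per 1M tokens.
KIMI_K2_6 = {"input": 0.68, "cached": 0.144, "output": 3.41}


def monthly_cost(rates, input_tokens, output_tokens, cached_tokens=0):
    """Estimate a Standard bill in USD.

    `cached_tokens` is the portion of `input_tokens` billed at the cache rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    usd = (
        uncached * rates["input"]
        + cached_tokens * rates["cached"]
        + output_tokens * rates["output"]
    ) / 1_000_000
    return round(usd, 2)


def slider_to_tokens(position, min_tokens=1_000_000, max_tokens=100_000_000_000):
    """Map a slider position in [0, 1] to a token count on an exponential scale.

    The 1M lower bound is an assumed value; the page only states the 100B cap.
    """
    return min_tokens * (max_tokens / min_tokens) ** position


# 1B input tokens (400M of them cached) plus 200M output tokens.
print(monthly_cost(KIMI_K2_6, 1_000_000_000, 200_000_000, cached_tokens=400_000_000))  # 1147.6
print(f"{slider_to_tokens(0.5):,.0f}")  # 316,227,766
```

On an exponential scale the midpoint of the slider sits near 316M tokens rather than 50B, which is why small and very large volumes can share one control.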
Enterprise deployment
Your VPC. Our cluster. One bill.
Single-tenant deployment on AWS Trainium and NVIDIA Blackwell. Zero retention. SOC 2 Type II via Decart AI. VPC peering, contractual P99 SLAs, BAA on the roadmap. The full pitch lives at /enterprise.
- Single-tenant on Trainium / Blackwell
- SOC 2 Type II (via Decart AI)
- GDPR + EU data residency
- VPC peering, zero public egress
- P99 latency SLA with credits
How billing works
Math we don't hide.
Each rule is tagged with the tier it applies to. Standard is per-token. Premium is per workload. Anything tagged Both holds either way.
- Standard: tokens, not minutes.
  We bill prompt + completion tokens reported by the upstream. No charge for capacity sitting idle between requests.
- Premium: priced per workload.
  Reserved capacity, sized to your traffic. One predictable monthly figure: no per-request surprises, but also not per-token economics.
- Standard: cached input is cheaper.
  When the upstream KV cache hits, those input tokens bill at the per-model cache rate (visible in the table above).
- Fine-tunes price the same.
  Your weights, our serving. No surcharge for the privilege: same per-token rate (Standard) or same cluster shape (Premium).
- Hard spend caps. Per org, per key.
  Set a ceiling once. We stop accepting requests when you hit it and email you. Standard caps are dollar-denominated; Premium caps are RPM-denominated (requests per minute).
- Credit doesn't expire.
  $1 of credit is $1 of usage for the life of the account. No usage tiers, no bonus credit that vanishes.
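As a rough illustration of the hard spend cap rule on Standard, the sketch below models a dollar-denominated ceiling. The `SpendCap` class and its `try_charge` method are hypothetical names, not part of any published API. It also assumes the cap is checked before a request is accepted and the cost is recorded afterwards, since token counts are only known once the upstream reports them; whether the real system lets the final request overshoot the ceiling is not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class SpendCap:
    """Hypothetical dollar-denominated cap for one org or one key."""

    limit_usd: float
    spent_usd: float = 0.0

    def try_charge(self, cost_usd: float) -> bool:
        """Reject once the cap has been reached; otherwise record the charge."""
        if self.spent_usd >= self.limit_usd:
            return False  # cap already hit: stop accepting requests
        self.spent_usd += cost_usd
        return True


cap = SpendCap(limit_usd=10.0)
print(cap.try_charge(6.0))  # True
print(cap.try_charge(6.0))  # True (accepted while under the cap; total is now 12.0)
print(cap.try_charge(1.0))  # False
```

A stricter variant could reserve an estimated cost up front and reject any request that would cross the ceiling, at the price of occasionally refusing requests that would have fit.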