Pricing

Two ways to pay. Both honest.

Premium speed — 1000+ tok/s on reserved capacity, priced per workload. Standard — public per-token rates on every model in the catalog, 70 tok/s baseline, self-serve. No minimums, no commits, no overage charges either way. $5 in starter credits to try Standard, no card.

Premium speed

1000+ tok/s. Same models.

Roughly 14× the Standard baseline (1000+ vs. 70 tok/s): the throughput that earns the homepage. Reserved capacity on AWS Trainium and NVIDIA Blackwell, dedicated auto-scaling, contractual TTFT. Priced per workload, not per token: we right-size the cluster to what your traffic actually looks like.

Throughput: 1000+ tok/s
TTFT: < 250 ms
Capacity: Reserved
Cluster: Single-model
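
To put the two throughput figures side by side, here is a rough sketch of decode time for a single response. It assumes throughput stays constant for the whole response and leaves TTFT out, since this page states no TTFT for Standard. `decode_seconds` and the 2,000-token response length are hypothetical, chosen for illustration only.

```python
# Rough decode-time comparison using the throughput figures quoted on this page.
STANDARD_TOK_PER_S = 70    # Standard baseline
PREMIUM_TOK_PER_S = 1000   # Premium floor ("1000+")


def decode_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Seconds to stream output_tokens at a constant rate (an assumption)."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


if __name__ == "__main__":
    tokens = 2_000  # hypothetical response length
    std = decode_seconds(tokens, STANDARD_TOK_PER_S)
    prem = decode_seconds(tokens, PREMIUM_TOK_PER_S)
    print(f"Standard: {std:.1f}s  Premium: {prem:.1f}s  ratio: {std / prem:.1f}x")
```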
Talk to sales

Standard tier · 70 tok/s

Per-token rates.

Every model in the catalog, published rate per million tokens. Same API as Premium — just without the reserved capacity. Self-serve, no contract.

Full model registry
Model              Provider · architecture            $/1M input   $/1M cached   $/1M output
GPT-OSS 120B       OpenAI · 120B (MoE, ~5B active)    Contact sales
DeepSeek V4 Pro    DeepSeek · Frontier MoE            Contact sales
DeepSeek V4 Flash  DeepSeek · Mid-tier MoE            Contact sales
Kimi K2.6          Moonshot · 1T MoE (~32B active)    $0.68        $0.144        $3.41
Qwen3 235B         Alibaba · 235B MoE (~22B active)   Contact sales
Kimi K2.6 Fast     Moonshot · 1T MoE (~32B active)    Contact sales

Calculator

Math we don't hide.

Plug in monthly volume to see what your bill actually looks like. Type any number — the slider scales exponentially up to 100B tokens.
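
The exponential scaling can be pictured as a log-scale mapping from slider position to token count. This is only a sketch: the 1M lower bound and the even spacing per decade are assumptions, the page's actual slider curve is not published, and `slider_to_tokens` / `tokens_to_slider` are hypothetical names.

```python
import math

MIN_TOKENS = 1_000_000          # 1M, assumed lower bound of the slider
MAX_TOKENS = 100_000_000_000    # 100B, the stated upper limit


def slider_to_tokens(position: float) -> int:
    """Map a slider position in [0, 1] to a monthly token count on a log scale."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    log_min, log_max = math.log10(MIN_TOKENS), math.log10(MAX_TOKENS)
    return round(10 ** (log_min + position * (log_max - log_min)))


def tokens_to_slider(tokens: int) -> float:
    """Inverse mapping, for when a number is typed in directly."""
    clamped = min(max(tokens, MIN_TOKENS), MAX_TOKENS)
    log_min, log_max = math.log10(MIN_TOKENS), math.log10(MAX_TOKENS)
    return (math.log10(clamped) - log_min) / (log_max - log_min)
```

With this mapping each equal step of the slider multiplies the volume by the same factor, which is what lets one control span five orders of magnitude.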

Model
Input tokens / month (slider: 1M, 100M, 10B, 100B)
Output tokens / month (slider: 1M, 100M, 10B, 100B)
Cached input share: 0% of input tokens repeat (billed at $0.144 / 1M)
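
The calculator's arithmetic can be sketched as follows, using the Kimi K2.6 rates published in the table above. It assumes the cached share is a fraction of input tokens billed at the cached rate in place of the input rate, with no rounding or minimums. `Rates`, `monthly_cost`, and the sample workload are hypothetical names and numbers, not part of any published API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Standard rates in USD per 1M tokens."""
    input: float
    cached: float
    output: float


# Rates taken from the Kimi K2.6 row of the pricing table.
KIMI_K2_6 = Rates(input=0.68, cached=0.144, output=3.41)


def monthly_cost(input_tokens: int, output_tokens: int,
                 cached_share: float, rates: Rates) -> float:
    """Estimated monthly Standard bill in USD (assumed formula)."""
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    cached = input_tokens * cached_share   # input tokens billed at the cached rate
    uncached = input_tokens - cached       # input tokens billed at the full input rate
    return (uncached * rates.input
            + cached * rates.cached
            + output_tokens * rates.output) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100M input tokens (30% cached), 10M output tokens.
    print(f"${monthly_cost(100_000_000, 10_000_000, 0.30, KIMI_K2_6):,.2f}")
```

For that sample workload the terms are 70M × $0.68 + 30M × $0.144 + 10M × $3.41 per million, which comes to $86.02.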

Enterprise deployment

Your VPC. Our cluster. One bill.

Single-tenant deployment on AWS Trainium and NVIDIA Blackwell. Zero retention. SOC 2 Type II via Decart AI. VPC peering, contractual P99 SLAs, BAA on the roadmap. The full pitch lives at /enterprise.

  • Single-tenant on Trainium / Blackwell
  • SOC 2 Type II (via Decart AI)
  • GDPR + EU data residency
  • VPC peering, zero public egress
  • P99 latency SLA with credits
Talk to sales

How billing works

Math we don't hide.

Each rule is tagged with the tier it applies to. Standard is per-token. Premium is per workload. Anything tagged Both tiers holds either way.

Standard
Standard: tokens, not minutes.
We bill prompt + completion tokens reported by the upstream. No charge for capacity sitting idle between requests.
Premium
Premium: priced per workload.
Reserved capacity, sized to your traffic. One predictable monthly figure — no per-request surprises, but also not per-token economics.
Standard
Standard: cached input is cheaper.
When a request hits the upstream KV cache, the matched input tokens bill at the per-model cached rate (the $/1M cached column in the table above).
Both tiers
Fine-tunes price the same.
Your weights, our serving. No surcharge for the privilege — same per-token rate (Standard) or same cluster shape (Premium).
Both tiers
Hard spend caps. Per org, per key.
Set a ceiling once. We stop accepting requests when you hit it and email you. Standard caps are dollar-denominated; Premium caps are RPM-denominated.
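
One way to picture a hard cap, as a minimal sketch covering only the dollar-denominated Standard case. How the service enforces caps internally is not described here, so `SpendCaps` and its methods are hypothetical. The sketch also assumes a request's cost is recorded after it completes, so the request that reaches the ceiling is served and later ones are refused.

```python
from collections import defaultdict


class SpendCaps:
    """Dollar-denominated hard caps tracked per org and per API key (sketch)."""

    def __init__(self, org_cap_usd: float, key_caps_usd: dict):
        self.org_cap_usd = org_cap_usd
        self.key_caps_usd = dict(key_caps_usd)
        self.org_spent = 0.0
        self.key_spent = defaultdict(float)

    def accepts(self, key: str) -> bool:
        """True while neither the org ceiling nor this key's ceiling has been hit."""
        if self.org_spent >= self.org_cap_usd:
            return False
        key_cap = self.key_caps_usd.get(key)  # keys without their own cap share the org cap only
        return key_cap is None or self.key_spent[key] < key_cap

    def record(self, key: str, cost_usd: float) -> None:
        """Add the billed cost of a completed request to both running totals."""
        self.org_spent += cost_usd
        self.key_spent[key] += cost_usd


if __name__ == "__main__":
    caps = SpendCaps(org_cap_usd=100.0, key_caps_usd={"ci": 10.0})
    caps.record("ci", 10.0)
    print(caps.accepts("ci"), caps.accepts("prod"))
```

A Premium cap would count requests per minute instead of dollars; that variant is not shown.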
Both tiers
Credit doesn't expire.
$1 of credit is $1 of usage — for the life of the account. No usage tiers, no bonus credit that vanishes.