Live TTFT: 228 ms

Cogito,
ergo ship.

Open-source LLMs and your fine-tunes — served on AWS Trainium and NVIDIA GPUs at frontier speed. OpenAI-compatible from line one.

Start building · Talk to engineering

Drop-in OpenAI compatible · No cold starts · Pay per token, not idle

stream.py

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cogito.decart.ai/v1",
    api_key=os.environ["COGITO_API_KEY"],
)

stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
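
The live badge at the top of the page reports TTFT (time to first token). Here is a minimal sketch for measuring it on your own requests. `measure_ttft` is a hypothetical helper, not part of the OpenAI SDK or of Cogito. It takes a callable that opens the stream, so the timer also covers the request itself.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_ttft(
    open_stream: Callable[[], Iterable[str]],
) -> Tuple[Optional[float], str]:
    """Return (seconds until the first non-empty piece, full text).

    open_stream is any zero-argument callable that starts a request and
    yields text pieces. The first element is None if no text arrives.
    """
    start = time.perf_counter()
    ttft = None
    parts = []
    for piece in open_stream():
        # Empty pieces (for example role-only deltas) do not count as a token.
        if piece and ttft is None:
            ttft = time.perf_counter() - start
        parts.append(piece)
    return ttft, "".join(parts)
```

With the client from stream.py, `open_stream` could be a small function that calls `client.chat.completions.create(..., stream=True)` and yields `chunk.choices[0].delta.content or ""` for each chunk.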

The first six teams to ship on Cogito

Reserved for early-access partners. Your logo could be one of them.

01 · open, 02 · open, 03 · open, 04 · open, 05 · open, 06 · open

Fast inference. No asterisks.

Built like infrastructure should be.

Three things we obsess over so your application doesn't have to.

Trainium + GPU, routed for you

We pick the right silicon for each model: AWS Trainium for cost-efficient throughput, NVIDIA H100 / H200 for the heaviest configurations. You just call the API.

Drop-in OpenAI compatible

Swap base_url and api_key. Streaming SSE, function calling, structured JSON outputs — all match the OpenAI spec. No client rewrite, no surprises.
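
As a rough illustration of what "match the OpenAI spec" means for function calling, the sketch below builds a request body in the OpenAI chat-completions `tools` format and decodes tool calls from an assistant message. The `get_weather` tool and both helper names are hypothetical. The code uses only the standard library and does not confirm which models on Cogito support tools.

```python
import json


def build_tool_request(question: str) -> dict:
    """Keyword arguments for client.chat.completions.create(**request)."""
    return {
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


def parse_tool_calls(message: dict) -> list:
    """Decode (name, arguments) pairs from an OpenAI-format assistant message.

    In the OpenAI format, function arguments arrive as a JSON string.
    """
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

Because the body is a plain dict, the same request can be sent through the OpenAI SDK or any HTTP client once `base_url` and `api_key` point at Cogito.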

No asterisks

Token-aware rate limits, deterministic error schemas with request IDs, zero retention by default, hard spend caps. The infrastructure niceties incumbents don't bother with.
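
Rate limits still need handling on the client side. The sketch below retries with exponential backoff and jitter. It is an assumption about how a caller might react to a rate-limit response, not a description of Cogito's error schema. `RateLimited` is a hypothetical stand-in exception, and `call_with_backoff` is a hypothetical helper.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) error."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on RateLimited, wait and retry up to max_attempts times.

    The wait before retry k is drawn uniformly from
    [0, base_delay * 2**k] ("full jitter"), which spreads out clients
    that were throttled at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

With the OpenAI Python SDK, the exception to catch would be `openai.RateLimitError` rather than the stand-in class used here.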

Catalog

Frontier open weights, day one.

Llama, Qwen, DeepSeek, Mistral. Hot the moment they ship. Same per-token price for your fine-tuned variants.

Browse all models

OpenAI

GPT-OSS 120B

120B (MoE, ~5B active)

OpenAI's first open-weight model since GPT-2. Mixture-of-experts 120B with strong general reasoning at a price that's hard to beat.

Context: 131,072 tokens
TPS: 70
Input: $0.04

on AWS Trainium

DeepSeek

DeepSeek V4 Pro

Frontier MoE

DeepSeek's frontier model. 1M-token context, frontier-class reasoning, and a price tag that makes proprietary alternatives hard to justify.

Context: 1,000,000 tokens
TPS: 70
Input: $0.43

on NVIDIA H200

DeepSeek

DeepSeek V4 Flash

Mid-tier MoE

The cheap workhorse with a 1M-token window. Built for high-volume pipelines where the bill matters as much as the answer.

Context: 1,000,000 tokens
TPS: 70
Input: $0.14

on AWS Trainium

Moonshot

Kimi K2.6

1T MoE (~32B active)

Moonshot's flagship MoE. Trained for long-horizon agentic workflows; the model engineering teams reach for when the cheap models stop being enough.

Context: 262,144 tokens
TPS: 70
Input: $0.74

on NVIDIA H100

DeepSeek

DeepSeek V3.2

671B MoE (~37B active)

The proven workhorse. DeepSeek V3.2's reasoning is more than enough for 90% of production workloads, at one of the lowest prices in the catalog.

Context: 131,072 tokens
TPS: 70
Input: $0.25

on AWS Trainium
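
To compare the cards above on cost, here is a back-of-the-envelope sketch. The catalog lists an input price but does not state its unit. The code assumes USD per 1 million input tokens, which is a common convention but an assumption here. Output-token pricing is not listed, so it is left out. `input_cost_usd` is a hypothetical helper.

```python
# Input prices copied from the catalog cards above.
# ASSUMPTION: the unit is USD per 1,000,000 input tokens.
INPUT_PRICE_USD = {
    "GPT-OSS 120B": 0.04,
    "DeepSeek V4 Pro": 0.43,
    "DeepSeek V4 Flash": 0.14,
    "Kimi K2.6": 0.74,
    "DeepSeek V3.2": 0.25,
}


def input_cost_usd(model: str, input_tokens: int,
                   per_tokens: int = 1_000_000) -> float:
    """Estimated input-side cost in USD; output tokens are not included."""
    return INPUT_PRICE_USD[model] * input_tokens / per_tokens


if __name__ == "__main__":
    # Example workload: 50 million input tokens per day on each model.
    for name in INPUT_PRICE_USD:
        print(f"{name}: ${input_cost_usd(name, 50_000_000):.2f} per day (input only)")
```

If the real unit differs, changing `per_tokens` is enough to correct the estimate.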

Enterprise

Inference your security team can sign off on.

SOC 2 Type II certified through Decart AI, GDPR-compliant by design, single-tenant VPC deployments, contractual P99 SLAs, zero retention by default. Open-source models without the open-source compliance gap.

Read the trust story · Talk to engineering

SOC 2 Type II: Certified
GDPR: Compliant
HIPAA BAA: In progress

Ship something today.

$5 in free credits. No credit card. Five minutes from sign-up to your first streamed response.

Start building · Read the docs
