Cogito,
ergo ship.
Open-source LLMs and your fine-tunes — served on AWS Trainium and NVIDIA GPUs at frontier speed. OpenAI-compatible from line one.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cogito.decart.ai/v1",
    api_key=os.environ["COGITO_API_KEY"],
)

stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

The first six teams to ship on Cogito
Reserved for early-access partners. Your logo could be one of them.
Fast inference. No asterisks.
Built like infrastructure should be.
Three things we obsess over so your application doesn't have to.
Trainium + GPU, routed for you
We pick the right silicon for each model: AWS Trainium for cost-efficient throughput, NVIDIA H100 / H200 for the heaviest configurations. You just call the API.
Drop-in OpenAI compatible
Swap base_url and api_key. Streaming SSE, function calling, structured JSON outputs — all match the OpenAI spec. No client rewrite, no surprises.
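As a sketch of the structured-outputs path: the request body below follows the OpenAI `json_schema` shape for `response_format`. The schema name, fields, and prompt are illustrative, and it assumes the endpoint enforces OpenAI-style structured outputs as stated above.

```python
import json

# JSON Schema the endpoint is asked to enforce (OpenAI structured-outputs shape).
# The schema name and fields here are illustrative.
WEATHER_SCHEMA = {
    "name": "weather_report",
    "schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "temperature_c": {"type": "number"},
        },
        "required": ["city", "temperature_c"],
        "additionalProperties": False,
    },
}

# Keyword arguments for client.chat.completions.create(**request) —
# identical to what you would send to OpenAI itself.
request = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Current weather in Paris, as JSON."}],
    "response_format": {"type": "json_schema", "json_schema": WEATHER_SCHEMA},
}

# The request body is plain JSON, exactly as the OpenAI spec defines it.
print(json.dumps(request, indent=2))
```

Pass `**request` to the client from the quick-start snippet and `json.loads` the reply's `message.content` to get a dict guaranteed to match the schema.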
No asterisks
Token-aware rate limits, deterministic error schemas with request IDs, zero retention by default, hard spend caps. The infrastructure niceties incumbents don't bother with.
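Rate limits still surface as 429s under burst load; a minimal retry wrapper with exponential backoff and jitter (a generic sketch, not a Cogito SDK feature) keeps callers well-behaved:

```python
import random
import time

def with_backoff(fn, *, retries=5, base_delay=0.5, retry_on=(Exception,)):
    """Call fn(), retrying transient errors with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # Out of attempts: surface the error (and its request ID).
            # 0.5s, 1s, 2s, ... plus jitter so concurrent clients don't retry in lockstep.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

With the official client, wrap the call as `with_backoff(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,))`.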
Catalog
Frontier open weights, day one.
Llama, Qwen, DeepSeek, Mistral. Hot the moment they ship. Same per-token price for your fine-tuned variants.
OpenAI
GPT-OSS 120B
OpenAI's first open-weight model since GPT-2: a 120B mixture-of-experts with strong general reasoning at a price that's hard to beat.
- Context: 131,072
- TPS: 70
- Input: $0.04
on AWS Trainium
DeepSeek
DeepSeek V4 Pro
DeepSeek's frontier model. 1M-token context, frontier-class reasoning, and a price tag that makes proprietary alternatives hard to justify.
- Context: 1,000,000
- TPS: 70
- Input: $0.43
on NVIDIA H200
DeepSeek
DeepSeek V4 Flash
The cheap workhorse with a 1M-token window. Built for high-volume pipelines where the bill matters as much as the answer.
- Context: 1,000,000
- TPS: 70
- Input: $0.14
on AWS Trainium
Moonshot
Kimi K2.6
Moonshot's flagship MoE. Trained for long-horizon agentic workflows; the model engineering teams reach for when cheaper options stop being enough.
- Context: 262,144
- TPS: 70
- Input: $0.74
on NVIDIA H100
DeepSeek
DeepSeek V3.2
The proven workhorse. DeepSeek V3.2's reasoning is more than enough for 90% of production workloads, at one of the lowest prices in the catalog.
- Context: 131,072
- TPS: 70
- Input: $0.25
on AWS Trainium
Enterprise
Inference your security team can sign off on.
SOC 2 Type II certified through Decart AI, GDPR-compliant by design, single-tenant VPC deployments, contractual P99 SLAs, zero retention by default. Open-source models without the open-source compliance gap.
- SOC 2 Type II: Certified
- GDPR: Compliant
- HIPAA BAA: In progress
Ship something today.
$5 in free credits. No credit card. Five minutes from sign-up to your first streamed response.