DeepSeek V4 Flash
A low-cost workhorse with a 1M-token context window, built for high-volume pipelines where the bill matters as much as the answer.
Mid-tier MoE · DeepSeek License
Context: 1M tokens
Tokens / sec: 70
TTFT: 220 ms
Hardware: AWS Trainium
Pricing
- Input: $0.14 / 1M tokens
- Output: $0.28 / 1M tokens
- Context cache: 50% of the input rate, applied automatically
- Fine-tunes: same per-token price as the base model
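To see how the rates above combine on a large-context request, here is a small cost estimator, a sketch using only the listed prices (the function name and the example token counts are illustrative):

```python
# Estimated cost of one request at the listed rates (illustrative arithmetic).
INPUT_RATE = 0.14 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.28 / 1_000_000  # $ per output token
CACHE_DISCOUNT = 0.5            # context cache bills at 50% of the input rate

def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Rough per-request cost; cached_tokens is the cached share of the input."""
    fresh = input_tokens - cached_tokens
    return (fresh * INPUT_RATE
            + cached_tokens * INPUT_RATE * CACHE_DISCOUNT
            + output_tokens * OUTPUT_RATE)

# An 800k-token prompt with 600k tokens served from cache and a 2k-token answer:
print(f"${estimate_cost(800_000, 2_000, cached_tokens=600_000):.4f}")  # → $0.0706
```

At these rates the cache discount dominates long-context workloads: reusing a 600k-token prefix cuts that request's input bill from $0.112 to $0.070.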
Capabilities
- SSE streaming
- Tool / function calling
- Structured JSON outputs
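SSE streaming delivers the response as `data:` lines, each carrying a chunk with a partial `delta`. A minimal sketch of parsing one such line by hand, assuming the chunk follows the OpenAI-compatible streaming format (the example string is illustrative, not captured from the API):

```python
import json

# One SSE line as a streaming endpoint would emit it (shape assumed to follow
# the OpenAI-compatible chunk format; this string is illustrative).
line = 'data: {"choices":[{"delta":{"content":"Hello"}}]}'

delta = ""
if line.startswith("data: ") and line != "data: [DONE]":
    chunk = json.loads(line[len("data: "):])
    delta = chunk["choices"][0]["delta"].get("content", "")
print(delta)
```

In practice you would not parse SSE yourself: the Python SDK does this when you pass `stream=True` to `client.chat.completions.create` and iterate over the returned chunks.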
Use cases
- High-volume RAG
- Long-context summarization
- Synthetic data
Quickstart
Full quickstart: deepseek-v4-flash.py
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cogito.decart.ai/v1",
    api_key=os.environ["COGITO_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
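Tool calling uses the OpenAI-compatible `tools` parameter. A sketch of one tool definition you could pass alongside the request above; the `get_weather` name and its fields are hypothetical, not part of the API:

```python
# Hypothetical tool definition; pass it via the `tools` parameter of
# client.chat.completions.create. The schema shape follows the
# OpenAI-compatible function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not a built-in
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]
```

When the model chooses to call the tool, `response.choices[0].message.tool_calls` carries the function name and JSON-encoded arguments; your code runs the function and returns the result in a `"role": "tool"` message on the next turn.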