DeepSeek

DeepSeek V4 Flash

The cheap workhorse with a 1M-token window. Built for high-volume pipelines where the bill matters as much as the answer.

Mid-tier MoE · DeepSeek License

Context: 1,000k tokens
Tokens / sec: 70
TTFT: 220 ms
Hardware: AWS Trainium
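
The throughput and TTFT figures can be combined into a rough estimate of end-to-end response time. The sketch below assumes generation runs at a steady 70 tokens per second once the first token has arrived, which real workloads will only approximate. `estimate_latency_seconds` is a hypothetical helper for illustration, not part of any SDK.

```python
# Nominal figures taken from the spec list; real latency varies with load.
TTFT_SECONDS = 0.220
TOKENS_PER_SECOND = 70.0


def estimate_latency_seconds(output_tokens: int) -> float:
    """Rough wall-clock time to receive `output_tokens` tokens."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    # Time to first token plus steady-state generation time.
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # 700 tokens at 70 tokens/sec is 10 s of generation, plus 0.22 s TTFT.
    print(f"{estimate_latency_seconds(700):.2f} s")
```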

Pricing

Input: $0.14 / 1M tokens
Output: $0.28 / 1M tokens
Context cache: 50% of input rate, automatic
Fine-tunes: Same per-token price as base
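
To see how these rates add up for a single request, here is a minimal cost sketch. It assumes that "50% of input rate" means cached input tokens are billed at half the input price and that all other input tokens are billed at the full rate. The helper name `estimate_cost_usd` and the token counts in the example are hypothetical.

```python
# Rates from the pricing list, in USD per 1M tokens.
INPUT_USD_PER_M = 0.14
OUTPUT_USD_PER_M = 0.28
# Assumption: cached input tokens are billed at 50% of the input rate.
CACHE_RATE_FACTOR = 0.50


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD."""
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_tokens * INPUT_USD_PER_M
        + cached_input_tokens * INPUT_USD_PER_M * CACHE_RATE_FACTOR
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 800k input tokens (600k served from cache), 2k output.
    print(f"${estimate_cost_usd(800_000, 2_000, cached_input_tokens=600_000):.5f}")
```

Under these assumptions the example request costs about $0.07, compared with about $0.11 for the same request with no cache hits.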

Capabilities

  • SSE streaming
  • Tool / function calling
  • Structured JSON outputs
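
The three capabilities above map onto request parameters of the OpenAI Chat Completions API (`stream`, `tools`, `response_format`). The sketch below only assembles the request arguments. It assumes this endpoint accepts those parameters the same way the OpenAI API does, which this page does not confirm. `build_request` and the `get_weather` tool are hypothetical names used for illustration.

```python
import json

MODEL = "deepseek-v4-flash"

# Hypothetical tool, described in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_request(prompt, *, stream=False, tools=None, json_mode=False):
    """Assemble keyword arguments for client.chat.completions.create()."""
    kwargs = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    if stream:
        kwargs["stream"] = True  # SSE streaming
    if tools:
        kwargs["tools"] = list(tools)  # tool / function calling
    if json_mode:
        # Assumption: JSON mode is requested as in the OpenAI API.
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs


if __name__ == "__main__":
    request = build_request("What is the weather in Oslo?", tools=[WEATHER_TOOL])
    print(json.dumps(request, indent=2))
```

With the OpenAI Python client, the result would be passed as `client.chat.completions.create(**request)`. When `stream=True`, that call returns an iterator of chunks, and the incremental text is read from `chunk.choices[0].delta.content`.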

Use cases

  • High-volume RAG
  • Long-context summarization
  • Synthetic data

Quickstart

deepseek-v4-flash.py
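import os

from openai import OpenAI

# Point the OpenAI client at the API base URL and read the key from the environment.
client = OpenAI(
    base_url="https://api.cogito.decart.ai/v1",
    api_key=os.environ["COGITO_API_KEY"],
)

# Send a single-turn chat request and print the reply text.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)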