
DeepSeek V3

DeepSeek · deepseek family · Official docs

DeepSeek V3 is the price-performance disruptor of the frontier model landscape. At $0.28/1M input tokens with 90% cache-hit discounts, it undercuts nearly every competitor while delivering 671B-parameter MoE quality. The OpenAI-compatible API makes migration trivial. However, Refrase users should be aware of two significant caveats: (1) the data jurisdiction issue — all API traffic routes through mainland China, which may be a dealbreaker for regulated industries; and (2) the relatively low max output token limit (8K for chat mode) constrains long-form generation tasks. The architectural innovations (MLA, auxiliary-loss-free balancing, FP8 training) are genuinely novel and well-documented in the technical report. For cost-sensitive users who can work within the output constraints and data residency requirements, it is hard to beat.

Try Refrase on a DeepSeek V3 prompt

Paste any prompt — Refrase rewrites it using DeepSeek V3's documentation as context. 4–7 seconds end-to-end.

Specifications

Context window: 128K
Max output: 8K
Per 1M tokens (in/out): $0.28 / $0.42
DeepSeek API pricing for deepseek-chat (V3.2 non-thinking). Cache hit: $0.028/1M input tokens (90% discount). Reasoner (thinking mode): same input pricing, output $0.42/1M for final + reasoning tokens. Extremely aggressive pricing — among the cheapest frontier-class models available. (source: DeepSeek API Docs, Models & Pricing page)
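
As a rough back-of-envelope sketch (not DeepSeek's official billing formula), the listed rates work out per request roughly as follows:

    # Illustrative cost estimate for deepseek-chat at the rates listed above.
    # Assumes a simple linear model; actual billing follows DeepSeek's own cache accounting.
    INPUT_MISS = 0.28 / 1_000_000   # $ per input token, cache miss
    INPUT_HIT = 0.028 / 1_000_000   # $ per input token, cache hit (90% discount)
    OUTPUT = 0.42 / 1_000_000       # $ per output token

    def estimate_cost(input_tokens: int, output_tokens: int, cache_hit_ratio: float = 0.0) -> float:
        """Estimated dollar cost of a single request."""
        hit = input_tokens * cache_hit_ratio
        miss = input_tokens - hit
        return hit * INPUT_HIT + miss * INPUT_MISS + output_tokens * OUTPUT

    # 50K-token prompt, 2K-token completion, 80% of the prompt served from cache:
    print(f"${estimate_cost(50_000, 2_000, cache_hit_ratio=0.8):.4f}")  # about $0.0048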

Strengths

analysis · code

Key capabilities

  • Mixture-of-Experts architecture: 671B total parameters, 37B activated per token, with 256 routed experts plus 1 shared expert (8 experts activated per token) (source: arXiv:2412.19437, DeepSeek-V3 Technical Report)
  • Multi-head Latent Attention (MLA): compresses KV cache for efficient long-context processing, validated in DeepSeek-V2 (source: arXiv:2412.19437, DeepSeek-V3 Technical Report)
  • Auxiliary-loss-free load balancing: pioneering strategy using learned bias terms per expert, avoiding quality degradation from traditional auxiliary losses (source: arXiv:2412.19437, DeepSeek-V3 Technical Report)
  • Multi-token prediction training objective for stronger downstream performance (source: arXiv:2412.19437, DeepSeek-V3 Technical Report)
  • OpenAI-compatible API format: drop-in replacement using OpenAI SDK with base_url change; a minimal call is sketched after this list (source: DeepSeek API Docs, 'Your First API Call')
  • Context caching on disk: repeated long prefixes are processed faster and cheaper with 90% cache-hit discount (source: DeepSeek API Docs, Models & Pricing page)
  • V3.2 supports thinking mode (deepseek-reasoner) with tool-use integration — first model to integrate reasoning directly into tool calls (source: DeepSeek API Docs, 'DeepSeek-V3.2 Release')
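
A minimal sketch of that drop-in pattern, following the 'Your First API Call' docs; the API key and prompt text here are placeholders:

    # Minimal call through the OpenAI SDK pointed at DeepSeek's endpoint.
    # The API key value is a placeholder.
    from openai import OpenAI

    client = OpenAI(
        api_key="<DEEPSEEK_API_KEY>",         # placeholder
        base_url="https://api.deepseek.com",  # the only change from a stock OpenAI setup
    )

    response = client.chat.completions.create(
        model="deepseek-chat",                # non-thinking mode; "deepseek-reasoner" for thinking mode
        messages=[
            {"role": "system", "content": "You are a concise technical assistant."},
            {"role": "user", "content": "Summarize what Multi-head Latent Attention does."},
        ],
        max_tokens=8192,                      # chat mode tops out at 8K output tokens
    )
    print(response.choices[0].message.content)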

Known limitations

  • Default max output is 4K tokens (expandable to 8K) for deepseek-chat; reasoner mode defaults to 32K (max 64K) — relatively limited compared to other frontier models (source: DeepSeek API Docs, Models & Pricing page)
  • Hallucination rate of approximately 3.9% on Vectara benchmark — lower than R1 but still measurable (source: Vectara Research, 'DeepSeek-R1 hallucinates more than DeepSeek-V3')
  • Safety alignment concerns: found to be less aligned than comparable models, with higher risk of producing harmful content and lower jailbreak resistance scores (source: Microsoft Research, safety benchmarking reports via TechTarget)
  • Content filtering reflects Chinese regulatory requirements: may refuse politically sensitive questions about China while answering analogous questions about other countries (source: Multiple independent tester reports, TechTarget)
  • All data processed through DeepSeek API is hosted on servers in mainland China, subject to Chinese legal jurisdiction (source: NordVPN security analysis, 'Is DeepSeek safe to use?')

How to prompt DeepSeek V3

Preferred instruction format

Standard OpenAI-compatible chat format with system/user/assistant roles. System prompt is strongly respected — use it to lock in behavior, role, and output format. JSON mode requires both response_format={'type': 'json_object'} AND mentioning 'JSON' in the prompt text.
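
A minimal JSON-mode sketch, reusing the client from the earlier example; the invoice text and field names are illustrative:

    # JSON mode: set response_format AND include the word "JSON" in the prompt text.
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "Extract the requested fields and reply with valid JSON only."},
            {"role": "user", "content": 'Invoice: "Acme Corp, $1,200, due 2025-03-01". Return JSON with keys vendor, amount, due_date.'},
        ],
        response_format={"type": "json_object"},
    )
    print(response.choices[0].message.content)  # e.g. {"vendor": "Acme Corp", ...}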

Recommended practices

  • Use system prompt to define role and behavior — V3 responds well to clear, concise, consistent system messages (source: datastudios.org, 'DeepSeek Prompting Techniques')
  • Place static data (documentation, codebases) at the beginning of the prompt to leverage disk-based context caching for faster and cheaper processing; see the message-layout sketch after this list (source: skywork.ai, 'Best Prompts for DeepSeek-V3.2-Exp')
  • Break prompts into digestible blocks: separate background info, task description, and constraints into distinct messages (source: skywork.ai, 'How to Optimize Prompts for DeepSeek-V3.2-Exp')
  • For JSON output, set response_format to json_object AND include the word 'JSON' in the prompt text (source: datastudios.org, 'DeepSeek Prompting Techniques')
  • Use few-shot examples and persona adoption for complex tasks — V3 excels at structured outputs and following complex system instructions (source: datastudios.org, 'DeepSeek Prompting Techniques')
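
A sketch of the message layout the caching and structuring advice above implies; the file name and review task are hypothetical, and the client object comes from the earlier example:

    # Static material first (identical across requests, so the prefix can hit the disk cache),
    # task-specific instructions last. File name and task text are hypothetical.
    with open("project_docs.md") as f:
        static_context = f.read()

    messages = [
        {"role": "system", "content": "You are a senior Python reviewer. Answer in concise bullet points."},
        {"role": "user", "content": "Reference documentation:\n" + static_context},      # background block
        {"role": "user", "content": "Task: review utils/retry.py for race conditions."},  # varies per call
    ]

    response = client.chat.completions.create(model="deepseek-chat", messages=messages)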

Anti-patterns to avoid

  • Do not omit 'JSON' from prompt text when using json_object response format — model may not comply without explicit mention (source: datastudios.org, 'DeepSeek Prompting Techniques')
  • Avoid rewriting role instructions every turn — use a consistent system message and vary only user messages (source: skywork.ai, 'Best Prompts for DeepSeek-V3.2-Exp')
  • Do not rely on DeepSeek for safety-critical applications without additional guardrails — lower alignment scores compared to peers (source: Microsoft Research, safety benchmarking)



Skip the manual application.

Refrase reads everything above and applies it for you. Try it on one of your own prompts.