
Kimi K2

Moonshot · kimi family · Official docs

Kimi K2 is a cost-effective frontier model with exceptional agentic capabilities. Its 1T-total / 32B-active MoE architecture delivers strong performance at low inference cost. Key differentiator: it is optimized specifically for tool calling and agentic workflows, not just chat. The 0.6 temperature recommendation is strict; deviating from it significantly degrades output quality. Watch for verbosity (2-2.5x the token usage of peer models), which can inflate costs despite the low per-token pricing. The early vLLM tool-calling compatibility issues have been documented and fixed, but they indicate that the model's tool-call format requires careful parser configuration.


Specifications

Context window: 128K
Max output: 8K
Pricing (per 1M tokens, in/out): $0.55 / $2.20
Cache hit pricing reduces input to ~$0.15/M (75% discount). K2 0905 'exacto' variant: $0.39/$1.90. Thinking variant: $0.47/$2.00. (source: pricepertoken.com, OpenRouter)
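
As a rough illustration of how cache hits and verbosity interact with these rates, here is a back-of-the-envelope cost sketch in Python. The token volumes, cache-hit ratio, and verbosity multiplier are hypothetical inputs, not measurements; only the per-token rates come from this page:

    # Back-of-the-envelope cost estimate at K2 standard pricing.
    # All volumes below are hypothetical; only the rates come from this page.
    INPUT_PER_M = 0.55         # $ per 1M input tokens (cache miss)
    CACHED_INPUT_PER_M = 0.15  # $ per 1M input tokens (cache hit, ~75% off)
    OUTPUT_PER_M = 2.20        # $ per 1M output tokens

    input_tokens = 2_000_000    # hypothetical monthly input volume
    cache_hit_ratio = 0.5       # hypothetical share of input served from cache
    output_tokens = 500_000     # hypothetical output volume at "peer" verbosity
    verbosity_multiplier = 2.0  # K2 tends to emit 2-2.5x the tokens of peers

    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    effective_output = output_tokens * verbosity_multiplier

    cost = (uncached * INPUT_PER_M
            + cached * CACHED_INPUT_PER_M
            + effective_output * OUTPUT_PER_M) / 1e6
    print(f"Estimated cost: ${cost:.2f}")  # $2.90 for these inputs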

Strengths

extraction · analysis

Key capabilities

  • 1 trillion total parameters with only 32B activated per token via 384-expert MoE (source: Hugging Face, Model Card)
  • State-of-the-art agentic intelligence with native tool calling and autonomous problem-solving (source: Hugging Face, Model Card)
  • Strong coding: LiveCodeBench v6 53.7% SOTA, SWE-bench Verified 65.8% single attempt (source: Hugging Face, Model Card)
  • Mathematical reasoning: MATH-500 97.4% SOTA, AIME 2024 69.6% SOTA (source: Hugging Face, Model Card)
  • General knowledge: MMLU 89.5%, MMLU-Redux 92.7% SOTA (source: Hugging Face, Model Card)
  • Instruction following: IFEval 89.8% Prompt Strict SOTA (source: Hugging Face, Model Card)
  • OpenAI- and Anthropic-API compatible, usable as a drop-in replacement; see the client sketch after this list (source: Hugging Face, Model Card)
  • Modified MIT License allowing commercial use (source: Hugging Face, Model Card)
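
Because the API is OpenAI-compatible, the drop-in replacement amounts to pointing the official OpenAI SDK at Moonshot's endpoint. A minimal sketch, assuming the international base URL https://api.moonshot.ai/v1 and an API key in a MOONSHOT_API_KEY environment variable (both are assumptions; check Moonshot's docs for your region):

    import os

    from openai import OpenAI

    # Drop-in replacement: the standard OpenAI SDK, different base URL.
    # Endpoint URL and env-var name are assumptions; verify against
    # Moonshot's official documentation for your region.
    client = OpenAI(
        api_key=os.environ["MOONSHOT_API_KEY"],
        base_url="https://api.moonshot.ai/v1",
    )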

Known limitations

  • Max output tokens limited to 8K in standard mode, 16K for SWE-bench agentless (source: Hugging Face, Model Card)
  • No vision/multimodal support in K2 base — requires K2.5 for vision (source: Community documentation)
  • Extreme verbosity: 2-2.5x token usage compared to other models, impacting cost and latency (source: Skywork.ai analysis)
  • Initial vLLM tool calling only 18% success rate without custom parser fixes (source: vLLM Blog, debugging report)
  • Thinking mode adds 15-35% latency and 1.2-1.6x token overhead (source: Skywork.ai, Kimi K2 Thinking Limits)
  • Can overthink easy tasks in thinking mode, drift on long rule-heavy prompts (source: Skywork.ai analysis)
  • 2-5% hallucination rate on highly specific uncited facts even in thinking mode (source: Skywork.ai analysis)
  • Reflex-grade model without long thinking — not designed for deep extended reasoning (source: Hugging Face, Model Card)

How to prompt Kimi K2

Preferred instruction format

Standard OpenAI-compatible chat format with system/user/assistant roles. Default system prompt: 'You are Kimi, an AI assistant created by Moonshot AI.'
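
Concretely, a request follows the familiar OpenAI chat schema. A minimal sketch using the client configured earlier; the model identifier is a placeholder, since exact names vary by provider:

    # Standard system/user/assistant roles. Omitting the system message
    # falls back to the default: "You are Kimi, an AI assistant created
    # by Moonshot AI."
    messages = [
        {"role": "system",
         "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user",
         "content": "Extract the action items from the meeting notes below."},
    ]

    response = client.chat.completions.create(
        model="kimi-k2-instruct",  # placeholder identifier; check your provider
        temperature=0.6,           # the recommended Instruct-mode setting
        messages=messages,
    )
    print(response.choices[0].message.content)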

Recommended practices

  • Set temperature to 0.6 for Instruct mode (source: Hugging Face, Model Card)
  • Use tool_choice='auto' for autonomous tool selection (source: GitHub, Kimi-K2 README)
  • OpenAI-compatible function calling format with tools parameter; see the tool-call sketch after this list (source: Hugging Face, Model Card)
  • For Anthropic API compatibility, apply temperature mapping: real_temperature = request_temperature * 0.6 (source: Hugging Face, Model Card)
  • Provide task-specific system prompts rather than relying on defaults when special instructions are needed (source: GitHub, Kimi-K2 README)
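
Putting these practices together, here is a hedged sketch of a tool-calling request. The get_weather tool, its schema, and the model identifier are invented for illustration:

    # One illustrative tool in OpenAI function-calling format.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="kimi-k2-instruct",  # placeholder identifier; check your provider
        temperature=0.6,           # do not raise this for Instruct mode
        tools=tools,
        tool_choice="auto",        # let the model decide when to call tools
        messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    )

    # The model returns structured tool calls rather than prose when it
    # decides a tool is needed.
    for call in response.choices[0].message.tool_calls or []:
        print(call.function.name, call.function.arguments)

    # When using an Anthropic-compatible endpoint instead, Moonshot maps
    # real_temperature = request_temperature * 0.6, so request 1.0 there
    # to land on the recommended effective 0.6.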

Anti-patterns to avoid

  • Do NOT set temperature above 0.6 for Instruct mode — model was optimized for this setting (source: Hugging Face, Model Card)
  • Do NOT use long rule-heavy prompts — model may drift from instructions (source: Skywork.ai analysis)
  • Do NOT assume live data retrieval without explicitly enabling browsing/tools — model produces confident but stale answers otherwise (source: Skywork.ai analysis)
  • Do NOT expect vision capabilities from K2 base — use K2.5 for multimodal tasks (source: Community documentation)
