
MiniMax M2

MiniMax · minimax family · Official docs

MiniMax M2 is a Chinese-origin model optimized for coding and agentic workflows. The 230B/10B MoE architecture is one of the most aggressively sparse designs — only 4.3% of parameters active per token. The interleaved thinking pattern with <think> tags is a key differentiator — unlike models with separate thinking modes, M2 weaves reasoning into generation naturally. The XML-based tool calling format (<minimax:tool_call>) is unique and requires specific parser support. The model's MIT license and competitive benchmarks make it attractive for open-source deployments. Successors M2.1 (enhanced multilingual coding) and M2.5 (agent swarm) build on this base. The 200K context / 128K output combination matches GLM-4.7 Flash and exceeds most competitors.


Specifications

  • Context window: 200K
  • Max output: 128K
  • Pricing: $0.26 / $1.00 per 1M tokens (in/out)
Currently free for a limited time on the MiniMax platform. Third-party pricing: $0.26 / $1.00 per 1M tokens. Successors M2.1 and M2.5 are available at similar pricing. (source: pricepertoken.com, llm-stats.com)

Strengths

extraction · analysis

Key capabilities

  • 230B total parameters with only 10B active — extremely compact MoE (source: GitHub, README)
  • Interleaved thinking with <think>...</think> tags for chain-of-thought reasoning (source: GitHub, README)
  • Strong coding: SWE-bench Verified 69.4%, LiveCodeBench 83%, Terminal-Bench 46.3% (source: GitHub, README)
  • General intelligence: MMLU-Pro 82%, BrowseComp 44% (source: GitHub, README)
  • Native tool calling with XML format <minimax:tool_call> tags (source: Hugging Face, Tool Calling Guide)
  • Plans and executes complex long-horizon toolchains across shell, browser, retrieval, and code runners (source: GitHub, README)
  • Multi-file editing and code-run-fix loops (source: GitHub, README)
  • 200K context window with 128K output capacity (source: MiniMax API Docs)
  • MIT License — fully open source (source: GitHub, README)
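
As a sketch of the interleaved pattern, a single M2 assistant turn can mix reasoning and a tool call in one message. The inner JSON payload shape below is an assumption for illustration; consult the Tool Calling Guide for the canonical format.

```python
# Illustrative M2 assistant turn: <think> reasoning interleaved with an
# XML-delimited tool call. The inner JSON shape is assumed, not canonical.
assistant_turn = (
    "<think>The user asked for the weather, so I should call the "
    "weather tool before answering.</think>\n"
    "<minimax:tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</minimax:tool_call>"
)

# Both the reasoning and the tool call travel in the same message,
# so downstream code must handle both tag pairs.
print("<think>" in assistant_turn and "<minimax:tool_call>" in assistant_turn)  # → True
```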

Known limitations

  • Interleaved thinking content in <think> tags must be preserved in conversation history — removing them degrades performance (source: GitHub, README)
  • 10B active parameters may limit depth of reasoning on highly complex tasks vs larger dense models (source: Architecture analysis)
  • Training details not publicly disclosed (source: GitHub, README — absent from documentation)
  • Newer M2.1 and M2.5 versions available — M2 may receive fewer updates (source: MiniMax, Product Timeline)
  • XML-based tool calling format (<minimax:tool_call>) requires custom parsing if not using vLLM/SGLang built-in parsers (source: Hugging Face, Tool Calling Guide)
  • Released October 2025 — younger model with less community ecosystem than established alternatives (source: llm-stats.com)

How to prompt MiniMax M2

Preferred instruction format

Standard OpenAI-compatible chat format. The default system prompt is: 'You are a helpful assistant. Your name is MiniMax-M2 and is built by MiniMax.' Tool definitions are injected into the prompt as structured text together with XML format instructions.
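
A minimal request payload might look like the sketch below. The model identifier is provider-specific (an assumption here); the sampling values follow the README recommendation.

```python
# Sketch of an OpenAI-compatible chat payload for MiniMax M2.
# "MiniMax-M2" as the model name is an assumption; check your provider.
payload = {
    "model": "MiniMax-M2",
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant. Your name is MiniMax-M2 "
                    "and is built by MiniMax."},
        {"role": "user",
         "content": "Write a function that reverses a linked list."},
    ],
    # Recommended sampling settings from the README.
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,  # OpenAI-style SDKs typically pass this via extra_body
}
```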

Recommended practices

  • Use temperature=1.0, top_p=0.95, top_k=40 for best performance (source: GitHub, README)
  • Preserve <think>...</think> tags in conversation history — removing them degrades multi-turn performance (source: GitHub, README)
  • Give the model a role, constraints, and acceptance tests in system prompt — structure beats cleverness (source: Skywork.ai, Prompt Optimization)
  • Let the model think in steps and invite self-critique for complex tasks (source: Skywork.ai, Prompt Optimization)
  • Use vLLM or SGLang with built-in parsers for automatic tool call handling (source: Hugging Face, Tool Calling Guide)
  • For manual tool call parsing, use XML regex to extract <minimax:tool_call> blocks (source: Hugging Face, Tool Calling Guide)
  • Return tool results with role='tool' and structured content array format (source: Hugging Face, Tool Calling Guide)
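
The manual-parsing bullets above can be sketched as follows. The regex, the inner JSON shape, and the `get_weather` schema are illustrative assumptions; the Tool Calling Guide defines the canonical format.

```python
import json
import re

# Extract <minimax:tool_call> blocks when not using vLLM/SGLang's
# built-in parsers. Non-greedy match so multiple blocks are split apart.
TOOL_CALL_RE = re.compile(
    r"<minimax:tool_call>\s*(.*?)\s*</minimax:tool_call>", re.DOTALL
)

# Hypothetical schema for one tool, used for type-aware conversion
# instead of assuming every parameter is a string.
SCHEMA = {"get_weather": {"city": str, "days": int}}

def parse_tool_calls(text):
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        call = json.loads(block)
        types = SCHEMA.get(call["name"], {})
        # Convert each parameter via its declared schema type.
        call["arguments"] = {
            k: types.get(k, str)(v) for k, v in call["arguments"].items()
        }
        calls.append(call)
    return calls

reply = ('<minimax:tool_call>\n'
         '{"name": "get_weather", "arguments": {"city": "Paris", "days": "3"}}\n'
         '</minimax:tool_call>')
calls = parse_tool_calls(reply)
print(calls[0]["arguments"])  # → {'city': 'Paris', 'days': 3}
```

Note that `days` arrives as the string "3" but is coerced to an int by the schema lookup, which is exactly the failure mode the anti-patterns below warn about.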

Anti-patterns to avoid

  • Do NOT strip <think>...</think> tags from conversation history — model relies on them for coherent multi-turn reasoning (source: GitHub, README)
  • Do NOT parse tool calls without schema type information — parameters need type-aware conversion (source: Hugging Face, Tool Calling Guide)
  • Do NOT skip adding tool results back to conversation history — breaks iterative tool calling (source: Hugging Face, Tool Calling Guide)
  • Do NOT assume string encoding for all tool call parameters — use schema type definitions for proper conversion (source: Hugging Face, Tool Calling Guide)
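
Putting the history-related anti-patterns together, a correct multi-turn loop keeps the assistant message intact (including its <think> content) and appends the tool result before the next model call. The structured content-array shape for the tool message is an assumption based on the guide's description.

```python
# Sketch of history management that respects the anti-patterns above.
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Append the assistant turn verbatim — do NOT strip the <think> tags.
messages.append({
    "role": "assistant",
    "content": "<think>Need the weather tool first.</think>\n"
               "<minimax:tool_call>\n"
               '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
               "</minimax:tool_call>",
})

# Return the tool result with role='tool' and a structured content
# array (shape assumed), so iterative tool calling keeps working.
messages.append({
    "role": "tool",
    "content": [{"type": "text", "text": '{"temp_c": 21}'}],
})
```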
