
MiniMax M2

MiniMax · minimax family · Official docs

MiniMax M2 is a Chinese-origin model optimized for coding and agentic workflows. The 230B/10B MoE architecture is one of the most aggressively sparse designs — only 4.3% of parameters active per token. The interleaved thinking pattern with <think> tags is a key differentiator — unlike models with separate thinking modes, M2 weaves reasoning into generation naturally. The XML-based tool calling format (<minimax:tool_call>) is unique and requires specific parser support. The model's MIT license and competitive benchmarks make it attractive for open-source deployments. Successors M2.1 (enhanced multilingual coding) and M2.5 (agent swarm) build on this base. The 200K context / 128K output combination matches GLM-4.7 Flash and exceeds most competitors.


Specifications

  • Context window: 200K
  • Max output: 128K
  • Pricing: $0.26 / $1.00 per 1M tokens (in/out)
Currently free for a limited time on the MiniMax platform. Third-party pricing: $0.26 / $1.00 per 1M tokens. Successors M2.1 and M2.5 are available at similar pricing. (source: pricepertoken.com, llm-stats.com)

Strengths

extraction · analysis

Key capabilities

  • 230B total parameters with only 10B active — extremely compact MoE (source: GitHub, README)
  • Interleaved thinking with <think>...</think> tags for chain-of-thought reasoning (source: GitHub, README)
  • Strong coding: SWE-bench Verified 69.4%, LiveCodeBench 83%, Terminal-Bench 46.3% (source: GitHub, README)
  • General intelligence: MMLU-Pro 82%, BrowseComp 44% (source: GitHub, README)
  • Native tool calling with XML format <minimax:tool_call> tags (source: Hugging Face, Tool Calling Guide)
  • Plans and executes complex long-horizon toolchains across shell, browser, retrieval, and code runners (source: GitHub, README)
  • Multi-file editing and code-run-fix loops (source: GitHub, README)
  • 200K context window with 128K output capacity (source: MiniMax API Docs)
  • MIT License — fully open source (source: GitHub, README)
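
As a sketch of the interleaved pattern, a single M2 assistant turn can mix reasoning and a tool call in one message. The inner JSON payload shape below is an assumption for illustration; consult the Tool Calling Guide for the canonical format.

```python
# Illustrative M2 assistant turn: <think> reasoning interleaved with an
# XML-delimited tool call. The inner JSON shape is assumed, not canonical.
assistant_turn = (
    "<think>The user asked for the weather, so I should call the "
    "weather tool before answering.</think>\n"
    "<minimax:tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</minimax:tool_call>"
)

# Both the reasoning and the tool call travel in the same message,
# so downstream code must handle both tag pairs.
print("<think>" in assistant_turn and "<minimax:tool_call>" in assistant_turn)  # → True
```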

Known limitations

  • Interleaved thinking content in <think> tags must be preserved in conversation history — removing them degrades performance (source: GitHub, README)
  • 10B active parameters may limit depth of reasoning on highly complex tasks vs larger dense models (source: Architecture analysis)
  • Training details not publicly disclosed (source: GitHub, README — absent from documentation)
  • Newer M2.1 and M2.5 versions available — M2 may receive fewer updates (source: MiniMax, Product Timeline)
  • XML-based tool calling format (<minimax:tool_call>) requires custom parsing if not using vLLM/SGLang built-in parsers (source: Hugging Face, Tool Calling Guide)
  • Released October 2025 — younger model with less community ecosystem than established alternatives (source: llm-stats.com)

How to prompt MiniMax M2

Preferred instruction format

Standard OpenAI-compatible chat format. The default system prompt is: 'You are a helpful assistant. Your name is MiniMax-M2 and is built by MiniMax.' Tool definitions are injected into the prompt as structured text together with XML format instructions.
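
A minimal request payload might look like the sketch below. The model identifier is provider-specific (an assumption here); the sampling values follow the README recommendation.

```python
# Sketch of an OpenAI-compatible chat payload for MiniMax M2.
# "MiniMax-M2" as the model name is an assumption; check your provider.
payload = {
    "model": "MiniMax-M2",
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant. Your name is MiniMax-M2 "
                    "and is built by MiniMax."},
        {"role": "user",
         "content": "Write a function that reverses a linked list."},
    ],
    # Recommended sampling settings from the README.
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,  # OpenAI-style SDKs typically pass this via extra_body
}
```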

Recommended practices

  • Use temperature=1.0, top_p=0.95, top_k=40 for best performance (source: GitHub, README)
  • Preserve <think>...</think> tags in conversation history — removing them degrades multi-turn performance (source: GitHub, README)
  • Give the model a role, constraints, and acceptance tests in system prompt — structure beats cleverness (source: Skywork.ai, Prompt Optimization)
  • Let the model think in steps and invite self-critique for complex tasks (source: Skywork.ai, Prompt Optimization)
  • Use vLLM or SGLang with built-in parsers for automatic tool call handling (source: Hugging Face, Tool Calling Guide)
  • For manual tool call parsing, use XML regex to extract <minimax:tool_call> blocks (source: Hugging Face, Tool Calling Guide)
  • Return tool results with role='tool' and structured content array format (source: Hugging Face, Tool Calling Guide)
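
The manual-parsing bullets above can be sketched as follows. The regex, the inner JSON shape, and the `get_weather` schema are illustrative assumptions; the Tool Calling Guide defines the canonical format.

```python
import json
import re

# Extract <minimax:tool_call> blocks when not using vLLM/SGLang's
# built-in parsers. Non-greedy match so multiple blocks are split apart.
TOOL_CALL_RE = re.compile(
    r"<minimax:tool_call>\s*(.*?)\s*</minimax:tool_call>", re.DOTALL
)

# Hypothetical schema for one tool, used for type-aware conversion
# instead of assuming every parameter is a string.
SCHEMA = {"get_weather": {"city": str, "days": int}}

def parse_tool_calls(text):
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        call = json.loads(block)
        types = SCHEMA.get(call["name"], {})
        # Convert each parameter via its declared schema type.
        call["arguments"] = {
            k: types.get(k, str)(v) for k, v in call["arguments"].items()
        }
        calls.append(call)
    return calls

reply = ('<minimax:tool_call>\n'
         '{"name": "get_weather", "arguments": {"city": "Paris", "days": "3"}}\n'
         '</minimax:tool_call>')
calls = parse_tool_calls(reply)
print(calls[0]["arguments"])  # → {'city': 'Paris', 'days': 3}
```

Note that `days` arrives as the string "3" but is coerced to an int by the schema lookup, which is exactly the failure mode the anti-patterns below warn about.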

Anti-patterns to avoid

  • Do NOT strip <think>...</think> tags from conversation history — model relies on them for coherent multi-turn reasoning (source: GitHub, README)
  • Do NOT parse tool calls without schema type information — parameters need type-aware conversion (source: Hugging Face, Tool Calling Guide)
  • Do NOT skip adding tool results back to conversation history — breaks iterative tool calling (source: Hugging Face, Tool Calling Guide)
  • Do NOT assume string encoding for all tool call parameters — use schema type definitions for proper conversion (source: Hugging Face, Tool Calling Guide)
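
Putting the history-related anti-patterns together, a correct multi-turn loop keeps the assistant message intact (including its <think> content) and appends the tool result before the next model call. The structured content-array shape for the tool message is an assumption based on the guide's description.

```python
# Sketch of history management that respects the anti-patterns above.
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Append the assistant turn verbatim — do NOT strip the <think> tags.
messages.append({
    "role": "assistant",
    "content": "<think>Need the weather tool first.</think>\n"
               "<minimax:tool_call>\n"
               '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
               "</minimax:tool_call>",
})

# Return the tool result with role='tool' and a structured content
# array (shape assumed), so iterative tool calling keeps working.
messages.append({
    "role": "tool",
    "content": [{"type": "text", "text": '{"temp_c": 21}'}],
})
```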
