GPT-4o mini
OpenAI · openai family · Official Docs
GPT-4o mini is OpenAI's cost-optimized model, ideal for high-volume classification, extraction, and summarization tasks where GPT-4o's full capabilities are unnecessary. At $0.15/1M input tokens it is the cheapest capable model in OpenAI's lineup. Key differentiation from Claude Haiku: GPT-4o mini has a smaller context window (128K vs Haiku's 200K) and lower reasoning quality. From Llama: GPT-4o mini offers structured output guarantees that open-source Llama models cannot match without additional tooling. For Refrase, the GPT-4o-mini adapter should use the same markdown-structured prompts as GPT-4o, but with more explicit instructions and heavier use of few-shot examples. Structured outputs in strict mode are critical for this model to prevent format drift. Its successor, GPT-4.1 mini, is significantly better on all dimensions but 2.7x more expensive on input; the extra cost buys an 8x larger context window (1M vs 128K) and better output quality. Note: GPT-4o mini does NOT support reasoning/thinking mode -- for tasks requiring internal reasoning, use o4-mini instead.
Specifications
Key Capabilities
- ✓Cost-efficient small model at $0.15/1M input tokens -- more than 60% cheaper than GPT-3.5 Turbo while exceeding its quality (source: OpenAI GPT-4o mini Announcement, Pricing)
- ✓Supports text and vision inputs with text outputs; multimodal reasoning on images (source: OpenAI GPT-4o mini Announcement, Capabilities)
- ✓Structured Outputs with strict JSON schema enforcement, same as GPT-4o (source: OpenAI Structured Outputs Guide, Supported Models -- gpt-4o-mini-2024-07-18)
- ✓128K context window matching GPT-4o for long document processing (source: OpenAI Models Page, GPT-4o mini)
- ✓MMLU score of 82%, and outperforms GPT-4 on chat preference evaluations (source: OpenAI GPT-4o mini Announcement; llm-stats.com, Benchmark Scores)
- ✓Function/tool calling support with the same strict:true schema enforcement (source: OpenAI Structured Outputs Guide, Function Calling)
- ✓Logprobs and top_logprobs support for confidence scoring and token analysis (source: OpenRouter GPT-4o-mini Page, Features)
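The strict Structured Outputs support noted above can be sketched as a request payload. This is a minimal sketch, not a definitive implementation: the schema name (`classify_ticket`) and its fields (`category`, `confidence`) are hypothetical, while the `response_format` shape follows the OpenAI Structured Outputs guide.

```python
# Sketch of a Structured Outputs request payload for gpt-4o-mini.
# The schema name and fields are illustrative; only the response_format
# shape follows the OpenAI Structured Outputs guide.

def build_structured_request(system_prompt: str, user_input: str) -> dict:
    """Build a chat.completions payload with a strict JSON schema."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "classify_ticket",
                "strict": True,  # enforce the schema exactly -- prevents format drift
                "schema": {
                    "type": "object",
                    "properties": {
                        "category": {
                            "type": "string",
                            "enum": ["billing", "bug", "question"],
                        },
                        "confidence": {"type": "number"},
                    },
                    "required": ["category", "confidence"],
                    "additionalProperties": False,  # required by strict mode
                },
            },
        },
    }
```

With `strict: True`, the model's output is constrained to the schema, which is exactly the format-drift protection the adapter relies on.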
Known Limitations
- ⚠Knowledge cutoff of October 2023 -- 8 months behind GPT-4o's June 2024 cutoff (source: llm-stats.com GPT-4o mini page, Knowledge Cutoff; OpenAI Models Page)
- ⚠SWE-bench Verified score of only 8.7%, drastically lower than GPT-4o's 33.2% -- not suitable for complex autonomous coding tasks (source: llm-stats.com GPT-4o mini, Benchmark Scores)
- ⚠Superseded by GPT-4.1 mini which is 2.7x more expensive on input ($0.40 vs $0.15) but offers 1M context, 32K output, and significantly better benchmarks across the board (source: OpenAI GPT-4.1 Announcement; llm-stats.com GPT-4.1 mini)
- ⚠Does not support reasoning mode -- no internal chain-of-thought like o1/o3/o4-mini models (source: OpenRouter GPT-4o-mini Page, Features)
- ⚠Lower quality on complex reasoning and math tasks than the full GPT-4o: e.g. a MATH score of 70.2%, below GPT-4o's (source: llm-stats.com GPT-4o mini, Benchmark Scores)
Prompt Patterns
Preferred Instruction Format
Same role-based chat completion format as GPT-4o, with 'system' role messages; the system message is embedded as the first message in the messages array. All GPT-4.1 prompting guide practices apply to GPT-4o mini as well, though the model is less capable at complex instruction following. (source: OpenAI GPT-4.1 Prompting Guide; OpenAI Models Page)
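The role-based format above, as a minimal messages array (the instruction and user text are illustrative):

```python
# Minimal role-based chat format for gpt-4o-mini: the system message
# always comes first in the messages array, followed by user content.
messages = [
    {
        "role": "system",
        "content": "You are a summarization assistant. Reply in one sentence.",
    },
    {
        "role": "user",
        "content": "Summarize: the meeting covered Q3 targets and hiring.",
    },
]
```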
Recommended Practices
- Use structured system prompts with clear sections: Role, Instructions, Output Format, Examples -- same patterns as GPT-4o (source: OpenAI GPT-4.1 Prompting Guide, System Message Structure)
- Leverage few-shot examples more heavily than with GPT-4o, as the smaller model benefits more from demonstrated patterns (source: OpenAI Help Center, Prompt Engineering Best Practices, Few-Shot Learning)
- Use markdown headers and delimiters to separate prompt sections clearly -- helps the smaller model parse structure (source: OpenAI GPT-4.1 Prompting Guide, Delimiter Conventions)
- Keep prompts more explicit and less ambiguous than you would for GPT-4o -- the mini model infers intent less reliably (source: OpenAI GPT-4.1 Prompting Guide, Instruction Hierarchy -- applies proportionally to smaller models)
- Use structured outputs (strict JSON schema) to guarantee output format compliance -- especially important for smaller models prone to format drift (source: OpenAI Structured Outputs Guide, Introduction)
- Optimize for caching by placing static system instructions and examples before variable user content (source: OpenAI Prompt Engineering Guide, Caching Strategy)
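The practices above can be combined in one prompt builder: a markdown-sectioned system prompt (Role, Instructions, Output Format, Examples), few-shot demonstrations, and all static content placed before the variable user input so the static prefix is cache-friendly. A sketch, with a hypothetical ticket-classification task -- section names follow the pattern above, but the task, categories, and examples are illustrative:

```python
# Cache-friendly prompt builder for gpt-4o-mini following the practices above:
# markdown-sectioned system prompt, few-shot examples, static content first.
# The classification task and example pairs are illustrative.

FEW_SHOT = [
    ("The app crashes when I upload a photo.", "bug"),
    ("How do I change my billing email?", "billing"),
]

def build_messages(user_input: str) -> list[dict]:
    system = (
        "# Role\nYou classify support tickets.\n\n"
        "# Instructions\nAnswer with exactly one category: bug, billing, or question.\n\n"
        "# Output Format\nReturn only the category word, lowercase.\n\n"
        "# Examples\n"
        + "\n".join(f"Input: {q}\nOutput: {a}" for q, a in FEW_SHOT)
    )
    # Static system prompt and examples come first; the variable user content
    # comes last, so the static prefix can be reused by prompt caching.
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_input},
    ]
```

Because only the final user message varies between requests, repeated calls share the same static prefix.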
Anti-Patterns to Avoid
- Do NOT rely on GPT-4o mini for complex multi-step reasoning without explicit step-by-step decomposition in the prompt (source: OpenAI Help Center, Prompt Engineering Best Practices)
- Do NOT use for autonomous agentic workflows requiring complex tool orchestration -- SWE-bench score of 8.7% indicates poor agentic capability (source: llm-stats.com GPT-4o mini Benchmarks)
- Same anti-patterns as GPT-4o apply: avoid JSON context wrapping, avoid manual tool schema injection, avoid sample phrase repetition without variation instruction (source: OpenAI GPT-4.1 Prompting Guide, Common Anti-Patterns)
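The first anti-pattern above is avoided by decomposing the task explicitly in the prompt rather than expecting the model to reason implicitly. A sketch -- the refund task, its steps, and the 30-day threshold are all illustrative:

```python
# Instead of asking gpt-4o-mini to "decide the refund" in one shot,
# spell out each reasoning step explicitly. The task and steps below
# are illustrative.
DECOMPOSED_PROMPT = """# Role
You process refund requests.

# Instructions
Work through these steps in order, writing one line per step:
1. Extract the order ID and the purchase date from the message.
2. Compute the days elapsed since the purchase date.
3. If 30 days or fewer have elapsed, approve; otherwise deny.
4. State the final decision as "APPROVE" or "DENY" on the last line."""
```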
Sources
- https://developers.openai.com/cookbook/examples/gpt4-1_prompting_guide/
- https://platform.openai.com/docs/guides/prompt-engineering
- https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api
- https://platform.openai.com/docs/guides/structured-outputs
- https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/