
GPT-4o mini

OpenAI · openai family · Official docs

GPT-4o mini is OpenAI's cost-optimized model, ideal for high-volume classification, extraction, and summarization tasks where GPT-4o's full capabilities are unnecessary. At $0.15/1M input tokens it is the cheapest capable model in OpenAI's lineup. Key differentiation from Claude Haiku: GPT-4o mini has a smaller context window (128K vs Haiku's 200K) and lower reasoning quality, but a lower price. From Llama: GPT-4o mini offers structured output guarantees that open-source Llama models cannot match without additional tooling. For Refrase, the GPT-4o-mini adapter should use the same markdown-structured prompts as GPT-4o, but with more explicit instructions and heavier use of few-shot examples. Structured outputs in strict mode are critical for this model to prevent format drift. Its successor, GPT-4.1 mini, is significantly better on all dimensions but 2.7x more expensive on input; the tradeoff buys an 8x larger context window and higher output quality. Note: GPT-4o mini does NOT support reasoning/thinking mode; for tasks requiring internal reasoning, use o4-mini instead.

Try Refrase on a GPT-4o mini prompt

Paste any prompt — Refrase rewrites it using GPT-4o mini's documentation as context. 4–7 seconds end-to-end.

Specifications

128K
Context window
16K
Max output
$0.15 / $0.60
Per 1M tokens (in/out)

Strengths

extraction · analysis · generation · code

Key capabilities

  • Cost-efficient small model at $0.15/1M input tokens -- more than 60% cheaper than GPT-3.5 Turbo while exceeding its quality (source: OpenAI GPT-4o mini Announcement, Pricing)
  • Supports text and vision inputs with text outputs; multimodal reasoning on images (source: OpenAI GPT-4o mini Announcement, Capabilities)
  • Structured Outputs with strict JSON schema enforcement, same as GPT-4o (source: OpenAI Structured Outputs Guide, Supported Models -- gpt-4o-mini-2024-07-18)
  • 128K context window matching GPT-4o for long document processing (source: OpenAI Models Page, GPT-4o mini)
  • Scores 82% on MMLU and ranks higher than GPT-4 on chat-preference evaluations (source: OpenAI GPT-4o mini Announcement; llm-stats.com, Benchmark Scores)
  • Function/tool calling support with the same strict:true schema enforcement (source: OpenAI Structured Outputs Guide, Function Calling)
  • Logprobs and top_logprobs support for confidence scoring and token analysis (source: OpenRouter GPT-4o-mini Page, Features)
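The structured-output, function-calling, and logprobs capabilities above can be sketched as a single Chat Completions request. This is a minimal sketch assuming the `openai` Python SDK; the schema name, fields, and prompt text are illustrative, not taken from the Refrase codebase.

```python
# Illustrative extraction schema. Strict mode requires "additionalProperties": false
# and every property listed in "required".
extraction_schema = {
    "name": "entity_extraction",  # hypothetical schema name
    "strict": True,               # strict mode: output must match the schema exactly
    "schema": {
        "type": "object",
        "properties": {
            "entities": {"type": "array", "items": {"type": "string"}},
            "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        },
        "required": ["entities", "sentiment"],
        "additionalProperties": False,
    },
}

# Request parameters combining strict Structured Outputs with logprobs
# for per-token confidence scoring.
request_kwargs = dict(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract entities and sentiment from the user's text."},
        {"role": "user", "content": "Acme Corp shipped the update early; customers are thrilled."},
    ],
    response_format={"type": "json_schema", "json_schema": extraction_schema},
    logprobs=True,   # return log probabilities for each output token
    top_logprobs=3,  # also return the 3 most likely alternatives per position
)

# With the SDK installed and an API key configured, this would be sent as:
# from openai import OpenAI
# completion = OpenAI().chat.completions.create(**request_kwargs)
```

Function tools use the same `strict: True` flag inside each tool's schema, giving the same format guarantee for tool-call arguments.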

Known limitations

  • Knowledge cutoff of October 2023 -- 8 months behind GPT-4o's June 2024 cutoff (source: llm-stats.com GPT-4o mini page, Knowledge Cutoff; OpenAI Models Page)
  • SWE-bench Verified score of only 8.7%, drastically lower than GPT-4o's 33.2% -- unsuitable for complex autonomous coding tasks (source: llm-stats.com GPT-4o mini, Benchmark Scores)
  • Superseded by GPT-4.1 mini which is 2.7x more expensive on input ($0.40 vs $0.15) but offers 1M context, 32K output, and significantly better benchmarks across the board (source: OpenAI GPT-4.1 Announcement; llm-stats.com GPT-4.1 mini)
  • Does not support reasoning mode -- no internal chain-of-thought like o1/o3/o4-mini models (source: OpenRouter GPT-4o-mini Page, Features)
  • Lower quality on complex reasoning and math tasks than full GPT-4o: MATH score of 70.2%, below GPT-4o's (source: llm-stats.com GPT-4o mini, Benchmark Scores)

How to prompt GPT-4o mini

Preferred instruction format

Uses the same role-based chat-completion format as GPT-4o, with a 'system' role message as the first entry in the messages array. All GPT-4.1 prompting-guide practices apply to GPT-4o mini as well, though the model is less capable at complex instruction following. (source: OpenAI GPT-4.1 Prompting Guide; OpenAI Models Page)
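A minimal sketch of that format, with the system message first in the array (prompt wording is illustrative):

```python
# Role-based chat format: system instructions lead, user content follows.
messages = [
    {"role": "system", "content": "You are a summarization assistant. Reply in one sentence."},
    {"role": "user", "content": "Summarize: Quarterly revenue rose 12% on strong demand."},
]
```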

Recommended practices

  • Use structured system prompts with clear sections: Role, Instructions, Output Format, Examples -- same patterns as GPT-4o (source: OpenAI GPT-4.1 Prompting Guide, System Message Structure)
  • Leverage few-shot examples more heavily than with GPT-4o, as the smaller model benefits more from demonstrated patterns (source: OpenAI Help Center, Prompt Engineering Best Practices, Few-Shot Learning)
  • Use markdown headers and delimiters to separate prompt sections clearly -- helps the smaller model parse structure (source: OpenAI GPT-4.1 Prompting Guide, Delimiter Conventions)
  • Keep prompts more explicit and less ambiguous than you would for GPT-4o -- the mini model infers intent less reliably (source: OpenAI GPT-4.1 Prompting Guide, Instruction Hierarchy -- applies proportionally to smaller models)
  • Use structured outputs (strict JSON schema) to guarantee output format compliance -- especially important for smaller models prone to format drift (source: OpenAI Structured Outputs Guide, Introduction)
  • Optimize for caching by placing static system instructions and examples before variable user content (source: OpenAI Prompt Engineering Guide, Caching Strategy)
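The practices above can be combined in one prompt builder: a markdown-sectioned system prompt, few-shot examples for the smaller model, and cache-friendly ordering with static content before variable user content. A sketch with illustrative section names, labels, and examples:

```python
# Markdown-sectioned system prompt: Role / Instructions / Output Format.
# Static, so it sits first in the message array for caching.
SYSTEM_PROMPT = """\
# Role
You classify customer feedback.

# Instructions
- Output exactly one label: bug, feature_request, or praise.
- If the text fits multiple labels, choose the most actionable one.

# Output Format
A single lowercase label, nothing else.
"""

# Few-shot examples demonstrate the pattern explicitly -- the smaller model
# benefits more from these than GPT-4o does. Also static, also cacheable.
FEW_SHOT = [
    {"role": "user", "content": "The export button crashes the app."},
    {"role": "assistant", "content": "bug"},
    {"role": "user", "content": "Would love dark mode!"},
    {"role": "assistant", "content": "feature_request"},
]

def build_messages(user_text: str) -> list[dict]:
    """Static system prompt and examples first (cacheable prefix), variable user text last."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *FEW_SHOT,
        {"role": "user", "content": user_text},
    ]
```

Because the system prompt and few-shot block never change between requests, every call shares the same prefix, which is what prompt caching keys on.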

Anti-patterns to avoid

  • Do NOT rely on GPT-4o mini for complex multi-step reasoning without explicit step-by-step decomposition in the prompt (source: OpenAI Help Center, Prompt Engineering Best Practices)
  • Do NOT use for autonomous agentic workflows requiring complex tool orchestration -- SWE-bench score of 8.7% indicates poor agentic capability (source: llm-stats.com GPT-4o mini Benchmarks)
  • Same anti-patterns as GPT-4o apply: avoid JSON context wrapping, avoid manual tool schema injection, avoid sample phrase repetition without variation instruction (source: OpenAI GPT-4.1 Prompting Guide, Common Anti-Patterns)
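For multi-step tasks, the decomposition the first anti-pattern calls for can live directly in the prompt: rather than asking GPT-4o mini for a one-shot answer, spell out the steps. A sketch with illustrative wording:

```python
# Explicit step-by-step decomposition: each step is numbered and the final
# answer is only allowed to depend on the earlier steps.
DECOMPOSED_PROMPT = """\
Analyze the support ticket below in these exact steps:
1. Quote the sentence that states the customer's main problem.
2. List the product areas mentioned.
3. Based on steps 1-2 only, assign a priority: low, medium, or high.
Answer each step under its number before giving the final priority.

Ticket:
{ticket_text}
"""

prompt = DECOMPOSED_PROMPT.format(
    ticket_text="Checkout fails for all EU users since v2.3."
)
```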

Skip the manual application.

Refrase reads everything above and applies it for you. Try it on one of your own prompts.