Gemini 2.5 Pro
Google · gemini family · Official docs
Gemini 2.5 Pro is arguably the most powerful model with the most broken Refrase adapter. The current identity function means ZERO adaptations are applied, leaving Gemini's native structured output, dedicated system instruction field, and thinking budget control unused. The highest-impact fix is using response_mime_type + response_schema for JSON output instead of prompt-based JSON instructions — this alone could eliminate all parsing failures. The second priority is moving system instructions into the dedicated API parameter. Gemini's unique strengths (Google Search grounding, 1M context with caching, native multimodal) are all features that Claude and GPT lack natively, making Gemini adaptation particularly high-value. The structured output + function calling incompatibility on 2.5 is a real gotcha that the adapter must work around.
Specifications
Strengths
Key capabilities
- ✓1M token context window supporting text, code, images, audio, and video input — can process ~50,000 lines of code or 8 average-length novels in a single request (source: Google AI docs, Long Context guide, https://ai.google.dev/gemini-api/docs/long-context)
- ✓Built-in thinking/reasoning with configurable thinking budget from 128 to 32,768 tokens (default: dynamic). Thinking is enabled by default and cannot be fully disabled. (source: Google AI docs, Thinking guide, https://ai.google.dev/gemini-api/docs/thinking)
- ✓Native structured output via response_mime_type='application/json' with response_schema — guarantees syntactically valid JSON conforming to provided JSON Schema, supporting string/number/integer/boolean/object/array/null types plus anyOf, $ref, enum, format constraints (source: Google AI docs, Structured Output, https://ai.google.dev/gemini-api/docs/structured-output)
- ✓Grounding with Google Search — model autonomously generates search queries, retrieves web sources, and returns groundingChunks with URIs and groundingSupports linking claims to sources (source: Google AI docs, Google Search grounding, https://ai.google.dev/gemini-api/docs/google-search)
- ✓OpenAI API compatibility endpoint at generativelanguage.googleapis.com/v1beta/openai/ supporting chat completions, embeddings, function calling, and structured output via Pydantic/Zod schemas (source: Google AI docs, OpenAI Compatibility, https://ai.google.dev/gemini-api/docs/openai)
- ✓Context caching for repeated long-context usage at ~4x cost reduction vs standard input pricing (source: Google AI docs, Long Context guide, https://ai.google.dev/gemini-api/docs/long-context)
- ✓Multimodal input: up to 3,000 images per prompt, ~45 min video with audio, ~8.4 hours of audio, and document parsing up to 1,000 pages per file (source: Vertex AI docs, Gemini 2.5 Flash specifications — same limits apply to Pro, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)
- ✓Many-shot in-context learning with hundreds to thousands of examples, achieving performance comparable to fine-tuned models (source: Google AI docs, Long Context guide, https://ai.google.dev/gemini-api/docs/long-context)
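The structured-output capability above can be sketched as follows. This is a minimal illustration, not a definitive adapter implementation: the recipe schema is an invented stand-in, and the actual API call is shown commented out since it requires an API key.

```python
# Sketch: how an adapter might request native structured output from
# gemini-2.5-pro via the google-genai SDK, replacing prompt-based
# "respond in JSON" instructions. The recipe schema is an illustrative stand-in.
recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Recipe title"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "ingredients"],
}

config = {
    # These two fields make the API return syntactically valid JSON that
    # conforms to the schema -- no JSON instructions in the prompt needed.
    "response_mime_type": "application/json",
    "response_schema": recipe_schema,
}

# With an API key configured, the call would look like (not executed here):
# from google import genai
# client = genai.Client()
# resp = client.models.generate_content(
#     model="gemini-2.5-pro", contents="Give me a cookie recipe.", config=config
# )
# import json; recipe = json.loads(resp.text)
```

Because validity is enforced server-side, the adapter can drop its JSON-repair and retry logic entirely for this path.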
Known limitations
- ⚠Structured output cannot be combined with function calling / tool use on Gemini 2.5 models — when tool calls are present in message history, structured outputs fail. Works on 2.0 models but breaks on 2.5. (source: GitHub issue googleapis/python-genai#867, https://github.com/googleapis/python-genai/issues/867)
- ⚠Complex JSON schemas may trigger InvalidArgument: 400 errors. Complexity from long property names, large array limits, many-valued enums, many optional properties, or deep nesting can all cause failures. Mitigations include shortening property names, flattening arrays, and reducing constraints. (source: Vertex AI docs, Structured Output, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)
- ⚠Long context retrieval accuracy degrades for multiple simultaneous queries — model excels at single needle-in-haystack retrieval (~99% accuracy) but struggles with multi-needle retrieval, requiring multiple requests. (source: Google AI docs, Long Context guide, https://ai.google.dev/gemini-api/docs/long-context)
- ⚠Thinking tokens consume output token budget and are billed — thinking budget can overflow or underflow the specified allocation depending on prompt complexity, and thinking cannot be disabled on 2.5 Pro. (source: Google AI docs, Thinking guide, https://ai.google.dev/gemini-api/docs/thinking)
- ⚠System instructions do not fully prevent jailbreaks or leaks — Google explicitly warns against placing sensitive information in system instructions. (source: Vertex AI docs, System Instructions, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instructions)
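For the schema-complexity limitation, an adapter could run a cheap pre-flight check before submitting a response_schema. The sketch below is a heuristic, not a documented rule: the depth and enum thresholds are illustrative guesses, and all function names are made up.

```python
# Hedged pre-flight check for the schema-complexity limitation: estimate
# nesting depth and enum size of a response_schema so an adapter can
# simplify it before risking an InvalidArgument 400. Thresholds are
# illustrative guesses, not documented API limits.
def _children(schema: dict) -> list:
    """Direct sub-schemas: object properties and array item schemas."""
    kids = list((schema.get("properties") or {}).values())
    if isinstance(schema.get("items"), dict):
        kids.append(schema["items"])
    return kids

def schema_depth(schema: dict) -> int:
    """Maximum nesting depth of a JSON Schema fragment."""
    kids = _children(schema)
    return 1 + (max(schema_depth(k) for k in kids) if kids else 0)

def max_enum_size(schema: dict) -> int:
    """Largest enum value count anywhere in the schema tree."""
    sizes = [len(schema.get("enum", []))]
    sizes += [max_enum_size(k) for k in _children(schema)]
    return max(sizes)

def looks_too_complex(schema: dict, max_depth: int = 5, max_enum: int = 25) -> bool:
    return schema_depth(schema) > max_depth or max_enum_size(schema) > max_enum
```

When the check trips, the adapter can fall back to a flattened schema or to prompt-based JSON rather than surfacing a raw 400 to the caller.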
How to prompt Gemini 2.5 Pro
Preferred instruction format
Gemini uses a dedicated systemInstruction field in the API request, separate from user messages. System instructions are processed BEFORE user prompts and persist across the entire conversation. They can be a single string or an array of strings. In Python SDK: system_instruction=['Role description', 'Behavioral rules']. In REST: systemInstruction.parts[].text. This is NOT embedded in the user message — it is a first-class API parameter. (source: Vertex AI docs, System Instructions, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instructions)
Recommended practices
- Use XML-style tags (<context>, <task>) or Markdown headings as consistent delimiters to structure prompt sections — choose one format and use it consistently within a single prompt (source: Google AI docs, Prompt Design Strategies, https://ai.google.dev/gemini-api/docs/prompting-strategies)
- Place critical instructions and constraints at the BEGINNING of the prompt, and for long contexts, put the specific question/task at the END after all context data (source: Vertex AI docs, Gemini 3 Prompting Guide, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/gemini-3-prompting-guide)
- Always include 3-5 few-shot examples with consistent formatting — Google explicitly recommends 'always include few-shot examples in your prompts' but warns too many cause overfitting (source: Google AI docs, Prompt Design Strategies, https://ai.google.dev/gemini-api/docs/prompting-strategies)
- For structured output, set response_mime_type='application/json' with response_schema — do NOT duplicate the schema description in the prompt text; use the schema's 'description' fields instead to guide the model (source: Vertex AI docs, Structured Output, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)
- Use propertyOrdering in JSON schemas to enforce field generation order, and ensure any in-prompt schema references use the SAME property order as the schema definition (source: Vertex AI docs, Structured Output, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)
- Use objective constraints instead of subjective qualifiers — e.g. 'write a summary of 3 sentences or less' rather than 'write a brief summary' (source: Google AI docs, Prompt Design Strategies, https://ai.google.dev/gemini-api/docs/prompting-strategies)
- For grounding to provided context only, explicitly state 'the provided context is the only source of truth for the current session' in system instructions (source: Vertex AI docs, Gemini 3 Prompting Guide, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/gemini-3-prompting-guide)
- Add current date context and knowledge cutoff statement for time-sensitive queries (source: Google AI docs, Prompt Design Strategies, https://ai.google.dev/gemini-api/docs/prompting-strategies)
- Keep temperature at default 1.0 — lowering it may cause 'looping or degraded performance, particularly with complex mathematical or reasoning tasks' (source: Vertex AI docs, Gemini 3 Prompting Guide, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/gemini-3-prompting-guide)
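Several of the placement rules above (delimiters, constraints first, task last) can be mechanized in the adapter. A minimal sketch, assuming a hypothetical `build_long_context_prompt` helper; the `<context>` and `<task>` tag names come from the guidance, the function itself does not:

```python
# Sketch: assemble a long-context prompt per the guidance above --
# critical constraints at the BEGINNING, context data in the middle,
# and the specific task at the END, with consistent XML-style delimiters.
def build_long_context_prompt(constraints: str, context: str, task: str) -> str:
    return "\n".join([
        constraints,                          # critical instructions up front
        f"<context>\n{context}\n</context>",  # bulk context data
        f"<task>\n{task}\n</task>",           # question after all context
    ])

prompt = build_long_context_prompt(
    "Write a summary of 3 sentences or less.",  # objective, not 'brief'
    "...long document text...",
    "Summarize the document above.",
)
```

Centralizing this in one builder also makes it easy to keep delimiter style consistent within a single prompt, as the first practice recommends.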
Anti-patterns to avoid
- DO NOT lower temperature below 1.0 for reasoning tasks — Gemini's reasoning is optimized for temp=1.0 and lower values cause looping and degraded performance (source: Vertex AI docs, Gemini 3 Prompting Guide, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/gemini-3-prompting-guide)
- DO NOT duplicate JSON schema in prompt text when using response_schema — include the schema only in the response_schema parameter to avoid confusion; use schema description fields for guidance (source: Vertex AI docs, Structured Output, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)
- DO NOT use too many few-shot examples — overfitting occurs; stick to 3-5 varied examples (source: Google AI docs, Prompt Design Strategies, https://ai.google.dev/gemini-api/docs/prompting-strategies)
- DO NOT place essential instructions at the end of very long prompts — model may drop them during complex processing (source: Vertex AI docs, Gemini 3 Prompting Guide, https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/gemini-3-prompting-guide)
- DO NOT use vague or subjective language ('brief', 'detailed', 'good') without measurable definitions (source: Google AI docs, Prompt Design Strategies, https://ai.google.dev/gemini-api/docs/prompting-strategies)
- DO NOT combine structured output (response_mime_type) with function calling on Gemini 2.5 — they are incompatible and will fail (source: GitHub issue googleapis/python-genai#867)
- DO NOT use inconsistent formatting across few-shot examples — responses will mirror the inconsistency (source: Google AI docs, Prompt Design Strategies, https://ai.google.dev/gemini-api/docs/prompting-strategies)
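The structured-output/function-calling incompatibility is the anti-pattern an adapter can enforce mechanically. A defensive sketch, assuming a made-up `validate_request` name:

```python
# Sketch: reject configs that pair structured output with function calling
# on Gemini 2.5 models, since that combination fails (works on 2.0).
def validate_request(model: str, config: dict, tools=None) -> None:
    wants_json = config.get("response_mime_type") == "application/json"
    if model.startswith("gemini-2.5") and wants_json and tools:
        raise ValueError(
            "Gemini 2.5 cannot combine response_mime_type structured output "
            "with function calling; drop one or fall back to prompt-based JSON."
        )
```

Failing fast here turns a confusing runtime error into an actionable message at request-build time.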
Sources
- https://ai.google.dev/gemini-api/docs/prompting-strategies
- https://ai.google.dev/gemini-api/docs/structured-output
- https://ai.google.dev/gemini-api/docs/thinking
- https://ai.google.dev/gemini-api/docs/long-context
- https://ai.google.dev/gemini-api/docs/google-search
- https://ai.google.dev/gemini-api/docs/openai
- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instructions
- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/gemini-3-prompting-guide
- https://docs.cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output