Skip to main content
Refrase
  • Pricing
Star
← All models

Qwen3 235B

Alibaba · qwen family · Official docs

Qwen3-235B-A22B is the flagship MoE model from Alibaba's Qwen team. Its hybrid thinking mode is a first-class feature — unlike models where reasoning is bolted on, Qwen3 was trained from the ground up to switch between deep reasoning and fast responses. The /think and /no_think soft switches make it uniquely controllable at the prompt level without API parameter changes. At 22B activated parameters per token, it delivers frontier-class reasoning at a fraction of the compute cost of dense 200B+ models. The 119-language support makes it the strongest multilingual open-weight model available. Key trade-off: the MoE architecture requires significant VRAM for self-hosting despite low per-token compute.

Try Refrase on a Qwen3 235B prompt

Paste any prompt — Refrase rewrites it using Qwen3 235B's documentation as context. 4–7 seconds end-to-end.

Open in /enhanceTry Guided mode

Specifications

131K
Context window
33K
Max output
$0.7 / $2.8
Per 1M tokens (in/out)
Alibaba Cloud DashScope international pricing. Thinking mode output: $8.40/1M tokens. Global (US Virginia) pricing lower: $0.287 input, $1.147 output (non-thinking), $2.868 (thinking). Open-weight Apache 2.0 — self-hosting eliminates API costs. (source: Alibaba Cloud Model Studio, Model Pricing page)

Strengths

analysisgenerationcode

Key capabilities

  • ✓Mixture-of-Experts architecture: 235B total params, 22B activated per token, 128 experts with 8 activated per token (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • ✓Hybrid thinking mode: seamless switching between thinking mode (step-by-step reasoning in <think> blocks) and non-thinking mode (fast direct responses) via enable_thinking parameter or /think and /no_think soft switches (source: Qwen Blog, 'Qwen3: Think Deeper, Act Faster')
  • ✓119 languages and dialects supported across Indo-European, Sino-Tibetan, Afro-Asiatic, and other language families (source: Qwen Blog, 'Qwen3: Think Deeper, Act Faster')
  • ✓Trained on approximately 36 trillion tokens — nearly double Qwen2.5's 18 trillion — including synthetic math and code data (source: Qwen Blog, 'Qwen3: Think Deeper, Act Faster')
  • ✓Extended context via YaRN rope scaling from 32K native to 131K tokens (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • ✓Strong agentic task performance: leading results on complex agent-based benchmarks among open-source models (source: Hugging Face, Qwen3-32B Model Card)
  • ✓Open-weight under Apache 2.0 license enabling full commercial and research use (source: Qwen GitHub Repository)

Known limitations

  • ⚠Greedy decoding causes performance degradation and endless repetitions — must use sampling with recommended temperature settings (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • ⚠YaRN static scaling applies a constant factor regardless of input length, which may negatively impact performance on shorter texts when enabled (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • ⚠Higher presence_penalty values (above ~1.5) may cause language mixing in multilingual contexts (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • ⚠Quantization below 4-bit causes significant performance degradation, especially in complex reasoning tasks, more pronounced than in previous Qwen generations (source: arXiv:2505.02214, 'An Empirical Study of Qwen3 Quantization')
  • ⚠Format-dependent reasoning: strong on pattern-matching benchmarks but weaker on strict logical forms like syllogisms (source: LogiEval benchmark analysis, emergentmind.com)

How to prompt Qwen3 235B

Preferred instruction format

Standard chat format with system/user/assistant roles. System message sets context; user message contains the task. Thinking mode controlled via enable_thinking=True/False in chat_template_kwargs, or via /think and /no_think soft switches in user messages.

Recommended practices

  • Use Temperature=0.6, TopP=0.95, TopK=20, MinP=0 for thinking mode; Temperature=0.7, TopP=0.8, TopK=20, MinP=0 for non-thinking mode (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • Set max output to 32,768 tokens for most queries; use 38,912 for highly complex competition-level problems (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • In multi-turn conversations, include only the final output in history — strip <think> blocks from previous turns (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • Use thinking_budget parameter to cap reasoning token usage when latency is a concern (source: Alibaba Cloud Documentation, 'How to use deep thinking models')
  • Enable YaRN rope scaling only when input exceeds 32,768 tokens to avoid performance impact on shorter contexts (source: Hugging Face, Qwen3-235B-A22B Model Card)

Anti-patterns to avoid

  • Never use greedy decoding (temperature=0) — causes endless repetitions and severe quality degradation (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • Do not include thinking content (<think> blocks) in multi-turn conversation history — only include final output (source: Hugging Face, Qwen3-235B-A22B Model Card)
  • Avoid presence_penalty values above 1.5 in multilingual scenarios — triggers language mixing (source: Hugging Face, Qwen3-235B-A22B Model Card)

Sources

  • https://huggingface.co/Qwen/Qwen3-235B-A22B
  • https://qwenlm.github.io/blog/qwen3/
  • https://www.alibabacloud.com/help/en/model-studio/deep-thinking
  • https://arxiv.org/abs/2505.09388

Compare prompting style with another model

vs Claude Sonnet 4.6vs GPT-5.5vs DeepSeek V3

Skip the manual application.

Refrase reads everything above and applies it for you. Try it on one of your own prompts.

Open /enhance with Qwen3 235B
Refrase

Your prompts, upgraded.

Product

  • Enhance
  • Extension
  • API
  • MCP

Research

  • Papers
  • Methodology
  • Benchmarks
  • Models

Company

  • Blog
  • Changelog
  • Pricing
  • Docs
  • GitHub
Privacy Policy·Terms of Service·All Systems Operational

© 2026 Refrase. All rights reserved.