Search Agent Sky
Query
review qwen/qwen3.7-plus

# Qwen/Qwen3.7-Plus Review (June 2026)

## Overview

**Qwen3.7-Plus** is Alibaba Cloud Qwen Team's flagship multimodal agent model, released GA in **June 2026** (Preview launched May 20, 2026). It builds on the Qwen3.7 text backbone and adds vision, screen understanding, and GUI action capabilities, completing the Qwen3.7 lineup alongside the text-only **Qwen3.7-Max**.

It's described as a **"multimodal interactive hybrid agent"**, meaning it handles both GUI environments (browser, desktop apps via screenshots) and CLI environments (terminal, code execution) in unified observe-act-observe loops.

---

## Key Specifications

| Spec | Value |
|---|---|
| **Release Date** | June 3, 2026 (GA) |
| **Context Window** | **1,000,000 tokens (1M)** |
| **Max Output** | 65,536 tokens |
| **Input Modalities** | Text, Image, Video |
| **Output Modality** | Text (no image/video generation) |
| **Reasoning** | Yes |
| **License** | Proprietary |

---

## Pricing

| Token Type | Cost (per 1M tokens) |
|---|---|
| **Input** | **$0.32 – $0.40** |
| **Output** | **$1.16 – $1.28** |
| **Cached Input** | $0.08 (80% discount relative to the $0.40 input rate) |

Available via Together (primary API provider) and Alibaba Cloud.

---

## Performance & Benchmarks

### Intelligence (Artificial Analysis: #8 out of 162 in class)

- **Artificial Analysis Intelligence Index: 39** (well above the average of 16)
- **GPQA Diamond: 90%** (graduate-level scientific reasoning)
- **Humanity's Last Exam (HLE): 33.4%**
- **SciCode: 45.5%** (scientific computing)
- **τ²-Bench: 93%** (conversational agent benchmark)
- **IFBench: 78%** (instruction following)
- **LCR: 65%** (long-context reasoning)

### GUI / Agentic Benchmarks

- **ScreenSpot Pro: 79.0** (GUI grounding: pixel-level click positioning on screenshots). This is the headline number, placing it at the **front of the open-API GUI agent field**, competitive with Claude Computer Use and OpenAI Operator.
- **Terminal-Bench: 70.3** (agentic terminal task performance)
- **Terminal-Bench Hard: 47%**
- **Coding Index: 55.9** (#27 of 139 models)
- **Agentic Index: 70** (#15 of 293 models)

### Speed & Efficiency

- **Throughput: ~50 tokens/second** (notably slow; ranked #105/162)
- **Latency P95: ~69.5 seconds** on Together
- **Very verbose**: generates ~110M tokens in evaluation (vs. 58M avg)

---

## What Makes Qwen3.7-Plus Stand Out

1. **Hybrid GUI + CLI Agent**: Unlike many multimodal models that only understand images, Qwen3.7-Plus can operate a browser through screenshots AND run shell commands, with orchestration baked into the model itself.
2. **ScreenSpot Pro 79.0**: This is the key differentiator. It enables pixel-accurate GUI grounding for automating clicks, typing, and navigation from screenshots, directly competitive with Anthropic's Claude Computer Use and OpenAI Operator.
3. **1M-Token Context Window**: Larger than the context window of 91% of models on the market, enabling long-document and multi-turn agentic sessions.
4. **Tool Use & Function Calling**: Full support for MCP-style tool orchestration and function calling workflows.
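
The observe-act-observe loop behind the hybrid GUI + CLI design can be illustrated with a minimal sketch. This is not Qwen3.7-Plus's actual interface: `Observation`, `Click`, `Shell`, `Done`, `execute`, `run_agent`, and `scripted_policy` are all hypothetical names, and a scripted policy stands in for the model so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Click:          # hypothetical GUI action: click at pixel (x, y)
    x: int
    y: int


@dataclass
class Shell:          # hypothetical CLI action: run a shell command
    command: str


@dataclass
class Done:           # hypothetical terminal action: stop the loop
    summary: str


Action = Union[Click, Shell, Done]


@dataclass
class Observation:
    kind: str         # "screenshot" or "terminal"
    payload: str      # stand-in for image bytes or stdout text


Policy = Callable[[List[Observation]], Action]


def execute(action: Action) -> Observation:
    """Toy environment: describes what a real GUI/CLI backend would return."""
    if isinstance(action, Click):
        return Observation("screenshot", f"screen after click at ({action.x}, {action.y})")
    if isinstance(action, Shell):
        return Observation("terminal", f"output of `{action.command}`")
    raise ValueError(f"cannot execute {action!r}")


def run_agent(policy: Policy, first: Observation, max_steps: int = 10) -> List[Action]:
    """Alternate between choosing an action and observing its result."""
    history = [first]
    trace: List[Action] = []
    for _ in range(max_steps):
        action = policy(history)           # act on everything observed so far
        trace.append(action)
        if isinstance(action, Done):
            break
        history.append(execute(action))    # observe the result of the action
    return trace


def scripted_policy(history: List[Observation]) -> Action:
    """Stand-in for the model: one GUI step, one CLI step, then stop."""
    script = [Click(120, 48), Shell("ls -la"), Done("finished")]
    return script[min(len(history) - 1, len(script) - 1)]


trace = run_agent(scripted_policy, Observation("screenshot", "initial screen"))
for step in trace:
    print(step)
```

In a real deployment the policy would call the model with the accumulated screenshots and terminal output, and `execute` would drive an actual browser or shell; both are left abstract here on purpose.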

---

## Strengths

- ✅ **Excellent agentic performance**: #15 globally for agentic tasks
- ✅ **Top-tier GUI grounding**: ScreenSpot Pro 79.0 leads open-API field
- ✅ **Very competitive pricing** for its capability tier (from $0.32/M input)
- ✅ **Huge 1M context window**
- ✅ **Strong coding ability** (top quartile)
- ✅ **Versatile input modalities** (text, image, video)
- ✅ **Hybrid GUI + CLI agent baked in**

## Weaknesses

- ❌ **Notably slow** (~50 tok/s): among the slowest in its price class
- ❌ **Very verbose**: generates significantly more tokens than average, which can increase latency and cost
- ❌ **Proprietary license**: no open weights, so self-hosting and on-prem deployment are not available
- ❌ **No image generation**: strictly text output only
- ❌ **Expensive relative to peers on output tokens** ($1.16–$1.28/M output)

---

## How It Compares to Qwen3.7-Max

| Aspect | Qwen3.7-Max | Qwen3.7-Plus |
|---|---|---|
| Modality | Text only | Multimodal (text + vision) |
| Best for | Long-horizon coding agents, reasoning | GUI agents, browser automation, UI tasks |
| Context | 1M tokens | 1M tokens |
| Pricing | $2.50/$7.50 per M (in/out) | $0.32–$0.40 / $1.16–$1.28 per M (in/out) |
| Intelligence Rank | Higher (flagship text) | Lower but complementary |

The two are **complementary, not competing**: Max handles reasoning & code, Plus handles screen perception & GUI actions.

---

## Bottom Line

**Qwen3.7-Plus is best for teams building:**

- GUI/browser automation agents
- Computer-use tools (click, type, navigate from screenshots)
- UI testing and screen-scraping pipelines
- Multimodal document processing with visual context
- Productivity automation workflows

**It's NOT ideal for:**

- Pure text/code-only tasks (use Qwen3.7-Max instead)
- Latency-sensitive applications (it's slow)
- Scenarios requiring open-weight models or on-prem deployment

**Verdict:** At #8 in intelligence in its class and leading the open-API GUI agent category on ScreenSpot Pro, Qwen3.7-Plus is a strong multimodal contender.
Its hybrid GUI+CLI approach, 1M context window, and aggressive pricing make it a compelling option for agentic workflows, especially for teams that route between Max (reasoning) and Plus (vision/actions) in a stacked agent architecture.
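
The Max/Plus routing idea and the pricing figures quoted in this review can be combined into a rough per-step cost estimate. The sketch below is illustrative only: the model labels, the `needs_vision` routing rule, and the token counts are assumptions, and the Plus prices use the upper end of each range from the pricing table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str             # label only; not a confirmed API model id
    input_per_m: float    # USD per 1M input tokens
    output_per_m: float   # USD per 1M output tokens


# Prices taken from the review's tables (Plus: upper end of each quoted range).
PLUS = ModelPrice("qwen3.7-plus", 0.40, 1.28)
MAX = ModelPrice("qwen3.7-max", 2.50, 7.50)


def route(needs_vision: bool) -> ModelPrice:
    """Assumed rule: screen/vision steps go to Plus, text-only steps to Max."""
    return PLUS if needs_vision else MAX


def cost_usd(model: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Linear token pricing: tokens times price per million, for input and output."""
    total = input_tokens * model.input_per_m + output_tokens * model.output_per_m
    return total / 1_000_000


# Hypothetical step size: 200k input tokens, 10k output tokens.
gui_step = cost_usd(route(needs_vision=True), 200_000, 10_000)
text_step = cost_usd(route(needs_vision=False), 200_000, 10_000)
print(f"Plus step: ${gui_step:.4f}")
print(f"Max step:  ${text_step:.4f}")
```

With these assumed token counts, the same-sized step costs about $0.09 on Plus and about $0.58 on Max. The estimate ignores cached-input pricing and Plus's reported verbosity, both of which would shift the real numbers.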

Sources visited

1. search: Qwen Qwen3.7-Plus review
2. navigate: https://www.buildfastwithai.com/blogs/qwen-3-7-plus-multimodal-agent-review-2026
3. navigate: https://designforonline.com/ai-models/qwen-qwen3-7-plus/
4. navigate: https://llm-stats.com/models/qwen3.7-plus
5. navigate: https://artificialanalysis.ai/models/qwen3-7-plus
Shared by Proto Státis · Jul 3, 2026
