review qwen/qwen3.7-plus

Question

Search Agent Sky · Accepted Answer

Qwen/Qwen3.7-Plus Review (June 2026)

Overview

Qwen3.7-Plus is the Alibaba Cloud Qwen team's flagship multimodal agent model. It reached general availability on June 3, 2026, after a preview launched on May 20, 2026. It builds on the Qwen3.7 text backbone and adds vision, screen understanding, and GUI action capabilities, completing the Qwen3.7 lineup alongside the text-only Qwen3.7-Max.

It's described as a "multimodal interactive hybrid agent" — meaning it handles both GUI environments (browser, desktop apps via screenshots) and CLI environments (terminal, code execution) in unified observe-act-observe loops.
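The observe-act-observe loop mentioned above can be sketched roughly as follows. This is an illustration of the control flow only, not Qwen's API: `Observation`, `Action`, `run_agent`, and the scripted policy are hypothetical names, and the environment is a stub standing in for a real browser or terminal.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    """What the agent sees: a screenshot (GUI) or command output (CLI)."""
    kind: str      # "screenshot" or "terminal"
    payload: str   # stand-in for image bytes or stdout text


@dataclass
class Action:
    """What the agent does next. All fields are illustrative."""
    kind: str                      # "click", "shell", or "done"
    argument: Optional[str] = None


def run_agent(
    policy: Callable[[List[Observation]], Action],
    execute: Callable[[Action], Observation],
    first: Observation,
    max_steps: int = 10,
) -> List[Action]:
    """Alternate observe -> act until the policy returns "done" or steps run out."""
    history = [first]
    actions: List[Action] = []
    for _ in range(max_steps):
        action = policy(history)          # a real agent would call the model here
        actions.append(action)
        if action.kind == "done":
            break
        history.append(execute(action))   # the new observation feeds the next step
    return actions


# Scripted stand-ins so the sketch runs without a model or a browser.
def scripted_policy(history: List[Observation]) -> Action:
    script = [Action("click", "120,340"), Action("shell", "ls"), Action("done")]
    return script[min(len(history) - 1, len(script) - 1)]


def fake_execute(action: Action) -> Observation:
    kind = "screenshot" if action.kind == "click" else "terminal"
    return Observation(kind, f"result of {action.kind}")


steps = run_agent(scripted_policy, fake_execute, Observation("screenshot", "start"))
print([a.kind for a in steps])
```

The point of the sketch is that GUI and CLI steps share one history and one loop; only the action kind and the observation type change from step to step.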

---

Key Specifications

| Spec | Value |
|---|---|
| Release Date | June 3, 2026 (GA) |
| Context Window | 1,000,000 tokens (1M) |
| Max Output | 65,536 tokens |
| Input Modalities | Text, Image, Video |
| Output Modality | Text (no image/video generation) |
| Reasoning | Yes |
| License | Proprietary |

---

Pricing

| Token Type | Cost (per 1M tokens) |
|---|---|
| Input | $0.32 – $0.40 |
| Output | $1.16 – $1.28 |
| Cached Input | $0.08 (75–80% below the uncached input price) |

Available via Together (primary API provider) and Alibaba Cloud.
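To make the pricing table concrete, here is a small cost estimator. It is a sketch under stated assumptions: it uses the upper end of each listed price range, and `cached_fraction` is a hypothetical parameter for the share of input tokens billed at the cached rate (how a provider decides cache hits is not covered in this review).

```python
# Prices in USD per 1M tokens, taken from the table above (upper end of each range).
INPUT_PER_M = 0.40
OUTPUT_PER_M = 1.28
CACHED_INPUT_PER_M = 0.08


def request_cost(input_tokens: int, output_tokens: int, cached_fraction: float = 0.0) -> float:
    """Estimate the USD cost of one request.

    cached_fraction is the assumed share of input tokens billed at the cached rate.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (
        fresh * INPUT_PER_M
        + cached * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# A long agentic turn: 200k tokens of context, 8k tokens of output.
print(round(request_cost(200_000, 8_000), 5))
print(round(request_cost(200_000, 8_000, cached_fraction=0.9), 5))
```

With these numbers the uncached turn costs about $0.09 and the 90%-cached turn about $0.03, which is why cache reuse matters for multi-step agents that resend a large context on every turn.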

---

Performance & Benchmarks

Intelligence (Artificial Analysis — #8 out of 162 in class)
Artificial Analysis Intelligence Index: 39 (well above average of 16)
GPQA Diamond: 90% — graduate-level scientific reasoning
Humanity's Last Exam (HLE): 33.4%
SciCode: 45.5% — scientific computing
τ²-Bench: 93% — conversational agent benchmark
IFBench: 78% — instruction following
LCR: 65% — long-context reasoning

GUI / Agentic Benchmarks
ScreenSpot Pro: 79.0 — GUI grounding (pixel-level click positioning on screenshots). This is the headline number, placing it at the front of the open-API GUI agent field, competitive with Claude Computer Use and OpenAI Operator.
Terminal-Bench: 70.3 — agentic terminal task performance
Terminal-Bench Hard: 47%
Coding Index: 55.9 (#27 of 139 models)
Agentic Index: 70 (#15 of 293 models)
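For readers unfamiliar with GUI grounding scores, the sketch below shows how a ScreenSpot-style accuracy is commonly computed: a prediction counts as correct when the predicted click point lands inside the target element's bounding box. This is an assumption about the general scoring idea, not the exact ScreenSpot Pro protocol, which this review does not describe. The function names and demo data are made up for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def click_hits(point: Point, box: Box) -> bool:
    """True when the predicted click falls inside the target box (edges included)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(samples: Iterable[Tuple[Point, Box]]) -> float:
    """Percentage of predictions whose click point hits the target element."""
    pairs = list(samples)
    if not pairs:
        raise ValueError("need at least one sample")
    hits = sum(click_hits(point, box) for point, box in pairs)
    return 100.0 * hits / len(pairs)


# Illustrative data only: (predicted click, target box).
demo = [
    ((105.0, 52.0), (100.0, 40.0, 140.0, 60.0)),  # inside a button
    ((300.0, 10.0), (100.0, 40.0, 140.0, 60.0)),  # far miss
    ((12.0, 12.0), (10.0, 10.0, 26.0, 26.0)),     # inside a small icon
    ((27.0, 12.0), (10.0, 10.0, 26.0, 26.0)),     # one pixel outside the icon
]
print(grounding_accuracy(demo))  # 50.0
```

Small targets make the metric unforgiving: the last sample misses by a single pixel and still counts as a failure.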

Speed & Efficiency
Throughput: ~50 tokens/second (notably slow, ranked #105 of 162)
P95 latency: ~69.5 seconds on Together
Very verbose: generated ~110M tokens during evaluation versus a 58M average, about 1.9x as many
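A quick back-of-the-envelope calculation shows what these figures mean in practice. It assumes a constant 50 tokens/second and ignores time to first token, queueing, and network overhead, so treat the results as rough lower bounds on generation time.

```python
TOKENS_PER_SECOND = 50.0     # throughput reported above
MAX_OUTPUT_TOKENS = 65_536   # from the specification table


def generation_seconds(output_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Time to stream output_tokens at a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


print(generation_seconds(2_000))                             # 40.0 seconds
print(round(generation_seconds(MAX_OUTPUT_TOKENS) / 60, 1))  # 21.8 minutes
print(round(110 / 58, 2))                                    # 1.9x the average token count
```

Verbosity compounds the throughput problem: a model that emits about 1.9x as many tokens at 50 tokens/second takes correspondingly longer per task and bills more output tokens.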

---

What Makes Qwen3.7-Plus Stand Out

Hybrid GUI + CLI Agent: Unlike many multimodal models that only understand images, Qwen3.7-Plus can operate a browser through screenshots AND run shell commands — with orchestration baked into the model itself.

ScreenSpot Pro 79.0: This is the key differentiator. Pixel-accurate GUI grounding is what lets an agent click, type, and navigate from screenshots alone, and it puts Qwen3.7-Plus in direct competition with Anthropic's Claude Computer Use and OpenAI Operator.

1M-Token Context Window: Larger than the context window of 91% of models on the market, which leaves room for long documents and long multi-turn agentic sessions.

Tool Use & Function Calling: Full support for MCP-style tool orchestration and function calling workflows.
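As a rough picture of what a function-calling workflow involves on the application side, the sketch below dispatches a model-emitted tool call to a local Python function. The wire format (`{"name": ..., "arguments": {...}}`) is an assumption chosen for illustration, and `TOOLS`, `tool`, `add`, and `dispatch` are hypothetical names. The actual request and response schema depends on the provider's API and is not specified in this review.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by the name the model would use.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    """Toy tool used only to exercise the dispatcher."""
    return a + b


def dispatch(tool_call_json: str) -> str:
    """Run one tool call and return a JSON string to send back to the model."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps({"result": result})


print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
print(dispatch('{"name": "delete_everything", "arguments": {}}'))
```

Returning an error payload for unknown tool names, rather than raising, lets the model see the failure and try a different call on its next turn.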

---

Strengths

✅ Excellent agentic performance — #15 globally for agentic tasks
✅ Top-tier GUI grounding — ScreenSpot Pro 79.0 leads open-API field
✅ Very competitive pricing for its capability tier ($0.32/M input)
✅ Huge 1M context window
✅ Strong coding ability (top quartile)
✅ Versatile input modalities (text, image, video)
✅ Hybrid GUI + CLI agent baked in

Weaknesses

❌ Notably slow (~50 tok/s): ranked #105 of 162 on throughput
❌ Very verbose — generates significantly more tokens than average, which can increase latency and cost
❌ Proprietary license: no open weights, so self-hosting and on-prem deployment are not an option
❌ No image or video generation: output is text only
❌ Expensive relative to peers on output tokens — $1.16–$1.28/M output

---

How It Compares to Qwen3.7-Max

| Aspect | Qwen3.7-Max | Qwen3.7-Plus |
|---|---|---|
| Modality | Text only | Multimodal (text + vision) |
| Best for | Long-horizon coding agents, reasoning | GUI agents, browser automation, UI tasks |
| Context | 1M tokens | 1M tokens |
| Pricing | $2.50/$7.50 per M (in/out) | $0.32–$0.40 / $1.16–$1.28 per M (in/out) |
| Intelligence Rank | Higher (flagship text) | Lower but complementary |

The two are complementary, not competing — Max handles reasoning & code, Plus handles screen perception & GUI actions.
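One way to act on this split is a small router that sends vision and GUI work to Plus and everything else to Max. The sketch below is a minimal illustration: the `Task` fields, the routing rule, and the model identifier strings are all assumptions, so check your provider's model list for the real names.

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder identifiers; the real API model names may differ.
PLUS_MODEL = "qwen3.7-plus"
MAX_MODEL = "qwen3.7-max"


@dataclass
class Task:
    prompt: str
    image_paths: List[str] = field(default_factory=list)
    needs_gui_actions: bool = False


def pick_model(task: Task) -> str:
    """Route anything visual or GUI-driven to Plus, text-only work to Max."""
    if task.image_paths or task.needs_gui_actions:
        return PLUS_MODEL
    return MAX_MODEL


print(pick_model(Task("Refactor this module")))
print(pick_model(Task("Click the Submit button", needs_gui_actions=True)))
print(pick_model(Task("Summarize this chart", image_paths=["chart.png"])))
```

A cost-aware variant could also send simple text tasks to Plus, since its listed prices are well below Max's, and reserve Max for the reasoning-heavy steps.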

---

Bottom Line

Qwen3.7-Plus is best for teams building:
GUI/browser automation agents
Computer-use tools (click, type, navigate from screenshots)
UI testing and screen-scraping pipelines
Multimodal document processing with visual context
Productivity automation workflows

It's NOT ideal for:
Pure text/code-only tasks (use Qwen3.7-Max instead)
Latency-sensitive applications (it's slow)
Scenarios requiring open-weight models or on-prem deployment

Verdict: At #8 in intelligence in its class and leading the open-API GUI agent category on ScreenSpot Pro, Qwen3.7-Plus is a strong multimodal contender. Its hybrid GUI+CLI approach, 1M context window, and aggressive pricing make it a compelling option for agentic workflows — especially for teams that route between Max (reasoning) and Plus (vision/actions) in a stacked agent architecture.
