# Qwen/Qwen3.7-Plus Review (June 2026)
## Overview
**Qwen3.7-Plus** is the flagship multimodal agent model from Alibaba Cloud's Qwen Team. It was released to general availability (GA) in **June 2026**, after a preview that launched on May 20, 2026. It builds on the Qwen3.7 text backbone and adds vision, screen understanding, and GUI action capabilities, completing the Qwen3.7 lineup alongside the text-only **Qwen3.7-Max**.

It is described as a **"multimodal interactive hybrid agent"**: it handles both GUI environments (browsers and desktop apps, observed through screenshots) and CLI environments (terminal, code execution) within a single observe-act-observe loop.

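
The loop below is a minimal sketch of the observe-act-observe pattern, not Qwen's actual agent runtime. The `Observation` and `Action` types, the action kinds, and `scripted_policy` (a stand-in for real model calls) are all hypothetical, chosen only to show how GUI and CLI steps can share one loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Observation:
    screenshot: Optional[bytes]  # GUI view (None when no screen is captured)
    terminal: str                # latest CLI output


@dataclass
class Action:
    kind: str                                    # hypothetical kinds: "click", "type", "shell", "done"
    payload: dict = field(default_factory=dict)


def run_agent(policy: Callable[[list[Observation]], Action],
              observe: Callable[[], Observation],
              execute: Callable[[Action], None],
              max_steps: int = 20) -> list[Action]:
    """Observe -> act -> observe until the policy returns 'done' or the step budget runs out."""
    history: list[Observation] = []
    actions: list[Action] = []
    for _ in range(max_steps):
        history.append(observe())   # 1. observe the current GUI + CLI state
        action = policy(history)    # 2. the model picks the next action
        actions.append(action)
        if action.kind == "done":
            break
        execute(action)             # 3. act; the next iteration observes the result
    return actions


# Toy environment and a scripted stand-in for the model, for illustration only.
state = {"terminal": "", "clicks": []}


def observe() -> Observation:
    return Observation(screenshot=None, terminal=state["terminal"])


def execute(action: Action) -> None:
    if action.kind == "shell":
        state["terminal"] = f"ran: {action.payload['cmd']}"
    elif action.kind == "click":
        state["clicks"].append((action.payload["x"], action.payload["y"]))


def scripted_policy(history: list[Observation]) -> Action:
    step = len(history)
    if step == 1:
        return Action("shell", {"cmd": "ls"})
    if step == 2:
        return Action("click", {"x": 120, "y": 48})
    return Action("done")


actions = run_agent(scripted_policy, observe, execute)
print([a.kind for a in actions])  # ['shell', 'click', 'done']
```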
---
## Key Specifications
| Spec | Value |
|---|---|
| **Release Date** | June 3, 2026 (GA) |
| **Context Window** | **1,000,000 tokens (1M)** |
| **Max Output** | 65,536 tokens |
| **Input Modalities** | Text, Image, Video |
| **Output Modality** | Text (no image/video generation) |
| **Reasoning** | Yes |
| **License** | Proprietary |
---
## Pricing
| Token Type | Cost (per 1M tokens) |
|---|---|
| **Input** | **$0.32 – $0.40** |
| **Output** | **$1.16 – $1.28** |
| **Cached Input** | $0.08 (80% below the $0.40 upper input rate) |

Available via Together (the primary API provider) and Alibaba Cloud.

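
To turn the pricing table into a per-request estimate, the sketch below bills uncached input, cached input, and output at separate rates. It assumes the upper end of each price range and that cached tokens are a subset of the input tokens; `estimate_cost` is a hypothetical helper, and actual billing rules depend on the provider.

```python
# Upper end of each range in the pricing table (USD per 1M tokens); an assumption, not a quote.
PRICES_PER_M = {"input": 0.40, "cached_input": 0.08, "output": 1.28}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
                  prices: dict = PRICES_PER_M) -> float:
    """Estimated USD cost of one request; cached_tokens is the cached share of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * prices["input"]
            + cached_tokens * prices["cached_input"]
            + output_tokens * prices["output"]) / 1_000_000


# A long agentic turn: 200K-token prompt (150K of it cached) and a 10K-token reply.
cost = estimate_cost(200_000, 10_000, cached_tokens=150_000)
print(f"${cost:.4f}")  # $0.0448
```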
---
## Performance & Benchmarks
### Intelligence (Artificial Analysis: #8 of 162 models in its class)
- **Artificial Analysis Intelligence Index: 39** (well above the class average of 16)
- **GPQA Diamond: 90%** — graduate-level scientific reasoning
- **Humanity's Last Exam (HLE): 33.4%**
- **SciCode: 45.5%** — scientific computing
- **τ²-Bench: 93%** — conversational agent benchmark
- **IFBench: 78%** — instruction following
- **LCR: 65%** — long-context reasoning
### GUI / Agentic Benchmarks
- **ScreenSpot Pro: 79.0** (GUI grounding: pixel-level click positioning on screenshots). This is the headline number, placing it at the **front of the open-API GUI agent field** and making it competitive with Claude Computer Use and OpenAI Operator.
- **Terminal-Bench: 70.3** — agentic terminal task performance
- **Terminal-Bench Hard: 47%**
- **Coding Index: 55.9** (#27 of 139 models)
- **Agentic Index: 70** (#15 of 293 models)
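
GUI grounding benchmarks in the ScreenSpot family are typically scored by checking whether the model's predicted click point falls inside the ground-truth bounding box of the target element. The sketch below implements that point-in-box accuracy under this assumption; if ScreenSpot Pro follows the convention, a score of 79.0 means roughly 79% of predicted clicks land on target. `Box` and `grounding_accuracy` are illustrative names, not part of any benchmark's code.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive edges: a click exactly on the border counts as a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(predictions: list[tuple[float, float]], targets: list[Box]) -> float:
    """Fraction of predicted click points that land inside the target element's box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


targets = [Box(100, 40, 160, 60), Box(500, 300, 540, 320), Box(10, 10, 30, 30), Box(700, 0, 760, 20)]
preds = [(130, 50), (520, 310), (45, 20), (730, 10)]  # the third click misses its box
print(grounding_accuracy(preds, targets))  # 0.75
```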
### Speed & Efficiency
- **Throughput: ~50 tokens/second** (notably slow; ranked #105 of 162)
- **Latency P95: ~69.5 seconds** on Together
- **Very verbose**: generated ~110M tokens during evaluation (vs. a 58M average)
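
A back-of-envelope calculation shows what ~50 tokens/second and the verbosity figure mean in practice. It assumes a constant decode rate and ignores time-to-first-token, queueing, and network overhead, so real wall-clock times will be longer.

```python
THROUGHPUT_TOK_S = 50        # ~50 tokens/second, from the figures above
MAX_OUTPUT_TOKENS = 65_536   # max output from the spec table


def generation_seconds(output_tokens: int, tokens_per_second: float = THROUGHPUT_TOK_S) -> float:
    """Decode time only: output tokens divided by a constant throughput."""
    return output_tokens / tokens_per_second


worst_case = generation_seconds(MAX_OUTPUT_TOKENS)
print(f"{worst_case:.0f} s (~{worst_case / 60:.1f} min)")  # 1311 s (~21.8 min)

# Verbosity: ~110M tokens vs. a 58M average during evaluation
verbosity_ratio = 110 / 58
print(f"{verbosity_ratio:.2f}x the average token volume")  # 1.90x the average token volume
```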
---
## What Makes Qwen3.7-Plus Stand Out
1. **Hybrid GUI + CLI Agent**: Unlike many multimodal models that can only interpret images, Qwen3.7-Plus can both operate a browser from screenshots and run shell commands, with the orchestration between the two built into the model itself.
2. **ScreenSpot Pro 79.0**: This is the key differentiator. Strong GUI grounding lets the model automate clicks, typing, and navigation directly from screenshots, putting it in direct competition with Anthropic's Claude Computer Use and OpenAI Operator.
3. **1M-Token Context Window**: Larger than the context windows of roughly 91% of models on the market, enough for long documents and extended multi-turn agentic sessions.
4. **Tool Use & Function Calling**: Full support for MCP-style tool orchestration and function calling workflows.
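
Function calling generally follows a request, dispatch, respond cycle: the model emits a tool call with JSON-encoded arguments, your code runs the tool, and the result goes back to the model as a `tool` message. The sketch assumes an OpenAI-compatible chat API (the schema shape is the OpenAI `tools` format) and makes no network call; `get_weather` and the mocked `tool_calls` payload are hypothetical. It covers plain function calling only, not MCP itself.

```python
import json

# Tool definition in the OpenAI-style "function" schema; get_weather is hypothetical.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub implementation for illustration


REGISTRY = {"get_weather": get_weather}


def dispatch(tool_calls: list[dict]) -> list[dict]:
    """Run each requested tool and build the 'tool' messages to send back to the model."""
    messages = []
    for call in tool_calls:
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": fn(**args)})
    return messages


# Hand-written mock of the tool_calls field an assistant message might contain.
mock_tool_calls = [{"id": "call_1", "type": "function",
                    "function": {"name": "get_weather", "arguments": '{"city": "Hangzhou"}'}}]
results = dispatch(mock_tool_calls)
print(results)
```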
---
## Strengths
- ✅ **Excellent agentic performance**: #15 of 293 models on the Agentic Index
- ✅ **Top-tier GUI grounding**: ScreenSpot Pro 79.0 leads the open-API field
- ✅ **Very competitive pricing** for its capability tier (from $0.32 per 1M input tokens)
- ✅ **Huge 1M context window**
- ✅ **Strong coding ability** (top quartile)
- ✅ **Versatile input modalities** (text, image, video)
- ✅ **Hybrid GUI + CLI agent baked in**
## Weaknesses
- ❌ **Notably slow** (~50 tok/s): ranked #105 of 162 on throughput
- ❌ **Very verbose**: generates significantly more tokens than average, which raises both latency and cost
- ❌ **Proprietary license**: the weights are not openly released
- ❌ **Text output only**: no image or video generation
- ❌ **Expensive relative to peers on output tokens**: $1.16–$1.28 per 1M output tokens
---
## How It Compares to Qwen3.7-Max
| Aspect | Qwen3.7-Max | Qwen3.7-Plus |
|---|---|---|
| Modality | Text only | Multimodal (text + vision) |
| Best for | Long-horizon coding agents, reasoning | GUI agents, browser automation, UI tasks |
| Context | 1M tokens | 1M tokens |
| Pricing (per 1M tokens, in/out) | $2.50 / $7.50 | $0.32–$0.40 / $1.16–$1.28 |
| Intelligence Rank | Higher (flagship text) | Lower but complementary |

The two are **complementary, not competing**: Max handles reasoning and code, while Plus handles screen perception and GUI actions.

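
That division of labor suggests a simple routing rule for a stacked agent. The sketch below is one possible rule-based router, not an official recommendation; the model IDs and the `Task` fields are hypothetical, and real identifiers depend on the provider.

```python
from dataclasses import dataclass

# Hypothetical model IDs for illustration only.
PLUS_MODEL = "qwen3.7-plus"
MAX_MODEL = "qwen3.7-max"


@dataclass
class Task:
    prompt: str
    has_screenshot: bool = False    # task includes an image or screen observation
    needs_gui_action: bool = False  # task requires clicking, typing, or navigating a UI


def route(task: Task) -> str:
    """Screen perception or GUI actions go to Plus; pure text/code work goes to Max."""
    if task.has_screenshot or task.needs_gui_action:
        return PLUS_MODEL
    return MAX_MODEL


print(route(Task("Refactor this module and add unit tests")))  # qwen3.7-max
print(route(Task("Click the Submit button", has_screenshot=True, needs_gui_action=True)))  # qwen3.7-plus
```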
---
## Bottom Line
**Qwen3.7-Plus is best for teams building:**
- GUI/browser automation agents
- Computer-use tools (click, type, navigate from screenshots)
- UI testing and screen-scraping pipelines
- Multimodal document processing with visual context
- Productivity automation workflows

**It's NOT ideal for:**
- Pure text/code-only tasks (use Qwen3.7-Max instead)
- Latency-sensitive applications (it's slow)
- Scenarios requiring open-weight models or on-prem deployment

**Verdict:** Ranked #8 of 162 in its class on the Artificial Analysis Intelligence Index and leading the open-API GUI agent category on ScreenSpot Pro, Qwen3.7-Plus is a strong multimodal contender. Its hybrid GUI+CLI approach, 1M-token context window, and aggressive pricing make it a compelling option for agentic workflows, especially for teams that route between Max (reasoning) and Plus (vision and actions) in a stacked agent architecture.