Search Agent Sky
Query
Investigate this live topic: AIR: Adaptive Interleaved Reasoning with Code in MLLMs. Start with http://arxiv.org/abs/2606.23678v1 and browse beyond it. Summarize what changed, why it matters, and cite the strongest sources.
<tool_call> <function=done> <parameter=result>
## AIR: Adaptive Interleaved Reasoning with Code in MLLMs

**Paper:** [arXiv:2606.23678v1](https://arxiv.org/abs/2606.23678v1) (June 22, 2026), submitted to ICML

**Authors:** Cong Han, Xiaolan Lan, Haibo Qiu, Yujie Zhong

**Code:** [github.com/CongHan0808/AIR](https://github.com/CongHan0808/AIR)

---

### What Changed

Prior open-source work on interleaved reasoning in multimodal LLMs (e.g., Visual-ToolFormer, Kimi-VL, DeepSeek-V3) focused almost exclusively on **visual tool-use** (rotation, zooming, cropping) to help models perceive images better. These approaches relied on predefined heuristics and could not handle **numerical computation** problems.

AIR breaks from this by giving MLLMs the ability to **adaptively invoke Python code** during reasoning, both for image manipulation and for complex mathematical calculations. The model learns *when* code execution is needed and *when* it can reason without it, much as a person reaches for a calculator only when mental math won't suffice.

The three key technical contributions:

1. **Two-stage cold-start data pipeline:** Instead of trying to generate interleaved reasoning data in one shot (which fails in multimodal settings), AIR first generates textual chain-of-thought, then rewrites it into code-augmented interleaved reasoning. This decoupled approach produces higher-quality SFT training data.
2. **Dual data filtering for RL:** Two strategies, *Self-Sampled* (multi-turn consensus via Pass@k across rollouts) and *Prior-Filtered* (teacher model verification), curate high-fidelity training data for reinforcement learning, reducing noise in the training distribution.
3. **Group-constrained reward for adaptive tool invocation:** A modified GRPO (Group Relative Policy Optimization) with group constraints that lets the model learn *whether* and *when* to call code tools.
Crucially, this also **addresses the training instability problem** in agentic RL: as the proportion of tool use increases, standard RL training becomes unstable and can collapse, and the group-constrained reward prevents this.

---

### Key Results

- **+6.1 percentage points average improvement** across evaluation benchmarks after RL training
- **+9.9 pp accuracy increase** specifically on interleaved reasoning samples
- **>95% tool-use success rate:** the model reliably executes code when it decides to invoke it
- The group-constrained reward mechanism enables **stable long-term agentic RL training**, avoiding the model collapse that affects other approaches as tool use scales up

---

### Why It Matters

1. **Fills a gap in open-source MLLMs:** While OpenAI's o3/o4-mini pioneered interleaved reasoning with code, no open-source MLLM had replicated this for numerical computation. AIR is among the first to demonstrate this capability in open models.
2. **Addresses a real training stability problem:** The finding that adaptive tool invocation (via group-constrained rewards) stabilizes long-term agentic RL is significant, because it means models can be trained for longer without collapse, unlocking more capable agents.
3. **Practical paradigm shift:** Moving from "always use tools" or "never use tools" to "adaptively decide when tools help" is a more human-like and efficient approach to multimodal reasoning.
4. **Broader research context:** AIR fits into the rapidly growing "Awesome Interleaving Reasoning" ecosystem (281 stars on GitHub), alongside works like OpenAI o3, DeepSeek-R1, Kimi-K1.5, and Visual-ToolFormer, but extends the paradigm beyond vision to computation.
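
The filtering and reward ideas summarized under "What Changed" can be made concrete with a small sketch. It is not taken from the AIR repository: `pass_rate`, `keep_for_rl`, and `group_relative_advantages` are hypothetical names, the pass-rate thresholds are assumptions, and the advantage formula is the standard GRPO group normalization rather than AIR's group-constrained variant, whose exact form this summary does not give.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def pass_rate(correct: Sequence[bool]) -> float:
    """Fraction of rollouts for one prompt that reached the right answer."""
    if not correct:
        raise ValueError("need at least one rollout")
    return sum(1 for c in correct if c) / len(correct)


def keep_for_rl(correct: Sequence[bool], low: float = 0.0, high: float = 1.0) -> bool:
    """Hypothetical self-sampled filter (assumed thresholds): keep a prompt
    only if its rollouts are neither all wrong nor all right."""
    return low < pass_rate(correct) < high


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage: normalize each reward by the mean and
    (population) standard deviation of its own rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    rollouts = [True, False, True, False]  # correctness of 4 sampled answers
    print(keep_for_rl(rollouts))
    print(group_relative_advantages([1.0 if c else 0.0 for c in rollouts]))
```

Under group normalization, a prompt whose rollouts all succeed or all fail yields zero advantages for every rollout, which is one plausible reason to filter by pass rate before RL.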

---

### Strongest Sources

| Source | What it provides |
|--------|------------------|
| **[arXiv:2606.23678v1](https://arxiv.org/abs/2606.23678v1)** | Primary paper with full methodology, experiments, and results |
| **[GitHub: CongHan0808/AIR](https://github.com/CongHan0808/AIR)** | Code repository (DATA + RL directories) |
| **[Awesome-Interleaving-Reasoning](https://github.com/Osilly/Awesome-Interleaving-Reasoning)** | Comprehensive survey of the interleaving reasoning landscape (281★); places AIR in context alongside o3, DeepSeek-R1, Kimi-K1.5, Visual-ToolFormer, etc. |
| **[AlphaXiv discussion](https://www.alphaxiv.org/abs/2606.23678)** | Community discussion thread |
| **[arxivtldr summary](https://arxivtldr.org/abs/2606.23678)** | TLDR summary of the paper |

---

### What's Missing / Limitations

- The paper's GitHub README is essentially empty (6 bytes), so reproducibility details beyond the code itself are sparse
- The HTML rendering of the paper on arXiv was truncated, so I couldn't extract the full benchmark-by-benchmark table breakdowns (the paper likely includes results on MathVista, MathVerse, and similar benchmarks)
- No external news coverage or blog post analysis was found yet; this is a very fresh paper (posted June 22, 2026, just yesterday)
- The paper acknowledges limitations in the conclusion section, but I couldn't access the full text of that section
</parameter> </function> </tool_call>

Sources visited

1. navigate http://arxiv.org/abs/2606.23678v1
2. click undefined
3. get_text
4. search AIR Adaptive Interleaved Reasoning Code MLLMs arxiv 2606.23678
5. navigate https://github.com/AutomanHan/AIR
6. get_text
7. navigate https://www.alphaxiv.org/abs/2606.23678
8. navigate https://arxivtldr.org/abs/2606.23678
9. navigate https://book.st-hakky.com/en/news/air-adaptive-interleaved-inference-metho
10. search "adaptive interleaved reasoning" MLLM code reinforcement learning 2026
11. search OpenAI o3 interleaved reasoning code tool use MLLM 2025 2026
12. navigate https://papers.cool/arxiv/2606.23678
13. navigate https://github.com/CongHan0808/AIR
14. get_text
15. click undefined
16. navigate https://github.com/Osilly/Awesome-Interleaving-Reasoning
17. get_text
18. navigate https://arxiv.org/html/2606.23678v1
19. ddm --text --find Main Results
20. js_eval document.body.innerText.substring(5000, 15000)
21. ddm --text --find Main Results
22. navigate https://arxiv.org/pdf/2606.23678v1
23. navigate https://arxiv.org/html/2606.23678v1#S4
Shared by Proto Státis · Jun 23, 2026
