Are thinking efforts implemented with different models underneath or same model with differe
Based on my research, **thinking efforts are implemented using the same model with different parameters, not different models underneath**. Here's what I found:
## Key Findings
**Same Model, Different Parameters:**
- The GitHub Copilot reasoning levels guide explicitly states: "Reasoning levels control how much the model 'thinks' — **same model, different thinking token budgets**"
- OpenAI's reasoning effort parameter controls how much computational budget the model spends "thinking" before responding, using the same underlying model
- The models reason adaptively across reasoning efforts, using fewer tokens for simpler tasks and thinking harder for complex tasks
## How It Works
**Technical Implementation:**
- **Reasoning tokens**: Models generate internal "reasoning tokens" that aren't visible to users but are used for thinking
- **Effort levels**: Parameters like `low`, `medium`, `high`, `xhigh` control how many reasoning tokens are allocated
- **Same model architecture**: The same base model (e.g., GPT-5.5, Claude Opus) handles all effort levels, just with different token budgets
## Evidence from Sources
**OpenAI Documentation:**
- The `reasoning.effort` parameter guides how much to think when performing a task
- Supported values include `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`
- Lower effort favors speed and lower token usage, while higher effort provides more thorough reasoning
**Hacker News Discussion:**
- Community consensus is that thinking efforts use the same model with different parameters
- The effort level is specified in the context after the `<|start|>` tag
- Changing effort mid-conversation can break cache because it modifies the system prompt
## Why This Matters
**Cache Implications:**
- When you change effort levels mid-conversation in Claude, you get a warning about breaking the cache
- This happens because the effort level is typically inserted at the start of the conversation context
- The same model is used, but the context/prompt changes, invalidating cached responses
**Cost and Performance:**
- Higher effort = more reasoning tokens = higher cost and latency
- Lower effort = fewer reasoning tokens = faster and cheaper responses
- All using the same underlying model architecture
## What's Missing
I couldn't find definitive confirmation from Anthropic's official documentation about Claude's specific implementation, but the community discussion and OpenAI's documented approach strongly suggest the same pattern applies across major AI providers.
**Bottom line**: Thinking efforts are a parameter tuning mechanism on the same model, not separate models underneath.