Are thinking efforts implemented with different models underneath or same model with differe

Based on my research, **thinking efforts are implemented using the same model with different parameters, not different models underneath**. Here's what I found: ## Key Findings **Same Model, Different Parameters:** - The GitHub Copilot reasoning levels guide explicitly states: "Reasoning levels control how much the model 'thinks' — **same model, different thinking token budgets**" - OpenAI's reasoning effort parameter controls how much computational budget the model spends "thinking" before responding, using the same underlying model - The models reason adaptively across reasoning efforts, using fewer tokens for simpler tasks and thinking harder for complex tasks ## How It Works **Technical Implementation:** - **Reasoning tokens**: Models generate internal "reasoning tokens" that aren't visible to users but are used for thinking - **Effort levels**: Parameters like `low`, `medium`, `high`, `xhigh` control how many reasoning tokens are allocated - **Same model architecture**: The same base model (e.g., GPT-5.5, Claude Opus) handles all effort levels, just with different token budgets ## Evidence from Sources **OpenAI Documentation:** - The `reasoning.effort` parameter guides how much to think when performing a task - Supported values include `none`, `minimal`, `low`, `medium`, `high`, and `xhigh` - Lower effort favors speed and lower token usage, while higher effort provides more thorough reasoning **Hacker News Discussion:** - Community consensus is that thinking efforts use the same model with different parameters - The effort level is specified in the context after the `<|start|>` tag - Changing effort mid-conversation can break cache because it modifies the system prompt ## Why This Matters **Cache Implications:** - When you change effort levels mid-conversation in Claude, you get a warning about breaking the cache - This happens because the effort level is typically inserted at the start of the conversation context - The same model is used, but the context/prompt changes, invalidating cached responses **Cost and Performance:** - Higher effort = more reasoning tokens = higher cost and latency - Lower effort = fewer reasoning tokens = faster and cheaper responses - All using the same underlying model architecture ## What's Missing I couldn't find definitive confirmation from Anthropic's official documentation about Claude's specific implementation, but the community discussion and OpenAI's documented approach strongly suggest the same pattern applies across major AI providers. **Bottom line**: Thinking efforts are a parameter tuning mechanism on the same model, not separate models underneath.

1search"thinking efforts" AI models reasoning

2navigatehttps://news.ycombinator.com/item?id=48434240

3get_text

4navigatehttps://github.com/openai/harmony

5searchhow are thinking efforts implemented in Claude ChatGPT reasoning models

6navigatehttps://platform.claude.com/docs/en/build-with-claude/effort

7get_text

8searchClaude effort parameter same model different reasoning levels technical implementation

9navigatehttps://www.mindstudio.ai/blog/claude-opus-4-8-effort-levels-explained

10get_text

11clickundefined

12navigatehttps://news.ycombinator.com/item?id=48434240

13get_text

14searchOpenAI reasoning effort levels implementation same model different parameters

15navigatehttps://developers.openai.com/api/docs/guides/reasoning

16searchare thinking efforts different models or same model with different parameters reasoning levels

Sources visited

Ask your own question

Sources visited

Related questions

Ask your own question