# Ask HN: What does your local LLM setup look like?
Based on recent Hacker News discussions, here's what users are running for their local LLM setups:
## Current Hardware Setups (from the most recent thread)
**High-end Gaming Rigs:**
- **RTX 5090** - Used for both gaming and running local LLMs with llama.cpp's `llama-server`
- **RTX 5080** - Running various models for experimentation and agent development
**Apple Silicon:**
- **Mac mini with M3 chip** - Running Ollama with Qwen models efficiently
- **M1 MacBook** - Running local LLMs for development work
**Budget/Repurposed Hardware:**
- **V100 32G SXM2 adapted to PCIe** - Running llama.cpp with quantized models
- **Low-end PCs** - Dedicated to Gemma 4 for testing control and development
## Popular Tools & Software Stack
**Core Tools:**
- **Ollama** - Most popular for running models locally
- **llama.cpp** - For efficient inference on various hardware
- **Aider** - Command-line tool for code assistance
- **VS Code with the Continue (continue.dev) extension** - Local chat and autocomplete
- **Open WebUI** - Alternative browser-based chat interface
- **ComfyUI** - For Stable Diffusion image generation
**Popular Models:**
- **Gemma 4** (31B) - Frequently mentioned as impressive
- **Qwen 3.6** (27B) - Good performance, especially on Apple Silicon
- **Llama 3.1** - General purpose chat
- **Qwen2.5 Coder** (1.5B) - For autocomplete
- **DeepSeek-Coder-v2** (16B) - For coding/editing tasks
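
As a rough illustration of how the tools and models above fit together, the sketch below sends one prompt to a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and that a model tagged `llama3.1` has already been pulled. The helper names `build_generate_request` and `generate` are made up for this example, and the model tag is illustrative, not something the thread specified.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With "stream": False the server replies with a single JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

With the server running, `generate("llama3.1", "Explain what a mutex is in one sentence.")` should return the completion as a string. Swap in whichever tag `ollama list` reports on your machine.
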
## Use Cases & Applications
**Development & Coding:**
- Explaining coworkers' code and legacy repositories
- Code generation and boilerplate creation
- Pair programming assistance
- Multi-line autocomplete
**Automation & Orchestration:**
- Personal automation harnesses
- AI voice bots running 24/7
- Semantic search engines with memory tracking
**Privacy & Offline Use:**
- Backup during internet outages
- Privacy-sensitive tasks
- Local experimentation without cloud dependencies
## Key Insights from Discussions
**Cost Considerations:**
- Local setups can be cost-effective compared to cloud subscriptions ($135+/month)
- Repurposing existing gaming PCs is common
- Electricity costs are reported as minor for modern hardware, though they depend on usage and local rates
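
To make the cost comparison concrete, here is a back-of-the-envelope sketch of the break-even arithmetic. Only the $135/month cloud figure comes from the discussion. The hardware price, power draw, daily usage, and electricity rate are placeholder assumptions, so substitute your own numbers.

```python
def monthly_electricity_cost(avg_watts: float, hours_per_day: float,
                             price_per_kwh: float, days: int = 30) -> float:
    """Estimated monthly electricity cost for the machine, in currency units."""
    kwh = avg_watts / 1000 * hours_per_day * days
    return kwh * price_per_kwh


def break_even_months(hardware_cost: float, cloud_monthly: float,
                      electricity_monthly: float) -> float:
    """Months until the hardware pays for itself versus a cloud subscription."""
    savings = cloud_monthly - electricity_monthly
    if savings <= 0:
        # Running locally never pays off if power costs exceed the subscription.
        return float("inf")
    return hardware_cost / savings


# Hypothetical inputs: 300 W average draw, 4 hours/day, $0.15 per kWh,
# and a $2,000 machine. Only the $135/month figure is from the thread.
power = monthly_electricity_cost(avg_watts=300, hours_per_day=4, price_per_kwh=0.15)
months = break_even_months(hardware_cost=2000, cloud_monthly=135, electricity_monthly=power)
print(f"electricity: ${power:.2f}/month, break-even after {months:.1f} months")
```

Under these assumptions electricity comes to about $5.40 a month and the hardware pays for itself in roughly 15 months. Repurposing a gaming PC you already own shortens that considerably.
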
**Performance Notes:**
- Apple Silicon (M3) runs Qwen models particularly well
- GPU acceleration significantly improves performance
- Quantized models (e.g., Q4_K_M) balance quality and memory usage
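
The memory side of that trade-off can be estimated with simple arithmetic: parameter count times bits per weight. The sketch below is an approximation only. The bits-per-weight figure for Q4_K_M is a rough average (the format mixes block types), and the 2 GiB overhead allowance for KV cache and runtime buffers is a placeholder assumption that grows with context length.

```python
# Approximate bits per weight. Q8_0 stores 32 8-bit weights plus a 16-bit
# scale per block (8.5 bpw); the Q4_K_M value is a rough average, not exact.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}


def weights_gib(params_billions: float, quant: str) -> float:
    """Approximate size of the model weights in GiB for a given quantization."""
    total_bits = params_billions * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30


def fits_in_vram(params_billions: float, quant: str, vram_gib: float,
                 overhead_gib: float = 2.0) -> bool:
    """Rough check: weights plus an assumed overhead must fit in VRAM."""
    return weights_gib(params_billions, quant) + overhead_gib <= vram_gib


for quant in ("F16", "Q8_0", "Q4_K_M"):
    size = weights_gib(27, quant)
    print(f"27B @ {quant}: ~{size:.1f} GiB, fits in 32 GiB: {fits_in_vram(27, quant, 32)}")
```

By this estimate a 27B model needs roughly 50 GiB at F16 but only about 15 GiB at Q4_K_M, which is why a 4-bit quantization is what makes such a model usable on a single 32 GB card.
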
**Challenges:**
- WSL on Windows adds overhead for GPU utilization
- Finding truly useful applications beyond demos
- Balancing model size with available hardware
**Notable Setup:**
One user shared a shell script that runs Gemma3-27B-it with llama.cpp and forces ASCII output, which is useful for systematically explaining code repositories.
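
The thread summary does not say how that script enforces ASCII, so the following is not the user's method. It is one possible way to get a similar effect: post-process the model's output, mapping common typographic characters to ASCII equivalents and dropping anything that remains. The function name `to_ascii` and the replacement table are assumptions made for illustration.

```python
import unicodedata

# Hypothetical mapping for punctuation that models commonly emit.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
}


def to_ascii(text: str) -> str:
    """Return text with non-ASCII characters replaced or removed."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # NFKD splits accented letters into a base letter plus combining marks,
    # so the base letter survives when non-ASCII bytes are dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Piping model output through a filter like this keeps terminal logs and diffs clean, at the cost of silently dropping characters that have no ASCII equivalent, such as emoji.
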
The consensus is that local LLM setups are becoming increasingly practical for development work, for privacy-sensitive tasks, and as a backup to cloud services.