# Ask HN: What does your local LLM setup look like?

Based on recent Hacker News discussions, here's what users are running for their local LLM setups:

## Current Hardware Setups (from the most recent thread)

**High-end Gaming Rigs:**
- **RTX 5090** - Used for both gaming and running local LLMs with llama.cpp's llama-server
- **RTX 5080** - Running various models for experimentation and agent development

**Apple Silicon:**
- **Mac mini with M3 chip** - Running Ollama with Qwen models efficiently
- **M1 MacBook** - Running local LLMs for development work

**Budget/Repurposed Hardware:**
- **V100 32G SXM2 adapted to PCIe** - Running llama.cpp with quantized models
- **Low-end PCs** - Dedicated to Gemma 4 for testing control and development

## Popular Tools & Software Stack

**Core Tools:**
- **Ollama** - The most frequently mentioned tool for running models locally
- **llama.cpp** - For efficient inference on various hardware
- **Aider** - CLI tool for AI-assisted coding
- **VSCode with the Continue (continue.dev) extension** - Local chat and autocomplete
- **OpenWebUI** - Alternative chat interface
- **ComfyUI** - For Stable Diffusion image generation

**Popular Models:**
- **Gemma 4** (31B) - Frequently mentioned as impressive
- **Qwen 3.6** (27B) - Good performance, especially on Apple Silicon
- **Llama 3.1** - General-purpose chat
- **Qwen2.5 Coder** (1.5B) - For autocomplete
- **DeepSeek-Coder-V2** (16B) - For coding and editing tasks

## Use Cases & Applications

**Development & Coding:**
- Explaining coworkers' code and legacy repositories
- Code generation and boilerplate creation
- Pair programming assistance
- Multi-line autocomplete

**Automation & Orchestration:**
- Personal automation harnesses
- AI voice bots running 24/7
- Semantic search engines with memory tracking

**Privacy & Offline Use:**
- Backup during internet outages
- Privacy-sensitive tasks
- Local experimentation without cloud dependencies

## Key Insights from Discussions

**Cost Considerations:**
- Local setups can be cost-effective compared to cloud subscriptions ($135+/month)
- Repurposing existing gaming PCs is common
- Some users described electricity costs as minimal on modern hardware

**Performance Notes:**
- Apple Silicon (M3) runs Qwen models particularly well
- GPU acceleration significantly improves performance
- Quantized models (Q4_K_M) balance quality and memory usage

**Challenges:**
- WSL on Windows adds overhead for GPU utilization
- Finding truly useful applications beyond demos
- Balancing model size with available hardware

**Notable Setup:** One user shared a shell script for running Gemma3-27B-it with llama.cpp that forces ASCII output, which they found useful for systematically explaining code repositories.

The consensus suggests that local LLM setups are becoming increasingly practical for development work and privacy-conscious tasks, and as backups to cloud services.
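The "balancing model size with available hardware" challenge can be made concrete with back-of-the-envelope arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. The sketch below is only an estimate under stated assumptions. The bits-per-weight figure for Q4_K_M (about 4.85) is an approximation that varies by model. `OVERHEAD_GIB` is a hypothetical flat allowance for the KV cache and runtime buffers, which in practice grows with context length. The helper names `weight_gib` and `fits` are illustrative and do not come from Ollama or llama.cpp. Card capacities use the nominal VRAM of the RTX 5080 (16 GB), RTX 5090 (32 GB) and V100 32G.

```python
"""Rough check: will a quantized model's weights fit in VRAM?

A sketch, not a measurement. Bits-per-weight values are approximations,
and OVERHEAD_GIB is a hypothetical flat allowance for KV cache and buffers.
"""

# Approximate average bits per weight for common GGUF quantization types.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 8-bit values plus one fp16 scale per 32-weight block
    "Q4_K_M": 4.85,  # assumption: approximate, varies by architecture
}

# Hypothetical flat allowance (GiB) for KV cache, activations and buffers.
OVERHEAD_GIB = 2.0


def weight_gib(params_billions: float, quant: str) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return total_bytes / 2**30


def fits(params_billions: float, quant: str, vram_gib: float,
         overhead_gib: float = OVERHEAD_GIB) -> bool:
    """True if the weights plus the flat overhead fit in the given VRAM."""
    return weight_gib(params_billions, quant) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Model sizes as mentioned in the thread; card sizes are nominal VRAM.
    models = {"Gemma 4 (31B)": 31, "Qwen 3.6 (27B)": 27,
              "DeepSeek-Coder-V2 (16B)": 16}
    cards = {"RTX 5080 (16 GiB)": 16, "RTX 5090 / V100 32G (32 GiB)": 32}
    for model, params in models.items():
        size = weight_gib(params, "Q4_K_M")
        for card, vram in cards.items():
            verdict = "fits" if fits(params, "Q4_K_M", vram) else "too big"
            print(f"{model} @ Q4_K_M ~{size:.1f} GiB -> {card}: {verdict}")
```

Under these assumptions, the 27B and 31B models at Q4_K_M come out to roughly 15.2 and 17.5 GiB of weights, so with the overhead allowance they only fit on the 32 GiB cards. The 16B model fits on either. Raising `OVERHEAD_GIB` is a crude way to account for longer contexts.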