Best Local LLM Tools

Local LLM tools encompass inference engines, quantization frameworks, and self-hosted runtimes that run large language models entirely on consumer or enterprise hardware, with no dependency on external APIs. A strong project in this category distinguishes itself through broad model format support (particularly GGUF, ONNX, and Safetensors), efficient memory management under constrained VRAM, and hardware acceleration across NVIDIA, AMD, Apple Silicon, and CPU backends. Developers should evaluate quantization quality against benchmark perplexity scores, context window scalability, batch throughput for concurrent requests, and the maturity of the server's OpenAI-compatible API. Equally critical is the project's update cadence for new architecture support, since model releases frequently outpace runtime adaptation. 15 projects qualified as of May 28, 2026.

Updated May 28, 2026 · 15 projects ranked

antirez/ds4
DeepSeek 4 Flash local inference engine for Metal and CUDA
★ 12.3K · +13 (24h) · momentum 13 · C
debpalash/OmniVoice-Studio
The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing …
★ 5.2K · +13 (24h) · momentum 13 · Python
LearningCircuit/local-deep-research
~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.…
★ 8.1K · +1 (24h) · momentum 11 · Python
dograh-hq/dograh
Open Source Voice Agent Platform
★ 3.6K · +1 (24h) · momentum 9 · Python
mostlygeek/llama-swap
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vl…
★ 4.3K · -6 (24h) · momentum 8 · Go
mercurialsolo/claudectl
Auto pilot for Claude Code - connect multiple coding agents to a local LLM brain. 🆕 with a hive mind now
★ 173 · 0 (24h) · momentum 8 · Rust
raullenchai/Rapid-MLX
The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.
★ 2.6K · 0 (24h) · momentum 8 · Python
ArvinLovegood/go-stock
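🦄🦄🦄 AI-powered stock analysis: an AI-assisted stock analysis and stock-picking tool. Fetches market quotes, analyzes trending news with AI, performs AI-driven capital-flow and financial analysis, and pushes price rise/fall alerts. Supports A-shares, Hong Kong stocks, and US stocks. Supports sentiment analysis for the overall market and for individual stocks, AI-assisted stock picking, and more. All data stays local. Supports platforms and models including DeepSeek, OpenAI, Ollama, LMStudio, AnythingLLM, SiliconFlow, Volcano Ark, and Alibaba Cloud Bailian.
★ 5.9K · 0 (24h) · momentum 8 · Go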
7as0nch/mimo2codex
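A local proxy that connects the latest OpenAI Codex CLI / Codex desktop app to mainstream large models (newly added: mac/win packages, background running, automatic restart at boot). Ships with Xiaomi MiMo V2.5 / DeepSeek V4 Pro built in, and provides a generic provider mechanism that connects upstreams that are **OpenAI Chat Completions compatible** (Qwen / GLM / Kimi / local vLLM / Ollama / LM Studio …) or that use the **native Responses API** (OpenAI's own) to the new Codex. Translates Codex's Responses API into the upstream's Chat Completions API in real time, and routes automatically between providers based on the `model` field sent by the client.
★ 441 · 0 (24h) · momentum 7 · TypeScript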
anthropic-claude-code-ai/free-claude-code-ai-desktop-app
claude code ai free desktop app api cli open source opencode aider gemini alternative download github local llm ollama setup guide tutorial api 2026
★ 67 · 0 (24h) · momentum 7 · C#
alekk89/llama.cpp-Console
Windows desktop console for llama.cpp runtimes, models, and local coding workflows
★ 15 · 0 (24h) · momentum 7 · C#
AlexsJones/llmfit
Hundreds of models & providers. One command to find what runs on your hardware.
★ 26.8K · -8 (24h) · momentum 7 · Rust
Michael-A-Kuykendall/shimmy
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot mod…
★ 5.3K · -48 (24h) · momentum 7 · Rust
alekk89/llama-cpp-windows-manager
Windows desktop console for llama.cpp runtimes, models, and local coding workflows
★ 22 · 0 (24h) · momentum 6 · C#
ctx-0/lazyllama
a smol tool for managing local models
★ 14 · 0 (24h) · momentum 6 · Python

Best Local LLM Tools: FAQ

What are the best local LLM tools?

As of May 28, 2026, the top-ranked projects are antirez/ds4, debpalash/OmniVoice-Studio, LearningCircuit/local-deep-research, dograh-hq/dograh, and mostlygeek/llama-swap, ordered by TrendingRepo's cross-source momentum score across GitHub, Hacker News, Reddit, X, Bluesky, Product Hunt, and Dev.to.

How does TrendingRepo choose this list?

We filter our tracked open-source index to on-device inference engines and self-hosted runtimes for running models locally, then rank by a 0-100 momentum score combining 24h / 7d / 30d star velocity, fork growth, contributor churn, commit freshness and release cadence, with cross-source mention signals and anti-spam dampening on top.
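
For readers who want a feel for how such a score could be assembled, here is a minimal sketch. It is not TrendingRepo's actual formula: the answer above names the signals but not their weights, their normalization, or how dampening is applied, so the weights, the `momentum_score` function, and the multiplicative `spam_dampening` factor below are all assumptions made for illustration. Each signal is assumed to be pre-normalized to the range 0 to 1.

```python
# Hypothetical weights: the signal names come from the FAQ answer,
# but the relative weights are invented for this sketch.
WEIGHTS = {
    "star_velocity_24h": 3,
    "star_velocity_7d": 2,
    "star_velocity_30d": 1,
    "fork_growth": 1,
    "contributor_churn": 1,
    "commit_freshness": 1,
    "release_cadence": 1,
    "cross_source_mentions": 2,
}


def clamp01(value: float) -> float:
    """Clamp a value into the closed interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def momentum_score(signals: dict[str, float], spam_dampening: float = 1.0) -> float:
    """Combine normalized signals into a 0-100 score.

    `signals` maps a signal name to a value in [0, 1]; missing signals count as 0.
    `spam_dampening` is an assumed multiplier in [0, 1] that scales the result
    down for repositories that look like spam (1.0 means no dampening).
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        weight * clamp01(signals.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return round(100.0 * clamp01(spam_dampening) * weighted / total_weight, 1)


if __name__ == "__main__":
    example = {"star_velocity_24h": 1.0, "cross_source_mentions": 0.5}
    print(momentum_score(example))  # prints 33.3
```

A weighted average keeps the result inside 0-100 regardless of how many signals are present, and applying dampening as a final multiplier means a suspicious repository is pushed down without changing how the individual signals are weighed.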

Are these all free and open source?

Yes. Every project listed is an open-source repository on GitHub with a public license — TrendingRepo only ranks open-source code.

How often is this list updated?

Roughly every 20 minutes. Collectors re-scan the signal sources and recompute the rankings, so the list reflects momentum within the last hour.
