Two years ago, running a 7B model on a laptop meant compiling llama.cpp, hand-quantising weights, and tolerating single-digit tokens per second on CPU. Today it's a one-line install with GPU acceleration, structured output, function calling, and an OpenAI-compatible HTTP server bundled in. The gap closed fast and most builders haven't fully internalised it. This leaderboard tracks the full stack: inference engines (llama.cpp, vLLM, MLC, MLX) squeezing every flop out of consumer hardware, model managers (Ollama, LM Studio, GPT4All) putting a friendly install + chat + API on top, and integration layers (LangChain backends, MCP servers, Open WebUI) letting existing app code treat a local model the same as a hosted one. Engine and manager often get adopted as a paired stack. The score weights live mentions across Hacker News, Reddit, X, Bluesky, Product Hunt and Dev.to alongside GitHub velocity, so an actively-discussed project climbs faster than one with a higher absolute star count but flat trend.
| # | Repository | Stars | 24h | 7d | 30d | Trend | Mentions | Actions |
|---|---|---|---|---|---|---|---|---|
| 01 | antirez/ds4 DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm | 13.5K | +1+0.0% | +450+3.5% | +5.1K+61.1% | |||
| 02 | chatboxai/chatbox Powerful AI Client | 40.4K | +1+0.0% | +126+0.3% | +592+1.5% | |||
| 03 | mostlygeek/llama-swap Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc | 4.5K | +3+0.1% | +90+2.0% | +484+12.1% | |||
| 04 | raullenchai/Rapid-MLX The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider. | 2.8K | +3+0.1% | +82+3.1% | +560+25.6% | |||
| 05 | debpalash/OmniVoice-Studio The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing and dictation Desktop App | 6.8K | -110-1.6% | +592+9.5% | +6.1K+878.5% | |||
| 06 | dograh-hq/dograh Open source voice AI platform. Self-hosted alternative to Vapi and Retell. On Prem, BYOK across Speech to Speech or LLM/STT/TTS, with a visual workflow builder, MCP native and telephony support. | 4.4K | +1+0.0% | +129+3.1% | +3.9K+832.1% | |||
| 07 | LearningCircuit/local-deep-research ~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted. | 8.5K | -144-1.7% | +83+1.0% | +1.1K+14.9% | |||
| 08 | Michael-A-Kuykendall/shimmy ⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary. | 5.4K | +1+0.0% | +71+1.3% | +612+12.8% | |||
| 09 | AlexsJones/llmfit Hundreds of models & providers. One command to find what runs on your hardware. | 27.8K | -23-0.1% | +312+1.1% | +1.9K+7.4% | |||
| 10 | 7as0nch/mimo2codex 让最新版 OpenAI Codex CLI / Codex 桌面端接入主流大模型的本地代理(新增mac/win包支持,后台运行,开机自动重启)。内置 小米 MiMo V2.5/DeepSeek V4 Pro,并提供通用 provider 机制,**OpenAI Chat Completions 兼容**(Qwen / GLM / Kimi / 本地 vLLM / Ollama / LM Studio …)或**原生 Responses API**(OpenAI 自家)的上游接到新版 Codex。把 Codex 的 Responses API 实时翻译成上游的 Chat Completions API,按客户端发的 `model` 字段在 provider 之间自动路由. | 568 | +2+0.4% | +34+6.4% | +473+497.9% | |||
| 11 | ArvinLovegood/go-stock 🦄🦄🦄AI赋能股票分析:AI加持的股票分析/选股工具。股票行情获取,AI热点资讯分析,AI资金/财务分析,涨跌报警推送。支持A股,港股,美股。支持市场整体/个股情绪分析,AI辅助选股等。数据全部保留在本地。支持DeepSeek,OpenAI, Ollama,LMStudio,AnythingLLM,硅基流动,火山方舟,阿里云百炼等平台或模型。 | 6.3K | +1+0.0% | +182+3.0% | +651+11.5% | |||
| 12 | Zackriya-Solutions/meetily Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization built on Rust. 100% local processing. no cloud required. Meetily (Meetly Ai - https://meetily.ai) is the #1 Self-hosted, Open-source Ai meeting note taker for macOS & Windows. | 12.7K | -24-0.2% | +142+1.1% | +714+6.0% | |||
| 13 | OpenCoworkAI/open-codesign Open-source Claude Design alternative. One-click import your Claude Code / Codex API key. Prompt → prototype / slides / PDF. Multi-model (Claude, GPT, Gemini, Kimi, GLM, Ollama). BYOK, local-first, MIT. | 6.8K | -106-1.5% | +126+1.9% | +1K+17.8% | |||
| 14 | itayinbarr/little-coder A harness optimized to smaller LLMs | 1.5K | — | +70+4.9% | +482+47.5% | |||
| 15 | mercurialsolo/claudectl Orchestrate a swarm of Claude Code agents with a local brain that learns from you. | 177 | — | +2+1.1% | +27+18.0% | |||
| 16 | ChatGPTNextWeb/NextChat ✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows | 88.2K | +2+0.0% | +90+0.1% | +412+0.5% | |||
| 17 | alekk89/llama-cpp-windows-manager Windows desktop console for llama.cpp runtimes, models, and local coding workflows | 38 | — | +10+35.7% | +37+3700.0% | |||
| 18 | anthropic-claude-code-ai/free-claude-code-ai-desktop-app claude code ai free desktop app api cli open source opencode aider gemini alternative download github local llm ollama setup guide tutorial api 2026 | 159 | +8+5.3% | +85+114.9% | +160 | |||
| 19 | clutch-61/hexstrike_augment We have optimized HexStrike by adding a skill system and RAG capabilities. It now also supports connecting to models via Ollama. | 35 | — | — | +35 | |||
| 20 | ctx-0/lazyllama a smol tool for managing local models | 19 | — | +1+5.6% | +19 | |||
| 21 | BaranziniLab/KG_RAG Empower Large Language Models (LLM) using Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) for knowledge intensive tasks | 939 | — | — | +1+0.1% | |||
| 22 | BeehiveInnovations/pal-mcp-server The power of Claude Code / GeminiCLI / CodexCLI + [Gemini / OpenAI / OpenRouter / Azure / Grok / Ollama / Custom Model / All Of The Above] working as one. | 11.6K | — | +11+0.1% | +90+0.8% |