Local LLM · Category

Trending Local LLM repositories

Two years ago, running a 7B model on a laptop meant compiling llama.cpp, hand-quantising weights, and tolerating single-digit tokens per second on CPU. Today it's a one-line install with GPU acceleration, structured output, function calling, and an OpenAI-compatible HTTP server bundled in. The gap closed fast and most builders haven't fully internalised it. This leaderboard tracks the full stack: inference engines (llama.cpp, vLLM, MLC, MLX) squeezing every flop out of consumer hardware, model managers (Ollama, LM Studio, GPT4All) putting a friendly install + chat + API on top, and integration layers (LangChain backends, MCP servers, Open WebUI) letting existing app code treat a local model the same as a hosted one. Engine and manager often get adopted as a paired stack. The score weights live mentions across Hacker News, Reddit, X, Bluesky, Product Hunt and Dev.to alongside GitHub velocity, so an actively-discussed project climbs faster than one with a higher absolute star count but flat trend.

By TrendingRepo Editorial · Updated Jul 27, 2026 · 47 repos tracked

Live · top 47 repos · sorted by momentum across 24H

LIVE · 11m

Velocity period:24h 7d 30dClick to re-sort top 47

#	Repository	Stars	24h	7d	30d	Mentions
01	open-webui/open-webui User-friendly AI Interface (Supports Ollama, OpenAI API, ...)	146.8K	+94+0.1%	+736+0.5%	+3.9K+2.7%	live on Hacker News
02	ollama/ollama Get up and running with Kimi-K2.6, GLM-5.2, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.	176.9K	+48+0.0%	+433+0.2%	+2.3K+1.3%	live on Hacker News
03	antirez/ds4 DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm	19.3K	+4+0.0%	+365+1.9%	+3.4K+21.6%	live on Hacker News
04	Mintplex-Labs/anything-llm Stop renting your intelligence. Own it with AnythingLLM. Everything you need for a powerful local-first agent experience	63.9K	+5+0.0%	+286+0.4%	+1.8K+2.9%	live on X / Twitter
05	ggml-org/llama.cpp LLM inference in C/C++	121.7K	—	+608+0.5%	+3.6K+3.0%	live on Hacker News
06	sgl-project/sglang SGLang is a high-performance serving framework for large language models and multimodal models.	30.8K	—	+221+0.7%	+1.2K+4.0%	live on Dev.to
07	mostlygeek/llama-swap Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc	5.2K	+10+0.2%	+81+1.6%	+355+7.4%	live on X / Twitter
08	mozilla-ai/llamafile Distribute and run LLMs with a single file.	25.5K	+8+0.0%	+47+0.2%	+377+1.5%
09	LearningCircuit/local-deep-research ~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Everything Local & Encrypted.	8.8K	+3+0.0%	+32+0.4%	+191+2.2%	live on X / Twitter
10	debpalash/OmniVoice-Studio Local voice clone, video dubbing, dictation and audiobook maker. The open-source ElevenLabs alternative.	9.1K	+1+0.0%	+329+3.7%	+1.4K+18.7%	live on X / Twitter
11	raullenchai/Rapid-MLX The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.	3.4K	+3+0.1%	+115+3.5%	+299+9.6%	live on Bluesky
12	ArvinLovegood/go-stock 🦄🦄🦄AI赋能股票分析：AI加持的股票分析/选股工具。股票行情获取，AI热点资讯分析，AI资金/财务分析，涨跌报警推送。支持A股，港股，美股。支持市场整体/个股情绪分析，AI辅助选股等。数据全部保留在本地。支持DeepSeek，OpenAI， Ollama，LMStudio，AnythingLLM，硅基流动，火山方舟，阿里云百炼等平台或模型。	7K	+7+0.1%	+66+0.9%	+378+5.7%	live on X / Twitter
13	AlexsJones/llmfit Hundreds of models & providers. One command to find what runs on your hardware.	30.7K	+3+0.0%	+899+3.0%	+2K+6.9%	live on X / Twitter
14	itayinbarr/little-coder A harness optimized to smaller LLMs	2K	+1+0.1%	+202+11.3%	+338+20.5%	live on X / Twitter
15	jaeseok614/llm-gpu-checker-ko AI hardware fit calculator for LLM, embedding, reranker, OCR and VLM workloads — VRAM, throughput, licensing and multi-GPU planning	14	+2+16.7%	+14	+14
16	Michael-A-Kuykendall/shimmy ⚡ Pure-Rust WebGPU inference engine — OpenAI-API compatible, GGUF native, runs on any GPU. No Python. No llama.cpp. Single binary.	5.7K	+1+0.0%	+38+0.7%	+177+3.2%	live on X / Twitter
17	dograh-hq/dograh Open source voice AI platform. Self-hosted alternative to Vapi and Retell. On Prem, BYOK across Speech to Speech or LLM/STT/TTS, with a visual workflow builder, MCP native and telephony support.	5K	-8-0.2%	+82+1.7%	+370+7.9%	live on Hacker News
18	mlc-ai/mlc-llm Universal LLM Deployment Engine with ML Compilation	23K	—	+29+0.1%	+151+0.7%
19	avifenesh/bw24 From-scratch Rust+CUDA inference engine, bit-exact by construction — NVFP4, MoE, MTP speculative decoding, tuned against measured limits of one RTX 5090 Laptop (sm_120a).	286	—	+13+4.8%	+283+9433.3%	live on Hacker News
20	ChatGPTNextWeb/NextChat ✨ Light and Fast AI Assistant. Support: Web \| iOS \| MacOS \| Android \| Linux \| Windows	88.6K	+8+0.0%	+42+0.0%	+289+0.3%	live on X / Twitter
21	deepanwadhwa/samosa-chat Run large models like Qwen3.6-35B-A3B locally on a 16 GB RAM machine	45	—	+2+4.7%	+45	live on Hacker News
22	OpenCoworkAI/open-codesign Open-source Claude Design alternative. One-click import your Claude Code / Codex API key. Prompt → prototype / slides / PDF. Multi-model (Claude, GPT, Gemini, Kimi, GLM, Ollama). BYOK, local-first, MIT.	7.5K	+3+0.0%	+183+2.5%	+529+7.6%	live on X / Twitter
23	swellweb/reame CPU-first LLM inference server on llama.cpp. Runs useful models on free-tier ARM boxes; rewriting the input made it ~6x faster and more accurate than tuning the engine. MIT, benchmarks and failures included.	100	—	—	—	live on Hacker News
24	Atrayee-dev/secure-ai-agent-boundary Secure AI Engineering Framework 2026: Data-Boundary Security for Frontier Models	152	—	—	—
25	endend2003-cmd/Tactical-Matrix-Console WarMatrix 2026: Next-Gen Tactical Simulation & AI Command Console	151	—	—	—
26	giannisanni/pulsar SSD-streaming inference engine for giant MoE models (Rust + CUDA). GLM 5.2 743B at 2 tok/s and Hy3 295B at 7 tok/s on two consumer 16GB GPUs. Zero-config multi-GPU: measures PCIe bandwidth, places attention and hot experts where they fit.	67	—	—	—
27	AashishH15/Lexicon The free open source Grammarly alternative offline writing assistant assist with grammar, rewriting & tone entirely on your machine.	49	—	—	—
28	hirokawaguchi/open-genai デジタル庁のガバメントAI「源内(GENAI)」を完全ローカル(ローカルLLM/OpenAI互換)で動かす非公式プロジェクト。SAML認証(Keycloak)・RAG(Qdrant)・文字起こし(Whisper)・画像生成(SD)・チーム単位ナレッジをローカル完結。	124	—	—	—
29	clutch-61/hexstrike_augment We have optimized HexStrike by adding a skill system and RAG capabilities. It now also supports connecting to models via Ollama.	36	—	—	—	live on X / Twitter
30	drumih/turbo-fieldfare Gemma 4 26B-A4B inference in ~2 GB of RAM on any M-series MacBook	249	+1+0.4%	—	—	live on Bluesky
31	anthropic-claude-code-ai/free-claude-code-ai-desktop-app claude code ai free desktop app api cli open source opencode aider gemini alternative download github local llm ollama setup guide tutorial api 2026	159	+8+5.3%	+85+114.9%	+160	live on X / Twitter
32	craftingcodegig/datamatic Build multi-step AI workflows with schema-guided reasoning. Supports Ollama, LMStudio, OpenAI, OpenRouter, Gemini, and all latest models for structured generation, chaining, and data processing.	12	—	—	+12
33	7as0nch/mimo2codex 让最新版 OpenAI Codex CLI / Codex 桌面端接入主流大模型的本地代理(新增mac/win包支持，后台运行，开机自动重启)。内置小米 MiMo V2.5/DeepSeek V4 Pro，并提供通用 provider 机制，OpenAI Chat Completions 兼容（Qwen / GLM / Kimi / 本地 vLLM / Ollama / LM Studio …）或原生 Responses API（OpenAI 自家）的上游接到新版 Codex。把 Codex 的 Responses API 实时翻译成上游的 Chat Completions API，按客户端发的 `model` 字段在 provider 之间自动路由.	637	—	+6+1.0%	+40+6.7%
34	mercurialsolo/claudectl Orchestrate a swarm of Claude Code agents with a local brain that learns from you.	193	—	+1+0.5%	+8+4.3%	live on X / Twitter
35	alekk89/llama-cpp-windows-manager Windows desktop console for llama.cpp runtimes, models, and local coding workflows	57	—	+2+3.6%	+11+23.9%	live on X / Twitter
36	Zackriya-Solutions/meetily Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization built on Rust. 100% local processing. no cloud required. Meetily (Meetly Ai - https://meetily.ai) is the #1 Self-hosted, Open-source Ai meeting note taker for macOS & Windows. Understand How to write meeting minutes	26.8K	+1+0.0%	+1K+4.0%	—	live on Hacker News
37	chatboxai/chatbox Powerful AI Client	41.1K	+7+0.0%	+79+0.2%	+513+1.3%	live on X / Twitter
38	mlc-ai/web-llm High-performance In-browser LLM Inference Engine	18.5K	+9+0.0%	+44+0.2%	+196+1.1%	live on X / Twitter
39	oobabooga/textgen Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.	47.5K	+4+0.0%	+34+0.1%	+170+0.4%
40	haotian-liu/LLaVA [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.	24.9K	+2+0.0%	+19+0.1%	+87+0.3%
41	BaranziniLab/KG_RAG Empower Large Language Models (LLM) using Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) for knowledge intensive tasks	940	—	+1+0.1%	+3+0.3%
42	zilliztech/GPTCache Semantic cache for LLMs. Fully integrated with LangChain and llama_index.	8.1K	+2+0.0%	+13+0.2%	+41+0.5%
43	Graph-COM/SubgraphRAG [ICLR 2025] Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation	185	—	—	+2+1.1%
44	nomic-ai/gpt4all GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.	77.4K	-1-0.0%	+9+0.0%	+111+0.1%	live on Dev.to
45	ollama/ollama-js Ollama JavaScript library	4.3K	+2+0.0%	+8+0.2%	+47+1.1%
46	lss233/kirara-ai 🤖 可 DIY 的多模态 AI 聊天机器人 \| 🚀 快速接入微信、 QQ、Telegram、等聊天平台 \| 🦈支持DeepSeek、Grok、Claude、Ollama、Gemini、OpenAI \| 工作流系统、网页搜索、AI画图、人设调教、虚拟女仆、语音对话 \|	18.9K	+2+0.0%	+14+0.1%	+79+0.4%
47	BeehiveInnovations/pal-mcp-server The power of Claude Code / GeminiCLI / CodexCLI + [Gemini / OpenAI / OpenRouter / Azure / Grok / Ollama / Custom Model / All Of The Above] working as one.	11.7K	—	+6+0.1%	+102+0.9%	live on X / Twitter

Showing 47 of 47 · ranked by momentum, velocity, and consensus

Prev Next

▌ Local LLM — frequently asked questions

What are the best Local LLM projects right now?

As of July 27, 2026, the top-ranked open-source Local LLM projects on TrendingRepo are open-webui/open-webui, ollama/ollama, antirez/ds4, Mintplex-Labs/anything-llm, and ggml-org/llama.cpp — ordered by a cross-source momentum score, not raw star count.

What counts as Local LLM?

Local LLM covers on-device inference engines, local model runtimes, and self-hosted LLM stacks. TrendingRepo classifies each repository automatically from its GitHub topics, name, description and owner.

How does TrendingRepo rank Local LLM repos?

Each repo gets a 0-100 momentum score combining 24h / 7d / 30d star velocity, fork growth, contributor churn, commit freshness and release cadence, with cross-source mention signals layered on top and anti-spam dampening applied.

How many Local LLM repos does TrendingRepo track?

47 projects are currently tracked in the Local LLM category, refreshed continuously as new repos break out across the signal sources.

How often is this list updated?

Roughly every 20 minutes. Automated collectors re-scan GitHub, Hacker News, X, Bluesky, Product Hunt, Dev.to and more, then recompute the rankings — so the leaderboard reflects momentum within the last hour.