Local LLM tools encompass inference engines, quantization frameworks, and self-hosted runtimes that execute large language models entirely on consumer or enterprise hardware without external API dependency. A strong project in this category distinguishes itself through broad model format support—particularly GGUF, ONNX, and Safetensors—efficient memory management for constrained VRAM, and hardware acceleration across NVIDIA, AMD, Apple Silicon, and CPU backends. Developers should evaluate quantization quality against benchmark perplexity scores, context window scalability, batch throughput for concurrent requests, and the maturity of the server implementation for OpenAI-compatible API compatibility. Equally critical is the project's update cadence for new architecture support, since model releases frequently outpace runtime adaptation. 15 projects qualified as of May 28, 2026.
The open-source ElevenLabs alternative for local voice cloning, design, create, dubbing …
~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.…
Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vl…
Auto pilot for Claude Code - connect multiple coding agents to a local LLM brain. 🆕 with a hive mind now
The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.
🦄🦄🦄AI赋能股票分析:AI加持的股票分析/选股工具。股票行情获取,AI热点资讯分析,AI资金/财务分析,涨跌报警推送。支持A股,港股,美股。支持市场整体/个股情绪分析,AI辅助选股等。数据全部保留在本地。支持DeepSeek,OpenAI, Ollama,LMStudio,AnythingLLM,硅基流动,火山方舟,阿里云百炼等平台或模型。
让最新版 OpenAI Codex CLI / Codex 桌面端接入主流大模型的本地代理(新增mac/win包支持,后台运行,开机自动重启)。内置 小米 MiMo V2.5/DeepSeek V4 Pro,并提供通用 provider 机制,**OpenAI Chat Completions 兼容**(Qwen / GLM / Kimi / 本地 vLLM / Ollama / LM Studio …)或**原生 Responses API**(OpenAI 自家)的上游接到新版 Codex。把 Codex 的 Responses API 实时翻译成上游的 Chat Completions API,按客户端发的 `model` 字段在 provider 之间自动路由.
claude code ai free desktop app api cli open source opencode aider gemini alternative download github local llm ollama setup guide tutorial api 2026
Windows desktop console for llama.cpp runtimes, models, and local coding workflows
Hundreds of models & providers. One command to find what runs on your hardware.
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot mod…
Windows desktop console for llama.cpp runtimes, models, and local coding workflows