A local LLM runs inference on hardware you control. With quantization and efficient runtimes, capable open models now run on a laptop or a single GPU. Teams choose local for data privacy, offline operation, predictable cost, and no per-token API fees.
The open-source local-LLM ecosystem covers inference engines, model runners with one-command setup, and UIs — the layer that turns raw model weights into something you can actually chat with or build on.