Quantization is what lets large open models run on modest hardware. By representing weights with fewer bits, a model that needed a server can fit on a consumer GPU or a laptop. Formats like GGUF, GPTQ, and AWQ package quantized weights for different runtimes.
The trade-off is size and speed versus accuracy: aggressive quantization saves the most memory but can degrade quality, so runtimes offer several levels. It is a core enabling technique for the local-LLM ecosystem.