LoRA freezes the base model's weights and learns small low-rank update matrices, so the trained artifact is tiny and can be swapped or stacked on top of the base. This brought fine-tuning within reach of a single GPU and made custom models practical for small teams.
Open-source toolchains support LoRA out of the box, and QLoRA adds quantization so even large base models can be adapted on modest hardware.