Inference is the serving side of a model, distinct from training. For LLMs it is dominated by how fast tokens are generated and how many requests can be batched together; techniques like KV caching, continuous batching, and streaming responses are what make it usable at scale.
Open-source inference engines optimise throughput and memory so capable models run on the hardware you have. Quantization is often paired with inference to fit a model onto a single GPU or a laptop.