Retrieval-Augmented Generation (RAG) pairs a retriever with a generator. At query time the retriever finds the chunks most relevant to the question, usually by embedding the query and running a similarity search against a vector database, and the language model answers using those chunks as context. This grounds responses in current, private, or domain-specific data the model never saw in training, and lets the answer cite the retrieved chunks as sources.
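The query-time flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production retriever: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `chunks` list stands in for a vector database, and `retrieve` and `build_prompt` are hypothetical helper names. The resulting prompt is what would be sent to the language model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Number the retrieved chunks so the model can cite them."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the numbered sources below and cite them by number.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )


# Illustrative data only; a real system would load chunks from a vector store.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Support is available by email on weekdays.",
]
query = "How many days do refunds take?"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Numbering the chunks in the prompt is one simple way to make citations possible: the model can refer to a source by its number, and the application can map that number back to the original document.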
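A typical open-source RAG stack combines an embedding model (to turn text into vectors), a vector store (to index and search those vectors), a chunking/ingestion pipeline (to split and load documents), and an orchestration layer (to wire retrieval into the prompt). RAG is one of the most common ways teams add company knowledge to an LLM without fine-tuning.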
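To make the ingestion side of such a stack concrete, here is a rough sketch of how the pieces might fit together. Everything in it is an assumption for illustration: `chunk_words` is a simple overlapping word-window splitter, `embed` is a hashed bag-of-words placeholder for a real embedding model, `InMemoryVectorStore` stands in for an actual vector database, and `ingest` plays the role of a very thin orchestration layer.

```python
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 256  # hypothetical vector size; real embedding models fix their own


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> list[float]:
    """Placeholder embedding: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Stand-in for a vector database: stores (vector, text) pairs in a list."""

    rows: list[tuple[list[float], str]] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.rows.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = sorted(
            self.rows,
            key=lambda row: sum(a * b for a, b in zip(q, row[0])),
            reverse=True,
        )
        return [text for _, text in scored[:k]]


def ingest(store: InMemoryVectorStore, documents: list[str],
           size: int = 50, overlap: int = 10) -> None:
    """Chunk every document and add each chunk to the store."""
    for doc in documents:
        for chunk in chunk_words(doc, size=size, overlap=overlap):
            store.add(chunk)


# Illustrative data only.
docs = ["one two three four five six seven eight nine ten"]
store = InMemoryVectorStore()
ingest(store, docs, size=4, overlap=1)
pieces = chunk_words(docs[0], size=4, overlap=1)
```

The overlap between neighboring chunks is a common design choice: it reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary. The chunk size and overlap used here are arbitrary and would be tuned per corpus.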