Everything the model 'sees' for a request must fit in the context window: the system prompt, the conversation history, any retrieved documents, and the answer it writes. Larger windows allow whole documents or codebases as input, but filling them costs more per request and can dilute the model's attention across irrelevant text.
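The budgeting described here can be sketched as a simple check that the input parts plus the space reserved for the answer stay within the window. This is only an illustration: `estimate_tokens` and `fits_in_window` are hypothetical helper names, and the rule of roughly four characters per token is an assumed approximation, not a real tokenizer.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not a real tokenizer)."""
    return (len(text) + 3) // 4  # ceiling division; an empty string gives 0


def fits_in_window(
    system_prompt: str,
    history: List[str],
    documents: List[str],
    max_output_tokens: int,
    context_window: int,
) -> Tuple[bool, int]:
    """Return (fits, total_tokens) for one request.

    The reserved output is counted too, because the answer the model
    writes shares the same window as the input.
    """
    total = estimate_tokens(system_prompt)
    total += sum(estimate_tokens(message) for message in history)
    total += sum(estimate_tokens(doc) for doc in documents)
    total += max_output_tokens
    return total <= context_window, total


if __name__ == "__main__":
    ok, total = fits_in_window(
        system_prompt="abcd",
        history=["abcdefgh"],
        documents=["abcd" * 10],
        max_output_tokens=100,
        context_window=200,
    )
    print(ok, total)  # True 113
```

In practice the estimate would come from the tokenizer that matches the model in use, since token counts vary between models.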
Context windows have grown quickly, yet retrieval-augmented generation (RAG) is still widely used even when a large window is available: feeding the model only the most pertinent chunks, rather than everything, keeps prompts relevant and cheap.
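A minimal sketch of that selection step might look like the following. It ranks chunks against the query and keeps the best ones that fit a token budget. The word-overlap score stands in for the embedding similarity a real retrieval system would more likely use, word count stands in for a tokenizer, and the names `relevance` and `select_chunks` are hypothetical.

```python
from typing import List


def relevance(query: str, chunk: str) -> int:
    """Toy score: number of distinct query words that also appear in the chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def select_chunks(query: str, chunks: List[str], token_budget: int) -> List[str]:
    """Greedily keep the most relevant chunks that fit within token_budget."""
    ranked = sorted(chunks, key=lambda c: relevance(query, c), reverse=True)
    selected: List[str] = []
    used = 0
    for chunk in ranked:
        if relevance(query, chunk) == 0:
            break  # everything after this point is irrelevant to the query
        cost = len(chunk.split())  # word count as a crude token proxy (assumption)
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected


if __name__ == "__main__":
    chunks = [
        "the cat sat on the mat",
        "context window limits tokens",
        "retrieval picks relevant chunks for the context window",
    ]
    # Only the highest-scoring chunk fits a budget of 10 words.
    print(select_chunks("context window retrieval", chunks, token_budget=10))
```

The greedy loop skips a chunk that is too large and keeps trying smaller ones, which is one possible design choice; stopping at the first chunk that does not fit would be another.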