Retrieval-Augmented Generation (RAG) solves two LLM problems:
- Hallucinations: the model grounds its answers in your actual documents instead of its approximate parametric memory
- Stale knowledge: the model can access fresh information without being retrained
How it works: embed your documents as vectors and store them in a vector database. When a question arrives, embed it the same way, retrieve the semantically closest passages, and inject them into the LLM prompt as context.
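Here is a minimal sketch of that pipeline. It assumes `sentence-transformers` as the embedding model (any embedder works), an in-memory numpy array standing in for a real vector database, and a placeholder where your LLM call would go; the documents and model name are illustrative, not prescriptive.

```python
# Minimal RAG sketch: embed docs, retrieve nearest passages, build the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Premium accounts include priority support and a 99.9% SLA.",
]

# Index step: one embedding per document, L2-normalized so that
# a plain dot product equals cosine similarity.
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k passages semantically closest to the question."""
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vec           # cosine similarities
    top = np.argsort(scores)[::-1][:k]     # indices of the best matches
    return [documents[i] for i in top]

question = "When can I get a refund?"
context = "\n".join(retrieve(question))

# Inject the retrieved passages into the prompt; pass `prompt` to
# whatever LLM client you use (this part is intentionally left out).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

In production the numpy array is replaced by a vector database (FAISS, Qdrant, pgvector, etc.), which handles approximate nearest-neighbor search at scale, but the logic is the same: embed, search, inject.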