The three approaches at a glance
Retrieval-Augmented Generation (RAG): the model receives relevant content alongside the query and uses it to answer. The model itself does not change; the context the model sees does. Cheap, fast, easy to update (change the documents, the answers change).
Fine-tuning: the model itself is retrained on a dataset of examples to learn a behaviour or style. The base model becomes a customised model. More expensive, slower to iterate, but capable of teaching the model patterns that prompting cannot reliably elicit.
Tool use (sometimes called function calling or agents): the model can call external functions to fetch live data, run computations, or take actions. The knowledge or capability is not in the model; it is in the tools the model can invoke.
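A rough way to see where the customisation lives in each approach is to look at the shape of the request each one produces. This is an illustrative sketch, not any real API: the model names and field names are invented.

```python
# Where the customisation lives in each approach (all names illustrative):
#  - RAG changes the *input* the model sees
#  - fine-tuning changes the *model* itself
#  - tool use changes what the model can *do*

def rag_request(query, retrieved_docs):
    # Same base model; the context it sees is what changed.
    return {"model": "base", "prompt": f"{retrieved_docs}\n\nQ: {query}"}

def finetuned_request(query):
    # Short prompt; the behaviour lives in the retrained weights.
    return {"model": "base-ft-support-v1", "prompt": query}

def tool_request(query, tool_schemas):
    # Same base model; the tools it may invoke are what changed.
    return {"model": "base", "prompt": query, "tools": tool_schemas}
```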
Each approach is good at something the others are bad at. Confusing them is the source of most architecture mistakes we audit.
When RAG wins
When the task is grounding answers in proprietary content the model has not seen. Knowledge bases, internal documentation, support content, product specifications, legal or compliance documents. If the answer should come from your content, RAG is almost always the right approach.
When the content changes. RAG updates by re-indexing documents. Fine-tuning updates by retraining, which is expensive and slow. If the content updates more than every quarter, RAG is structurally better.
When you need citations. RAG can return the source documents alongside the answer, which makes verification possible. Fine-tuning produces answers without provenance. In regulated or high-trust contexts, citations are not optional.
When you want fast iteration. Adding a new document to a RAG system takes minutes. Adding new behaviour to a fine-tuned model takes a training run.
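The citation point above can be made concrete with a minimal RAG sketch. A toy keyword-overlap scorer stands in for a real vector index, and all the names and document contents here are illustrative:

```python
# Toy RAG sketch: keyword-overlap retrieval standing in for a real
# vector index. Source ids are carried through so answers can cite them.

def retrieve(query, docs, k=2):
    """Score each doc by word overlap with the query; return top-k (id, text) pairs."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, retrieved):
    """Ground the model in retrieved content, keeping ids for citation."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieved)
    return f"Answer using only the sources below and cite their ids.\n{context}\nQ: {query}"

docs = {
    "kb-1": "Refunds are processed within 5 business days",
    "kb-2": "Our office is closed on public holidays",
}
retrieved = retrieve("how long do refunds take", docs)
sources = [doc_id for doc_id, _ in retrieved]  # returned alongside the answer
```

Updating the system is just updating `docs` and re-indexing, which is the structural advantage over retraining.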
When fine-tuning wins
When you need a specific output format or style that prompting cannot reliably produce. Specialised summarisation styles, domain-specific terminology, structured output that has to match an exact schema even on edge cases.
When latency matters and prompting your way to the right behaviour requires a long prompt. Fine-tuning can move the behaviour into the model, allowing shorter prompts and faster responses.
When the task is narrow, well-defined, and high-volume. The economics favour fine-tuning when the behaviour will be invoked many times and the cost of teaching the model the behaviour amortises across all those invocations.
When you want to deploy a smaller, cheaper model on a task a larger model handles via prompting. A fine-tuned smaller model can match or beat a generic larger model on a narrow task at a fraction of the inference cost.
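Fine-tuning data for this kind of narrow, high-volume task is typically a file of example conversations demonstrating the target behaviour. The sketch below uses the common chat-style JSONL shape; the exact schema varies by provider, and the content here is invented:

```python
import json

# Illustrative training examples teaching a fixed summarisation style.
# The schema mirrors the widely used chat-format JSONL convention, but
# check your provider's documentation for the exact required fields.
examples = [
    {"messages": [
        {"role": "system", "content": "Summarise in exactly one bullet point."},
        {"role": "user", "content": "Long incident report text..."},
        {"role": "assistant", "content": "- Database failover caused a 12 min outage."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The cost of assembling and maintaining a file like this, at the hundreds-of-examples scale real fine-tunes need, is exactly the cost that has to amortise across invocations.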
When tools and agents win
When the AI needs to take actions in the world. Sending an email, updating a record, scheduling a meeting, processing a payment. These are not knowledge problems; they are capability problems, and tools are how LLMs get capabilities.
When the answer requires fresh data the model cannot have memorised. Today's stock price, current inventory levels, the user's most recent order. RAG can handle this if the data is in a document store, but a tool call is more direct.
When the task requires multi-step reasoning across systems. An agent that researches a question by querying multiple data sources, synthesises the answers, and produces a result. This is the hardest pattern to get right but the most powerful when it works.
When determinism matters for parts of the task. Calculations, data lookups, and rule applications should be tools, not prompts. Models are bad at arithmetic; calculators are good at arithmetic. Use the right tool for each step.
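The determinism point can be sketched as a tool registry where arithmetic is delegated to code rather than to the model. The dispatch structure below is illustrative; the structured call is what a function-calling model would emit:

```python
import ast
import operator

def calculate(expression: str) -> float:
    """Deterministically evaluate basic arithmetic; no model involvement."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

TOOLS = {"calculate": calculate}

# The model would emit a structured call like this; application code dispatches it.
call = {"name": "calculate", "arguments": {"expression": "1299 * 1.2"}}
result = TOOLS[call["name"]](**call["arguments"])
```

The model picks the tool and the arguments; the answer itself comes from code that is right every time.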
The decision tree
Start with one question: what does the AI need to do that it cannot already do?
If the gap is knowledge (it does not know your content), start with RAG. RAG handles most knowledge gaps and is cheap to iterate on. Reach for fine-tuning only if RAG is failing despite good retrieval.
If the gap is behaviour (it does not produce the right format, style, or output structure consistently), try sophisticated prompting first, then RAG with examples in the context, and only then fine-tuning. Fine-tuning is rarely the first answer; it is the answer when the cheaper options have been exhausted.
If the gap is capability (it cannot take actions or fetch live data), use tools. There is no way around this. RAG and fine-tuning will not help.
If the gap is multi-step reasoning across systems, you are looking at agents. Agents combine tools with multi-turn LLM reasoning. They are the most powerful and the hardest to make reliable.
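The decision tree above fits in a few lines of code. The gap categories and recommendations mirror the text; the function name is illustrative:

```python
# The decision tree as a lookup: map the gap to the first approach to try.
def first_approach(gap: str) -> str:
    return {
        "knowledge": "RAG (reach for fine-tuning only if RAG fails despite good retrieval)",
        "behaviour": "prompting, then RAG with examples in context, then fine-tuning",
        "capability": "tools (RAG and fine-tuning will not help)",
        "multi-step": "agents (tools plus multi-turn LLM reasoning)",
    }.get(gap, "clarify the gap before choosing an approach")
```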
Hybrid patterns that work
RAG plus tools. The model retrieves knowledge from documents and also calls tools for live data. Most production knowledge-base applications work this way: documents for stable knowledge, tools for live signals.
Fine-tuned model plus RAG. A model fine-tuned for a specific output style or task, layered with retrieval for the content that informs each answer. Common in production support and sales applications where both behaviour and knowledge need to be customised.
Multi-stage pipelines with each stage choosing its own approach. A classifier (fine-tuned for speed) decides what kind of query came in. A retrieval stage (RAG) gathers relevant content. A generator (general model with tools) produces the answer. Each stage uses the right approach for its job.
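That pipeline shape can be sketched with stage stubs. In production each stage would be a model call (classifier, retriever, generator); everything here, including the toy knowledge base, is illustrative:

```python
# Multi-stage pipeline sketch: each stage uses the approach that fits it.

def classify(query):
    # Stage 1: a fine-tuned small model would route the query quickly.
    return "billing" if "refund" in query.lower() else "general"

def retrieve(query, category):
    # Stage 2: a RAG stage gathers grounding content for the category.
    kb = {"billing": ["Refunds take 5 business days."],
          "general": ["See our help centre."]}
    return kb[category]

def generate(query, context):
    # Stage 3: a general model (with tools, if needed) produces the answer.
    return f"Based on: {context[0]}"

def pipeline(query):
    category = classify(query)
    context = retrieve(query, category)
    return generate(query, context)
```

The value of the decomposition is that each stage can be swapped, measured, and scaled independently.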
The pattern to avoid is using all three on a problem that only needs one. Complexity has a cost; reach for hybrid only when the simpler approach is genuinely insufficient.
Common mistakes
Reaching for fine-tuning first because it sounds powerful. Fine-tuning has a high build and maintenance cost, and most behaviours that teams want to fine-tune for can be achieved with better prompting plus RAG.
Over-investing in agents before the foundation is solid. Multi-step agentic systems are seductive in demos and brittle in production. Build a working RAG-plus-tools system first, validate that it solves your problem, then consider whether agents add enough value to justify the additional complexity.
Using RAG as a search engine. RAG is generation grounded in retrieval. If your query is 'find me documents about X', what you want is search, not RAG. Putting an LLM in front of search adds latency and cost without adding value.
Confusing fine-tuning with custom training. Fine-tuning adapts a pre-trained model to a narrow task. Training a model from scratch is a different (much more expensive) thing that almost no production teams should be doing.