
Prompt Engineering Vs RAG Vs Finetuning Explained Easily

Krish Naik · 5 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Prompt engineering improves outputs by refining the instructions given to a pre-trained LLM, often using clearer roles, constraints, and structured output requests. RAG improves outputs by retrieving relevant, up-to-date context from a vector database before generating a response. Fine-tuning improves outputs by updating the model’s internal weights on your own data so the assistant consistently behaves the way you want.

Briefing

The clearest way to choose between prompt engineering, RAG, and fine-tuning is to match the technique to where the needed knowledge should come from: instructions you write, documents you retrieve, or behavior you train into the model. Prompt engineering improves answers by crafting clearer, more structured instructions for an existing pre-trained LLM. RAG (retrieval-augmented generation) improves answers by pulling relevant, up-to-date external information from a vector database before generating a response. Fine-tuning improves answers by updating the model’s internal weights using your own data so the assistant consistently behaves the way you want.

Prompt engineering is essentially “steering” a pre-trained model through the prompt. In the cat example, a robot gives generic information when asked broadly, but produces more targeted and useful output when the question becomes more specific—like asking for something funny about cats. The same idea scales to real applications: instead of changing the model, developers refine the input instructions, such as telling the model to act as a teacher, answer in detail, and return structured, point-wise output. The benefit is speed and low operational overhead: it leverages the model’s existing capabilities and focuses on exploring what the pre-trained LLM can already do with better guidance.
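The "steering" idea above can be sketched in a few lines. This is a minimal illustration, not a real integration: `call_llm` is a hypothetical placeholder for whatever chat-completion API you actually use, and the role/format instructions mirror the teacher example from the video.

```python
# Prompt engineering sketch: the model stays fixed; only the
# instructions around the question change.

def build_prompt(question: str) -> str:
    """Wrap a raw question in a role, a detail constraint,
    and a structured-output request."""
    return (
        "You are an AI teacher.\n"
        "Answer in detail, and return the answer as numbered points.\n\n"
        f"Question: {question}"
    )

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"[model response to: {prompt!r}]"

vague = "tell me something about cats"                      # generic output
refined = build_prompt("tell me one funny thing about cats")  # targeted output
print(call_llm(refined))
```

The iterative workflow described later in the article amounts to editing `build_prompt` and re-testing until the outputs are approved.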

RAG adds a “backpack” of external knowledge. When the user asks a question, the system converts the query into vectors, performs vector similarity search against a vector database, retrieves the most relevant context (stored as vector embeddings), and then combines that context with the prompt so the LLM can generate a grounded answer. This is especially valuable for companies that need access to proprietary or frequently updated information—like internal policies or leave policies that change over time. If the knowledge isn’t in the model’s training data, RAG can still answer by retrieving the latest documents. The tradeoff is cost and complexity: the system must query the database and fetch context for each request.
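The retrieval step described above can be shown end to end with a toy embedding. This is a sketch under heavy simplification: `embed` stands in for a real embedding model (it just hashes words into a fixed-size vector), and a production system would use a real vector database rather than an in-memory array.

```python
import numpy as np

# Toy RAG retrieval: documents are stored as vector embeddings, the
# query is embedded the same way, and cosine similarity picks the
# most relevant context to prepend to the prompt.

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hash each word into a bucket, then L2-normalize (toy embedding)."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = [
    "Employees receive 20 days of paid leave per year.",
    "The cafeteria serves lunch from 12 to 2 pm.",
    "Remote work is allowed two days per week.",
]
doc_vectors = np.array([embed(d) for d in documents])

query = "how many leave days do employees get"
scores = doc_vectors @ embed(query)   # cosine similarity (unit vectors)
best = documents[int(np.argmax(scores))]

# Retrieved context is combined with the prompt before generation.
prompt = f"Context: {best}\n\nQuestion: {query}"
```

If the leave policy document changes, only the stored embeddings change; the model itself is untouched, which is the core tradeoff RAG makes against fine-tuning.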

Fine-tuning is different: it changes the model itself. Using the Jarvis-to-Tom analogy, the assistant is trained to respond in a way that fits a specific person’s preferences and context. Practically, developers start with a pre-trained LLM (often a transformer-based model) and then train it further on organization-specific data. That training updates the model’s weights, so the assistant’s behavior becomes aligned with the desired style and knowledge. Fine-tuning can make an AI assistant feel consistently “on-brand,” from greetings to domain-specific responses, but it comes with major challenges: high training cost (including GPU usage), expensive data preparation, and the need to redeploy and manage the resulting model.
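Mechanically, "updating the model's weights" means gradient steps on new data. The sketch below shows that update rule on a tiny linear model; a real LLM has billions of weights and needs GPUs, but the principle (start from pre-trained weights, nudge them toward organization-specific behavior) is the same.

```python
import numpy as np

# Minimal picture of fine-tuning: begin with "pre-trained" weights
# and apply gradient descent on new data until the model's behavior
# matches the target.

rng = np.random.default_rng(0)
w = np.array([1.0, -0.5])             # "pre-trained" weights

X = rng.normal(size=(100, 2))         # new, organization-specific data
y = X @ np.array([2.0, 1.0])          # the behavior we want learned

lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(X)   # MSE gradient
    w -= lr * grad                           # weight update = "fine-tuning"

print(w)   # close to [2.0, 1.0]: the weights have moved to the new behavior
```

The cost argument in the article follows directly: every such update loop over real data consumes compute, and the changed weights must then be redeployed and managed as a new model.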

Cost and effort also shape the decision: prompt engineering requires iterative prompt testing and stakeholder approval; RAG requires maintaining and querying external knowledge stores; fine-tuning requires expensive training and ongoing model operations. As a rule of thumb, RAG fits when the key requirement is up-to-date, domain-specific information; fine-tuning fits when the assistant must follow organization goals and a consistent interaction style across many inputs; prompt engineering fits when the goal is to extract better performance from an existing pre-trained model through clearer instructions.
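The rule of thumb above can be written as a tiny helper. The two flags and their priority order are a simplification of the article's heuristic, not a formal algorithm; real projects often combine techniques (e.g., RAG on top of a fine-tuned model).

```python
# Heuristic from the article: fresh/domain knowledge -> RAG,
# consistent behavior -> fine-tuning, otherwise refine prompts.

def choose_technique(needs_fresh_knowledge: bool,
                     needs_consistent_behavior: bool) -> str:
    if needs_fresh_knowledge:
        return "RAG"
    if needs_consistent_behavior:
        return "fine-tuning"
    return "prompt engineering"

choose_technique(True, False)    # weekly-changing policy docs -> "RAG"
choose_technique(False, True)    # on-brand assistant -> "fine-tuning"
choose_technique(False, False)   # better answers, same model -> "prompt engineering"
```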

Cornell Notes

Prompt engineering, RAG, and fine-tuning are three ways to improve LLM outputs, but they improve different parts of the system. Prompt engineering keeps the model fixed and boosts results by writing clearer, more structured instructions (e.g., asking for a specific kind of answer and requesting point-wise structure). RAG keeps the model fixed but adds retrieval: it converts the user query into vectors, searches a vector database for relevant embedded documents, and uses the retrieved context to generate grounded responses—ideal for proprietary or frequently updated company information. Fine-tuning updates the model’s internal weights using organization-specific data so the assistant consistently behaves in a desired way, but it is costly due to training, data preparation, and redeployment. Choosing the right approach depends on whether knowledge comes from prompts, documents, or training.

How does prompt engineering improve an LLM’s answers without changing the model?

It relies on crafting more precise instructions. In the cat example, a broad question (“tell me something about cats”) yields generic facts, while a more specific prompt (“tell me one funny thing about cats”) produces a more targeted response. The same principle applies to production prompts: developers can instruct the LLM to adopt a role (like acting as a physics or AI teacher), answer in detail, and return structured output (such as point-wise formatting). The core workflow is iterative prompt refinement—testing outputs until a suitable prompt is approved.

What makes RAG different from prompt engineering in how it answers questions?

RAG adds external knowledge retrieval. The system stores documents in a vector database as vector embeddings. When a user asks something, the query is converted into vectors, vector similarity search retrieves the most relevant context, and the LLM generates an answer using that retrieved context combined with the prompt. This is useful when answers depend on up-to-date or proprietary information—like company leave policies that change over time—because the model doesn’t need to have that information in its original training data.

Why is fine-tuning more expensive than prompt engineering and RAG?

Fine-tuning requires training a pre-trained LLM on new data, which changes the model’s weights. That training can demand significant GPU resources and incurs operational overhead: redeploying the updated model to the cloud and managing verification, performance metrics, and ongoing maintenance. It also requires substantial data preparation—without the organization’s data, fine-tuning can’t happen.

When should a team prefer RAG over fine-tuning?

Prefer RAG when the key requirement is access to up-to-date domain-specific information. If the knowledge lives in documents that update over time—like internal policies—RAG can retrieve the latest content from a vector database at query time. This avoids retraining the model every time information changes, though it introduces per-query retrieval cost.

When should a team prefer fine-tuning over RAG?

Prefer fine-tuning when the assistant must behave in a consistent, organization-specific way across many interactions, including greetings and domain responses aligned to business goals. The approach is to train the pre-trained LLM on organization-specific examples so it learns the desired style and behavior. RAG can provide factual grounding, but fine-tuning targets behavioral alignment and consistent interaction patterns.

Review Questions

  1. If a company’s policy documents change weekly, which approach—prompt engineering, RAG, or fine-tuning—best supports accurate answers, and why?
  2. What are the main operational costs introduced by RAG compared with fine-tuning?
  3. How does changing a prompt differ from updating a model’s weights in terms of what gets improved?

Key Points

  1. Prompt engineering improves outputs by refining the instructions given to a pre-trained LLM, often using clearer roles, constraints, and structured output requests.

  2. RAG improves outputs by retrieving relevant external context from a vector database using vector similarity search, then generating an answer grounded in that retrieved text.

  3. Fine-tuning improves outputs by updating a model’s internal weights using organization-specific data so the assistant consistently follows a desired behavior and style.

  4. RAG is a strong fit for proprietary or frequently updated information (like evolving company policies) because it can pull the latest documents at query time.

  5. Fine-tuning is a strong fit when the assistant must match organization goals and interaction patterns across many inputs, but it requires expensive training and data preparation.

  6. Prompt engineering typically involves iterative testing of prompts and stakeholder approval, while RAG and fine-tuning introduce different operational costs (retrieval vs training/redeployment).

Highlights

Prompt engineering steers a pre-trained model through better prompts; RAG grounds answers by retrieving relevant documents; fine-tuning changes the model’s weights to learn desired behavior.
RAG uses vector embeddings and similarity search to fetch context from a vector database before generating a response—ideal for up-to-date company information.
Fine-tuning updates transformer model weights using new data, enabling consistent organization-specific behavior but at high cost and operational overhead.