Prompt Engineering vs. RAG vs. Fine-Tuning Explained Easily
Based on Krish Naik's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing to his content.
Briefing
The clearest way to choose between prompt engineering, RAG, and fine-tuning is to match the technique to where the needed knowledge should come from: instructions you write, documents you retrieve, or behavior you train into the model. Prompt engineering improves answers by crafting clearer, more structured instructions for an existing pre-trained LLM. RAG (retrieval-augmented generation) improves answers by pulling relevant, up-to-date external information from a vector database before generating a response. Fine-tuning improves answers by updating the model’s internal weights using your own data so the assistant consistently behaves the way you want.
Prompt engineering is essentially “steering” a pre-trained model through the prompt. In the cat example, a robot gives generic information when asked broadly, but produces more targeted and useful output when the question becomes more specific—like asking for something funny about cats. The same idea scales to real applications: instead of changing the model, developers refine the input instructions, such as telling the model to act as a teacher, answer in detail, and return structured, point-wise output. The benefit is speed and low operational overhead: it leverages the model’s existing capabilities and focuses on exploring what the pre-trained LLM can already do with better guidance.
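To make the "teacher" instruction concrete, here is a minimal sketch assuming the OpenAI Python SDK; the model name and prompt wording are illustrative, not taken from the video:

```python
# A minimal prompt-engineering sketch, assuming the OpenAI Python SDK.
# The model name and prompt text below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Same pre-trained model, better instructions: assign a role,
# add constraints, and request structured (point-wise) output.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[
        {"role": "system",
         "content": "You are a patient teacher. Answer in detail "
                    "and return your answer as numbered points."},
        {"role": "user",
         "content": "Tell me something funny about cats."},
    ],
)
print(response.choices[0].message.content)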
RAG adds a “backpack” of external knowledge. When the user asks a question, the system converts the query into vectors, performs vector similarity search against a vector database, retrieves the most relevant context (stored as vector embeddings), and then combines that context with the prompt so the LLM can generate a grounded answer. This is especially valuable for companies that need access to proprietary or frequently updated information—like internal policies or leave policies that change over time. If the knowledge isn’t in the model’s training data, RAG can still answer by retrieving the latest documents. The tradeoff is cost and complexity: the system must query the database and fetch context for each request.
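A minimal sketch of that retrieve-then-generate pipeline follows, with a toy embed() standing in for a real embedding model and a plain NumPy array standing in for the vector database:

```python
# A minimal RAG sketch. embed() is a toy stand-in for a real embedding
# model, and the doc_vectors array stands in for a vector database
# (e.g. FAISS or Chroma would handle storage and search in practice).
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy hashed bag-of-words embedding, just to make the sketch run."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v

# Indexing step: embed each document once and store the vectors.
documents = [
    "Employees receive 24 days of paid leave per year (updated Jan 2025).",
    "Remote work is allowed up to three days per week.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Vector similarity search: rank documents by cosine similarity."""
    q = embed(query)
    sims = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

def build_prompt(query: str) -> str:
    """Combine retrieved context with the user question for the LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How many leave days do employees get?"))
```

Updating the knowledge base means re-embedding the changed documents, not retraining anything, which is why RAG handles frequently changing policies well.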
Fine-tuning is different: it changes the model itself. Using the Jarvis-to-Tom analogy, the assistant is trained to respond in a way that fits a specific person’s preferences and context. Practically, developers start with a pre-trained LLM (often a transformer-based model) and then train it further on organization-specific data. That training updates the model’s weights, so the assistant’s behavior becomes aligned with the desired style and knowledge. Fine-tuning can make an AI assistant feel consistently “on-brand,” from greetings to domain-specific responses, but it comes with major challenges: high training cost (including GPU usage), expensive data preparation, and the need to redeploy and manage the resulting model.
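As a rough sketch of what "updating the weights" means in code, here is a minimal training loop assuming Hugging Face transformers and PyTorch; the model name, example data, and hyperparameters are illustrative, and real fine-tuning needs GPUs and far larger, carefully prepared datasets:

```python
# A minimal fine-tuning sketch using Hugging Face transformers + PyTorch.
# Model, data, and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 models ship without one
model = AutoModelForCausalLM.from_pretrained("distilgpt2")  # pre-trained LLM

# Organization-specific examples the assistant should learn to imitate.
texts = [
    "User: Hi\nAssistant: Welcome to Acme! How can I help you today?",
    "User: What are your hours?\nAssistant: Acme support runs 9-5, Mon-Fri.",
]
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a few passes over the tiny batch
    # For causal LM training the labels are the input ids themselves;
    # a real pipeline would also mask padding out of the loss.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()   # gradients flow into every weight
    optimizer.step()      # this step changes the model itself
    optimizer.zero_grad()

model.save_pretrained("acme-assistant")      # the artifact to redeploy
tokenizer.save_pretrained("acme-assistant")
```

Note that the output is a new model artifact: unlike a prompt tweak or a document update, every change requires another training run and redeployment.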
Cost and effort also shape the decision: prompt engineering requires iterative prompt testing and stakeholder approval; RAG requires maintaining and querying external knowledge stores; fine-tuning requires expensive training and ongoing model operations. As a rule of thumb, RAG fits when the key requirement is up-to-date, domain-specific information; fine-tuning fits when the assistant must follow organization goals and a consistent interaction style across many inputs; prompt engineering fits when the goal is to extract better performance from an existing pre-trained model through clearer instructions.
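That rule of thumb can be written as a toy decision helper; the function name and boolean flags are illustrative, not a formal framework:

```python
# A toy helper encoding the rule of thumb above (illustrative only).
def choose_technique(needs_fresh_domain_data: bool,
                     needs_consistent_style: bool) -> str:
    if needs_fresh_domain_data:
        return "RAG"                # pull the latest documents at query time
    if needs_consistent_style:
        return "fine-tuning"        # train the behavior into the weights
    return "prompt engineering"     # better instructions, same model

print(choose_technique(True, False))   # -> RAG
print(choose_technique(False, True))   # -> fine-tuning
print(choose_technique(False, False))  # -> prompt engineering
```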
Cornell Notes
Prompt engineering, RAG, and fine-tuning are three ways to improve LLM outputs, but they improve different parts of the system. Prompt engineering keeps the model fixed and boosts results by writing clearer, more structured instructions (e.g., asking for a specific kind of answer and requesting point-wise structure). RAG keeps the model fixed but adds retrieval: it converts the user query into vectors, searches a vector database for relevant embedded documents, and uses the retrieved context to generate grounded responses—ideal for proprietary or frequently updated company information. Fine-tuning updates the model’s internal weights using organization-specific data so the assistant consistently behaves in a desired way, but it is costly due to training, data preparation, and redeployment. Choosing the right approach depends on whether knowledge comes from prompts, documents, or training.
- How does prompt engineering improve an LLM’s answers without changing the model?
- What makes RAG different from prompt engineering in how it answers questions?
- Why is fine-tuning more expensive than prompt engineering and RAG?
- When should a team prefer RAG over fine-tuning?
- When should a team prefer fine-tuning over RAG?
Review Questions
- If a company’s policy documents change weekly, which approach—prompt engineering, RAG, or fine-tuning—best supports accurate answers, and why?
- What are the main operational costs introduced by RAG compared with fine-tuning?
- How does changing a prompt differ from updating a model’s weights in terms of what gets improved?
Key Points
1. Prompt engineering improves outputs by refining the instructions given to a pre-trained LLM, often using clearer roles, constraints, and structured output requests.
2. RAG improves outputs by retrieving relevant external context from a vector database using vector similarity search, then generating an answer grounded in that retrieved text.
3. Fine-tuning improves outputs by updating a model’s internal weights using organization-specific data so the assistant consistently follows a desired behavior and style.
4. RAG is a strong fit for proprietary or frequently updated information (like evolving company policies) because it can pull the latest documents at query time.
5. Fine-tuning is a strong fit when the assistant must match organization goals and interaction patterns across many inputs, but it requires expensive training and data preparation.
6. Prompt engineering typically involves iterative testing of prompts and stakeholder approval, while RAG and fine-tuning introduce different operational costs (retrieval vs. training/redeployment).