Build Anything with Llama 3 Agents, Here’s How

David Ondrej·
4 min read

Based on David Ondrej's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use Ollama to download and run Llama 3 locally first, then build CrewAI agents on top of it.

Briefing

Local Llama 3 agents can be built on a modest machine, but getting reliable, fast behavior inside CrewAI may require routing the model through Groq’s API. The workflow starts with running Llama 3 locally via Ollama (including downloading the 8B or 70B model), then wiring up two CrewAI agents—one to classify an email as “important,” “casual,” or “spam,” and a second to draft a concise reply based on that classification. The practical payoff is an end-to-end “agent team” that can process an input email and produce both a label and a response.

The build begins with setup: download Ollama, install VS Code, and pull the Llama 3 model from Ollama’s model list. The first run triggers a download (about 4.7 GB for the 8B model, roughly 40 GB for the 70B), which can take around 20 minutes or several hours depending on hardware. Once Ollama is running, the tutorial shifts to Python: create a simple main.py, install the needed packages with pip (including CrewAI), and import Ollama through LangChain Community along with CrewAI’s Agent, Task, Crew, and Process.

Two agents are defined. The “email classifier” agent is given a role, a goal to label each email as important/casual/spam, and a short backstory; it uses the local Llama 3 model and enables verbose logging while disabling delegation. The “responder” agent follows the same pattern but focuses on writing a concise, simple response tailored to the email’s importance category. Next comes task definition: one task uses an f-string to insert the email text and expects one of the three labels, while the second task instructs the system to respond to the email. Those tasks are assembled into a Crew with sequential processing, then kicked off and printed.

A key snag appears during testing: the Llama 3 model behaves correctly when run directly in the terminal, yet the CrewAI integration produces problematic behavior, including suspected hallucinations and slower, inconsistent outputs. The classifier still sometimes returns the expected “spam” label for a Nigerian prince-style phishing example, but the agent-run path is unreliable enough to warrant a fix.

The solution is to connect CrewAI to Groq’s API. The transcript walks through creating a Groq API key, setting environment variables (API key, API base URL, and model name), and importing the os module to read those values. After removing the explicit local LLM assignment—letting CrewAI use its default configuration—the same agent workflow runs successfully through Groq, with dramatically improved speed. The result is a practical “agent team” that preserves the two-step classification-and-response logic while trading local inference for a faster, more dependable API-backed model call.

Cornell Notes

The transcript shows how to build a two-agent system using Llama 3 with CrewAI: one agent classifies an email as important, casual, or spam, and a second agent writes a concise reply based on that classification. Setup starts with Ollama to download and run Llama 3 locally (8B or 70B), then moves to Python where CrewAI agents and tasks are defined and executed sequentially. A reliability issue appears when running Llama 3 through CrewAI locally—terminal tests look fine, but agent execution becomes inconsistent. Switching CrewAI to Groq’s API by setting API key and base URL environment variables makes the workflow work correctly and run much faster.

How does the build turn a single Llama 3 model into a two-step “agent team” workflow?

It creates two CrewAI agents and two corresponding tasks. The first agent (“email classifier”) receives an email string and is instructed to output exactly one label: important, casual, or spam. The second agent (“responder”) takes the email and writes a concise, simple response based on the email’s importance. A Crew object ties them together with sequential processing so the classifier runs first and the responder follows.

What are the exact categories the classifier is instructed to output, and how is the email inserted into the prompt?

The classifier’s expected output is one of three options: important, casual, or spam. The task description uses a Python f-string to embed the email variable directly into the prompt text, so each run can classify a different input email.
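The f-string pattern can be shown in isolation. The helper name and prompt wording here are illustrative, not taken from the video:

```python
def build_classifier_prompt(email: str) -> str:
    """Embed the email text directly into the classification prompt."""
    return (
        "Classify the following email as exactly one of "
        f"'important', 'casual', or 'spam': {email}"
    )

# Each call produces a prompt for a different input email.
prompt = build_classifier_prompt("You have won a free cruise! Click here.")
print(prompt)
```

Because the email is interpolated at call time, the same task definition can classify any input without changing the surrounding agent code.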

Why does the transcript suggest local Llama 3 can be unreliable inside CrewAI even if terminal tests look correct?

The author reports that Llama 3 works perfectly when chatting directly in the terminal, but behaves poorly when invoked through CrewAI—suggesting issues like hallucinations or inconsistent performance. The classifier sometimes still returns the correct label for a Nigerian prince-style phishing email, but the agent-run path is not stable enough to trust without changes.

What change fixes the reliability and speed problem, according to the transcript?

Routing the model calls through Groq’s API. The fix involves creating a Groq API key, setting environment variables for the API key, API base URL, and model name, and then letting CrewAI use its default model configuration rather than explicitly assigning the local Ollama LLM object.
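One plausible wiring looks like the following. The variable names assume CrewAI’s default OpenAI-compatible client, the base URL is Groq’s documented OpenAI-compatibility endpoint, the model name is one example Groq model ID, and the key value is a placeholder:

```python
import os

# Placeholder key -- substitute a real Groq API key in practice.
os.environ["OPENAI_API_KEY"] = "YOUR_GROQ_API_KEY"
# Groq exposes an OpenAI-compatible endpoint at this base URL.
os.environ["OPENAI_API_BASE"] = "https://api.groq.com/openai/v1"
# Example Groq-hosted Llama 3 model identifier.
os.environ["OPENAI_MODEL_NAME"] = "llama3-8b-8192"
```

With these set, the agents can be constructed without an explicit llm argument, and CrewAI’s default client sends requests to Groq instead of the local Ollama server.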

What hardware and download constraints are highlighted for running Llama 3 locally with Ollama?

Running locally requires downloading the model the first time. The 8B model is about 4.7 GB and can take around 20 minutes; the 70B model is about 40 GB and may take roughly 3 hours on a good PC. Memory usage is also discussed: the transcript notes RAM rising into the few-GB range (roughly 3–6 GB) for the 8B model during execution.

Review Questions

  1. What sequential dependencies exist between the classifier task and the responder task in the Crew configuration?
  2. What specific environment variables are needed to connect CrewAI to Groq’s API, and what is the purpose of each?
  3. How does the transcript’s troubleshooting logic distinguish between a model problem and an integration problem?

Key Points

  1. Use Ollama to download and run Llama 3 locally first, then build CrewAI agents on top of it.
  2. Create two agents: one that outputs a strict label (important/casual/spam) and one that drafts a response based on that label.
  3. Define tasks with f-strings so the email content is injected into the prompt dynamically.
  4. Expect local CrewAI integration to be less reliable than terminal chat; test outputs and watch for inconsistent behavior.
  5. If local agent execution is unstable or slow, switch to Groq’s API by setting the API key, API base URL, and model name via environment variables.
  6. Let CrewAI use its default model configuration when using Groq, rather than explicitly wiring in the local Ollama LLM object.
  7. Plan for one-time model downloads: ~4.7 GB for Llama 3 8B and ~40 GB for Llama 3 70B.

Highlights

A two-agent CrewAI setup can classify an email into important/casual/spam and then generate a concise reply based on that classification.
Local Llama 3 works well in terminal chat, yet can behave inconsistently when called through CrewAI—prompting a switch to an API-backed route.
Connecting CrewAI to Groq via API key and base URL environment variables makes the same agent workflow run correctly and much faster.
The transcript emphasizes practical constraints: first-time Ollama downloads (20 minutes for 8B, hours for 70B) and RAM usage in the few-GB range for 8B.
