Build Anything with Llama 3 Agents, Here’s How
Based on David Ondrej's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use Ollama to download and run Llama 3 locally first, then build CrewAI agents on top of it.
Briefing
Local Llama 3 agents can be built on a modest machine, but getting reliable, fast behavior inside CrewAI may require routing the model through Groq's API. The workflow starts with running Llama 3 locally via Ollama (including downloading the 8B or 70B model), then wiring up two CrewAI agents—one to classify an email as "important," "casual," or "spam," and a second to draft a concise reply based on that classification. The practical payoff is an end-to-end "agent team" that can process an input email and produce both a label and a response.
The build begins with setup: download Ollama, install VS Code, and pull the Llama 3 model from Ollama's model list. The first run triggers a download (about 4.7 GB for the 8B model, roughly 40 GB for the 70B), which can take around 20 minutes or several hours depending on hardware. Once Ollama is running, the transcript shifts to Python: a simple main.py is created, the needed packages are installed with pip (including CrewAI), and Ollama is imported through LangChain Community along with CrewAI's Agent, Task, Crew, and Process.
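As a minimal sketch of that setup (assuming the 8B model and the package names mentioned in the transcript), the start of main.py might look like this:

```python
# Prerequisites, run once in a terminal:
#   ollama run llama3            # first run downloads ~4.7 GB (8B model)
#   pip install crewai langchain-community

from crewai import Agent, Task, Crew, Process
from langchain_community.llms import Ollama

# Point LangChain's Ollama wrapper at the locally served Llama 3 model
llm = Ollama(model="llama3")
```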
Two agents are defined. The "email classifier" agent is given a role, a goal to label each email as important/casual/spam, and a short backstory; it uses the local Llama 3 model and enables verbose logging while disabling delegation. The "responder" agent follows the same pattern but focuses on writing a concise, simple response tailored to the email's importance category. Next comes task definition: one task uses an f-string to insert the email text and expects one of the three labels, while the second task instructs the system to respond to the email. Those tasks are assembled into a Crew with sequential processing, and the crew is then kicked off and its output printed.
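A condensed sketch of that wiring, reusing the `llm` object from the block above; the role strings, backstories, and sample email are illustrative stand-ins, not the transcript's exact wording:

```python
email = "Congratulations! You have won $10,000,000. Send your bank details to claim it."

classifier = Agent(
    role="email classifier",
    goal="Accurately classify emails as important, casual, or spam",
    backstory="You classify incoming emails so they can be handled appropriately.",
    llm=llm,                 # the local Ollama-backed Llama 3 model
    verbose=True,            # log intermediate reasoning
    allow_delegation=False,  # this agent must not hand work to others
)

responder = Agent(
    role="email responder",
    goal="Write a concise, simple reply suited to the email's importance",
    backstory="You draft short replies based on how an email was classified.",
    llm=llm,
    verbose=True,
    allow_delegation=False,
)

classify_email = Task(
    description=f"Classify the following email: '{email}'",
    expected_output="Exactly one of: 'important', 'casual', or 'spam'",
    agent=classifier,
)

respond_to_email = Task(
    description=f"Write a concise reply to this email, matched to its classification: '{email}'",
    expected_output="A short, simple response to the email",
    agent=responder,
)

# Sequential processing: the responder's task runs after the classification
crew = Crew(
    agents=[classifier, responder],
    tasks=[classify_email, respond_to_email],
    process=Process.sequential,
    verbose=True,
)

print(crew.kickoff())
```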
A key snag appears during testing: the Llama 3 model behaves correctly when run directly in the terminal, yet the CrewAI integration produces problematic behavior, including suspected hallucinations and slow, inconsistent outputs. The classifier still sometimes returns the expected "spam" label for a Nigerian prince-style phishing example, but the agent-run path is unreliable enough to warrant a fix.
The solution is to connect CrewAI to Groq's API. The transcript walks through creating a Groq API key, setting environment variables (API key, API base URL, and model name), and importing the os module to read those values. After removing the explicit local LLM assignment—letting CrewAI use its default configuration—the same agent workflow runs successfully through Groq, with dramatically improved speed. The result is a practical "agent team" that preserves the two-step classification-and-response logic while trading local inference for a faster, more dependable API-backed model call.
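A sketch of that switch, assuming CrewAI's default OpenAI-compatible client reads these environment variables; the model id shown is one Groq-hosted Llama 3 identifier and may differ from the video's choice:

```python
import os

# Route CrewAI's default OpenAI-compatible client to Groq instead of local Ollama
os.environ["OPENAI_API_BASE"] = "https://api.groq.com/openai/v1"
os.environ["OPENAI_API_KEY"] = "your-groq-api-key"     # created in the Groq console
os.environ["OPENAI_MODEL_NAME"] = "llama3-70b-8192"    # a Groq-hosted Llama 3 id

# With these set, drop the explicit `llm=llm` argument from each Agent so
# CrewAI falls back to its default configuration, which now points at Groq.
```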
Cornell Notes
The transcript shows how to build a two-agent system using Llama 3 with CrewAI: one agent classifies an email as important, casual, or spam, and a second agent writes a concise reply based on that classification. Setup starts with Ollama to download and run Llama 3 locally (8B or 70B), then moves to Python, where CrewAI agents and tasks are defined and executed sequentially. A reliability issue appears when running Llama 3 through CrewAI locally—terminal tests look fine, but agent execution becomes inconsistent. Switching CrewAI to Groq's API by setting API key and base URL environment variables makes the workflow work correctly and run much faster.
How does the build turn a single Llama 3 model into a two-step “agent team” workflow?
What are the exact categories the classifier is instructed to output, and how is the email inserted into the prompt?
Why does the transcript suggest local Llama 3 can be unreliable inside CrewAI even if terminal tests look correct?
What change fixes the reliability and speed problem, according to the transcript?
What hardware and download constraints are highlighted for running Llama 3 locally with Ollama?
Review Questions
- What sequential dependencies exist between the classifier task and the responder task in the Crew configuration?
- What specific environment variables are needed to connect CrewAI to Groq's API, and what is the purpose of each?
- How does the transcript’s troubleshooting logic distinguish between a model problem and an integration problem?
Key Points
1. Use Ollama to download and run Llama 3 locally first, then build CrewAI agents on top of it.
2. Create two agents: one that outputs a strict label (important/casual/spam) and one that drafts a response based on that label.
3. Define tasks with f-strings so the email content is injected into the prompt dynamically (see the sketch after this list).
4. Expect local CrewAI integration to be less reliable than terminal chat; test outputs and watch for inconsistent behavior.
5. If local agent execution is unstable or slow, switch to Groq's API by setting the API key, API base URL, and model name via environment variables.
6. Let CrewAI use its default model configuration when using Groq, rather than explicitly wiring in the local Ollama LLM object.
7. Plan for one-time model downloads: ~4.7 GB for Llama 3 8B and ~40 GB for Llama 3 70B.
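To illustrate point 3, a minimal example of the f-string injection, reusing the `Task` import and `classifier` agent from the earlier sketch (the email text is a placeholder):

```python
email = "Hey, are we still on for lunch tomorrow?"

# The f-string bakes the email text into the task prompt at definition time
classify_email = Task(
    description=f"Classify this email as important, casual, or spam: '{email}'",
    expected_output="Exactly one of: 'important', 'casual', or 'spam'",
    agent=classifier,
)
```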