Llama3 + CrewAI + Groq = Email AI Agent
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A practical recipe for turning Llama 3 into an email-reply agent with CrewAI is built around Groq’s fast inference—using the Llama 3 70B model with an 8,000-token context window. The core workflow takes incoming customer emails, classifies each message (pricing inquiry, complaint, product inquiry, feedback, or off topic), optionally performs targeted research, and then drafts a polite, on-brand response. The payoff is speed and a clear multi-step structure: category → research → email writing, all orchestrated by CrewAI.
Setup starts with Groq Cloud/Console, where an API key is created and the Llama 3 70B model is selected. The transcript emphasizes that Groq access is currently free for trying the 70B model. On the coding side, the environment installs CrewAI and langchain-groq, then wires Groq in as the LLM backend via LangChain’s Groq chat-model wrapper. The agent logic is organized into two main agent roles plus a final drafting step: an email categorizer agent and a research agent, followed by an email writer task that uses the category and the research output.
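The wiring step can be sketched as follows. This is an illustrative configuration, not the transcript’s exact code: `groq_llm_kwargs` is a hypothetical helper, and `"llama3-70b-8192"` is the model identifier the Groq console lists for Llama 3 70B with the 8,192-token context window (verify it against the console, since identifiers can change).

```python
import os

# Assumed Groq model id for Llama 3 70B (8,192-token context window).
GROQ_MODEL = "llama3-70b-8192"

def groq_llm_kwargs(api_key: str, temperature: float = 0.0) -> dict:
    """Keyword arguments to pass to LangChain's Groq chat-model wrapper."""
    return {
        "model": GROQ_MODEL,
        "api_key": api_key,
        "temperature": temperature,
    }

# With `pip install crewai langchain-groq` done, the backend would be roughly:
#   from langchain_groq import ChatGroq
#   llm = ChatGroq(**groq_llm_kwargs(os.environ["GROQ_API_KEY"]))
# and that `llm` is then handed to each CrewAI agent.
```

Keeping the configuration in one small function makes it easy to swap in a different Groq-hosted model (for example an 8B variant) without touching the agent definitions.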
The categorizer agent is prompted with a fixed set of categories and a backstory aimed at understanding what customers want. That category becomes a control signal for downstream behavior—helping the system decide how to respond and how to structure the reply. The research agent then uses the category and the email content to decide whether web search is needed. If search isn’t helpful, it returns “no search needed”; if nothing useful is found, it returns “no useful research found.” In a more production-ready design, the transcript suggests replacing web search with an internal RAG system (retrieving from internal knowledge bases or FAQs), with web search as a fallback when internal sources fail.
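The category-gated research logic described above can be sketched in plain Python. This is an illustrative stub, not the transcript’s prompts: `research_step`, `RESEARCHABLE`, and the `search` parameter are hypothetical names, with `search` standing in for whatever web-search tool or internal RAG retriever is plugged in.

```python
# Sentinel strings mirroring the research agent's fallback outputs.
NO_SEARCH = "no search needed"
NO_RESULTS = "no useful research found"

# Categories for which outside information is plausibly helpful.
RESEARCHABLE = {"price_inquiry", "customer_complaint", "product_inquiry"}

def research_step(category: str, email: str, search=None) -> str:
    """Return research findings, or one of the agent's sentinel strings."""
    if category not in RESEARCHABLE or search is None:
        return NO_SEARCH
    results = search(email)  # e.g. a web-search tool or internal RAG lookup
    return results or NO_RESULTS
```

A production variant, per the transcript’s suggestion, would make `search` try an internal RAG retriever (FAQs, knowledge bases) first and fall back to web search only when internal sources fail.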
For email drafting, the writer task combines the original email, the category, and the research results to produce a response that is simple, polite, and to the point. The prompt also includes a consistent sign-off persona—“Sarah, the resident manager”—and the transcript notes a subtle failure mode: off-topic messages can still trigger mismatched answers if the prompt context isn’t specific enough about the business domain.
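The writer step’s three inputs and the fixed sign-off can be combined into a single prompt. This is a hypothetical template helper (`writer_prompt` is not from the transcript); the actual drafting is done by the Groq-backed LLM, not by string templates.

```python
# Fixed persona so every reply carries the same sign-off.
SIGN_OFF = "Best regards,\nSarah, the resident manager"

def writer_prompt(email: str, category: str, research: str) -> str:
    """Assemble the writer task's input from email, category, and research."""
    return (
        "Draft a simple, polite, to-the-point reply to the email below.\n"
        f"Category: {category}\n"
        f"Research notes: {research}\n"
        f"Email:\n{email}\n"
        f"Sign off exactly as:\n{SIGN_OFF}"
    )
```

Pinning the business domain in this prompt (for example, “you reply on behalf of a vacation rental”) is one way to address the failure mode noted above, where off-topic messages draw mismatched answers.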
Three test emails demonstrate the pipeline. A positive note (“wonderful stay…”) is categorized as customer feedback; the research step returns guidance on how to respond to gratitude, and the final draft thanks the sender and mirrors the appreciative tone. A complaint about Arizona weather in April is categorized as a customer complaint; the research step pulls temperature/weather information and the drafted reply acknowledges the inconvenience, apologizes, and offers reassurance. An off-topic question (“why can’t I get to sing?” from Ringo) ends up categorized as off topic; the research step yields no useful results, and the draft asks for clarification rather than forcing a web-based answer.
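The three test runs can be simulated end to end with the LLM calls stubbed out by keyword rules; real runs would route each step through the Groq-backed agents. All names here (`classify`, `respond`) are illustrative, not CrewAI API.

```python
def classify(email: str) -> str:
    """Stub classifier mimicking the categorizer agent's labels."""
    text = email.lower()
    if "wonderful" in text or "thank" in text:
        return "customer_feedback"
    if "weather" in text or "too hot" in text:
        return "customer_complaint"
    return "off_topic"

def respond(email: str) -> str:
    """Stub pipeline: categorize, then draft with the fixed sign-off."""
    category = classify(email)
    if category == "customer_feedback":
        reply = "Thank you so much for your kind words!"
    elif category == "customer_complaint":
        reply = "We're sorry for the inconvenience and appreciate your patience."
    else:  # off topic: ask for clarification rather than forcing an answer
        reply = "Could you clarify your question so we can help?"
    return f"{reply}\n\nBest regards,\nSarah, the resident manager"
```

The off-topic branch reproduces the key behavior from the third test: when research yields nothing useful, the draft requests clarification instead of improvising a web-based answer.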
Overall, the transcript frames CrewAI as somewhat finicky to get working reliably, but pairing it with a strong model like Llama 3 70B on Groq produces fast, coherent multi-step outputs. Future improvements mentioned include using LangGraph for more control and adding extra checks, plus an alternative run using Ollama (potentially with an 8B model) for local experimentation.
Cornell Notes
The workflow builds an email AI agent by chaining three steps: categorize the incoming email, optionally research based on that category, then draft a reply. CrewAI orchestrates the process, while Groq hosts the Llama 3 70B model for fast responses within an 8,000-token context window. The categorizer assigns one of several labels (pricing inquiry, customer complaint, product inquiry, customer feedback, off topic), and that label steers both research and writing. The research stage can use web search, but the transcript recommends replacing it with an internal RAG system for production, using web search only as a fallback. Tests show the pipeline works for feedback, complaints (with weather research), and off-topic questions (requesting clarification when research is unavailable).
- How does the system decide what kind of email it received, and why does that matter for the reply?
- What role does the research agent play, and what happens when it can’t find useful information?
- Why does the transcript recommend internal RAG instead of always using web search?
- How is the email tone and identity kept consistent across different categories?
- What were the three example emails, and how did the pipeline respond differently to each?
- What practical issues does the transcript flag when building with CrewAI?
Review Questions
- What information is passed from the categorization step into the research step, and how does that shape the final email draft?
- In the off-topic scenario, what does the system do when research returns “no useful research found,” and why is that behavior important?
- What changes would you make to move from web search to a production-grade internal RAG system with a web fallback?
Key Points
1. Create a Groq API key and select the Llama 3 70B model (8,000-token context window) to power the agent quickly.
2. Install CrewAI and langchain-groq, then use LangChain’s Groq chat-model wrapper to connect CrewAI to Llama 3 70B.
3. Use a categorizer agent with fixed labels (pricing inquiry, customer complaint, product inquiry, customer feedback, off topic) to steer downstream behavior.
4. Run a research agent that decides whether search is needed and returns either useful findings or “no useful research found.”
5. Draft replies by combining the original email, the category, and the research output, while enforcing a consistent tone and sign-off (“Sarah, the resident manager”).
6. For production, replace web search with an internal RAG system (FAQ/knowledge base retrieval) and keep web search only as a fallback.
7. Expect some finickiness in CrewAI tool behavior and plan for more control (e.g., via LangGraph and extra checks).