Llama 3 8B: BIG Step for Local AI Agents! - Full Tutorial (Build Your Own Tools)
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A local Llama 3 8B agent can be made genuinely useful by giving it a small set of “tools” (Google search, RAG-based retrieval, and email sending) and wiring those tools to the model through lightweight, custom function-calling logic, without relying on LangChain. In the demo, the agent searches the web via SerpAPI, scrapes results from Meta AI and The Verge, embeds the scraped text into a local RAG vault, then answers questions by querying that vault. When asked how many tokens Llama 3 was trained on, it retrieves context from the stored pages and returns a figure of up to 15 trillion tokens.
The practical payoff is that the agent doesn’t just generate text; it triggers actions. After retrieving the training-token claim, it uses a dedicated “send email” function to email the information to the user, with the transcript showing “email sent successfully” and the received message containing the retrieved claim. The creator emphasizes that this works on an 8B model running locally on Ollama (which the transcript renders as “AMA”), and that instruction-following is strong enough to drive tool use reliably, something the tutorial contrasts with earlier attempts using smaller local models.
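The send-email step can be sketched with Python's standard-library `smtplib` and `email.message`. The function names, SMTP host, and credential parameters below are illustrative assumptions, not details shown in the video:

```python
import smtplib
from email.message import EmailMessage


def build_email(subject: str, body: str, sender: str, recipient: str) -> EmailMessage:
    """Assemble the message the agent will send (pure, testable step)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_email(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Deliver the message over SMTP with STARTTLS.

    Host, port, and credentials are placeholders; the video does not show
    which mail provider is used.
    """
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Separating message construction from delivery keeps the tool easy to test without a live mail server.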
Under the hood, the system is built around three core functions: a Google-search function (using SerpAPI to return URLs), a scrape-and-store step that adds page text to a RAG vault, and a check-context function that queries the RAG system. A fourth tool, send email, is added to demonstrate outward actions beyond information retrieval. The “intelligent” part happens in the chat loop: the model interprets each user request, decides when to call a tool, and returns a structured response that includes a special wrapper instruction.
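The vault side of this pipeline can be sketched as follows. The video embeds scraped text with a local embedding model; here a simple word-overlap score stands in for cosine similarity over embeddings (and the SerpAPI search call is omitted to keep the sketch offline). Class and method names are illustrative:

```python
# Toy RAG vault: store() chunks and saves scraped page text,
# check_context() returns the chunks that best match a query.
class RagVault:
    def __init__(self):
        self.chunks = []

    def store(self, text: str, chunk_size: int = 200) -> None:
        """Split text into fixed-size word chunks and keep them in memory."""
        words = text.split()
        for i in range(0, len(words), chunk_size):
            self.chunks.append(" ".join(words[i:i + chunk_size]))

    def check_context(self, query: str, top_k: int = 2) -> list:
        """Rank chunks by word overlap with the query (embedding stand-in)."""
        q = set(query.lower().split())
        scored = sorted(
            self.chunks,
            key=lambda c: len(q & set(c.lower().split())),
            reverse=True,
        )
        return scored[:top_k]
```

In the real system, the retrieved chunks are injected into the prompt so the model answers from scraped content rather than from its weights alone.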
That wrapper is parsed by a “parse function call” routine acting like a detector. It scans the model’s output for specific tags (the transcript calls them wrapper tags and a secret instruction note), extracts a JSON-like instruction payload, converts it into a simple Python dictionary, and then executes the requested function with the provided arguments. The tutorial stresses that the model must fill in argument values—like the Google query—by replacing placeholders with the user’s intent.
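A minimal sketch of that detector, assuming a `<functioncall>…</functioncall>` wrapper tag (the video's exact tag names aren't given in this summary) and a JSON payload with `name` and `arguments` keys:

```python
import json
import re

# Hypothetical wrapper tag; the tutorial's actual tag may differ.
FUNC_RE = re.compile(r"<functioncall>(.*?)</functioncall>", re.DOTALL)


def parse_function_call(output: str):
    """Scan model output for the wrapper tag and return the payload as a dict."""
    match = FUNC_RE.search(output)
    if not match:
        return None  # plain text reply, no tool requested
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # malformed payload: fall back to treating it as text


def execute(call: dict, tools: dict):
    """Run the requested tool with the model-filled arguments."""
    fn = tools.get(call["name"])
    if fn is None:
        return None
    return fn(**call["arguments"])
```

The model decides *what* to call by emitting the tag; the surrounding code enforces *how* the call is validated and executed.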
The tutorial also demonstrates end-to-end workflows: searching Google for available Ollama models, using RAG to check whether Ollama offers the Llama 3 model (surfacing the `ollama pull llama3` command to import it), and then sending the resulting guidance by email. Finally, it shows how to extend the agent with a new custom tool: a “Write to notes” function that appends user-provided content to notes.txt. The system message is updated to instruct the model to emit the correct function-call wrapper when users ask to write notes, the function schema is added to the OpenAI-style function list, and a small conditional block in the chat logic triggers the new tool.
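Those three extension steps (tool function, schema entry, dispatch branch) might look like this; the schema wording and function names are assumptions based on the description above, not code from the video:

```python
from pathlib import Path


def write_to_notes(content: str, path: str = "notes.txt") -> str:
    """Append user-provided content to the notes file, one entry per line."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(content.rstrip("\n") + "\n")
    return "note saved"


# OpenAI-style schema entry appended to the function list so the model
# knows the tool's name and parameters (descriptions are illustrative).
WRITE_TO_NOTES_SCHEMA = {
    "name": "write_to_notes",
    "description": "Append the given content to the user's notes file.",
    "parameters": {
        "type": "object",
        "properties": {
            "content": {"type": "string", "description": "Text to append"},
        },
        "required": ["content"],
    },
}


def dispatch(call: dict) -> str:
    """Conditional branch in the chat logic that routes a parsed tool call."""
    if call.get("name") == "write_to_notes":
        return write_to_notes(**call["arguments"])
    return "unknown function"
```

The system message would gain a matching rule ("when the user asks to write a note, emit a write_to_notes call") so all three pieces stay in sync.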
Overall, the central insight is that a local agent becomes practical when tool calls are deterministic and parseable: the model decides what to do, but the surrounding code enforces how tools are invoked, how retrieved context is stored, and how actions like emailing and file writes are executed.
Cornell Notes
Llama 3 8B can run as a local agent that performs real tasks by combining tool functions with a simple, custom function-calling protocol. The workflow starts with a Google search tool (via SerpAPI), scrapes top URLs, embeds the text into a local RAG vault, and answers questions by querying that vault. A separate “send email” tool turns retrieved answers into an action, and the transcript shows successful email delivery. Tool calls are triggered through structured wrapper tags in the model’s output; a parse routine extracts a JSON-like instruction, converts it into a Python dictionary, and executes the requested function with model-filled arguments. The tutorial then extends the system with a “Write to notes” tool that appends content to notes.txt.
How does the agent turn a natural-language request into a tool action like web search?
What role does RAG play in answering factual questions in the demo?
How are external actions handled beyond information retrieval?
What makes the function-calling approach work without LangChain?
How can the agent be extended with a new tool like writing to a file?
Review Questions
- What specific wrapper-tag mechanism does the parse function call routine use to decide which tool to execute?
- How does the agent ensure answers come from scraped web content rather than only from the model’s prior knowledge?
- What changes are required in the system message, function schema, and chat logic to add a new tool like Write to notes?
Key Points
1. Give the local agent a small set of explicit tools (search Google, check context/RAG, send email) and let the model choose among them via structured, parseable outputs.
2. Use SerpAPI to fetch Google results, scrape the top URLs, and embed the scraped text into a local RAG vault for grounded retrieval.
3. Implement a parse-function-call routine that detects wrapper tags in the model output, extracts a JSON-like instruction, converts it into a Python dictionary, and executes the requested function.
4. Design tool arguments so the model must fill in user-derived values (e.g., the Google query string) while the code enforces the function name and parameter structure.
5. Maintain a conversation history in the chat loop so follow-up requests can reuse prior context and tool outputs.
6. Extend functionality by updating the system message intent rules, adding the new function schema (parameters/descriptions), and adding a conditional branch in the chat logic to run the new tool.
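The history-keeping chat loop from key point 5 reduces to a small skeleton: the full message list is resent every turn, so follow-up requests see prior context and tool outputs. `model_fn` below is a stand-in for the local Llama 3 call (e.g. via Ollama's chat API); the function and parameter names are illustrative:

```python
def chat_turn(history: list, user_msg: str, model_fn) -> str:
    """Run one turn: record the user message, query the model with the
    entire history, record the reply, and return it."""
    history.append({"role": "user", "content": user_msg})
    reply = model_fn(history)  # the model sees every prior turn
    history.append({"role": "assistant", "content": reply})
    return reply
```

Because the history list is mutated in place, tool results appended as assistant messages are automatically available to later turns.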