Function Calling with Local Models & LangChain - Ollama, Llama3 & Phi-3
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Ollama plus LangChain can run function calling and structured JSON extraction locally using Llama 3 8B and Phi-3.
Briefing
Running function calling and structured JSON outputs locally is practical with smaller open models—especially Llama 3 8B on Ollama—and it enables agent workflows that can run for long periods without cloud token costs or strict context ceilings. The core takeaway is that LangChain can drive local LLMs to produce schema-constrained data and even emit tool/function-call requests, letting downstream code execute actions like “get current weather” while the model stays on-device.
The walkthrough starts by shifting away from cloud-hosted proprietary models (previously used for function calling) toward Ollama serving Llama 3 8B locally, including a quantized setup. It frames model choice using the Berkeley Function-Calling Leaderboard from the Gorilla project, where Llama 3 70B appears in the top tier and Llama 3 8B also ranks, just not as strongly. That leaderboard context motivates testing both Llama 3 8B and Microsoft’s Phi-3, a smaller model that’s still capable of structured extraction tasks.
For basic generation, the code uses LangChain with Ollama’s chat interface, a string output parser, and a prompt template. It demonstrates two modes: a one-shot invocation that returns the full response at the end, and streaming output that begins emitting tokens as soon as the model is loaded. A practical detail is the keep-alive setting, which keeps the model loaded in memory so it isn’t reloaded between notebook cells.
The next step targets structured outputs. Llama 3 is configured to prefer JSON output, paired with a JSON schema and a JSON output parser. The schema is injected into the prompt, and the parser converts the model’s JSON text into a native dictionary. The example shows partial correctness—required fields like name and age come back, while other fields (e.g., hobbies) may be wrong—highlighting that schema prompting improves reliability but doesn’t guarantee perfection. Switching to a plain string parser confirms why the JSON parser matters: it preserves structured types instead of returning raw text.
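What the JSON output parser buys you can be shown with the standard library alone. The sketch below mimics the two steps described above, injecting a schema into the prompt and parsing the reply into a dict; the schema, prompt wording, and model reply are illustrative stand-ins, not outputs from an actual run.

```python
import json

# Illustrative schema: "name" and "age" are required, "hobbies" is optional.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "hobbies": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}

# Step 1: inject the schema into the prompt so the model knows the target shape.
prompt = (
    "Extract the person's details from the text below. "
    "Reply ONLY with JSON matching this schema:\n"
    f"{json.dumps(schema)}\n\n"
    "Text: Sam is 35 and enjoys hiking."
)

# Suppose the model replies with this JSON text (as the walkthrough notes,
# required fields tend to come back right while optional ones may drift).
raw_reply = '{"name": "Sam", "age": 35, "hobbies": ["hiking"]}'

# Step 2: this is the essential work a JSON output parser does — turn the
# model's text into a native dict with typed fields, not a raw string.
parsed = json.loads(raw_reply)
print(type(parsed).__name__, parsed["name"], parsed["age"])
```

With a plain string parser you would keep `raw_reply` as text and every downstream consumer would have to re-parse it, which is the failure mode the string-parser comparison highlights.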
Tool use and function calling come next via LangChain’s OllamaFunctions integration (described as experimental). Tools are bound to the model so it can respond with a function-call payload rather than free-form text. With Llama 3, a weather query for Singapore yields a clear function name (“get current weather”) and arguments including location and a unit (Celsius). The same setup with Phi-3 runs faster but is less complete when multiple arguments are expected: Phi-3 omits the unit, suggesting smaller models may be more brittle under richer tool schemas.
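A function-call payload of this shape, a function name plus a JSON string of arguments, can be dispatched by downstream code along these lines. This is a hypothetical stdlib-only sketch: `get_current_weather` is a stand-in, not a real weather API, and the default `unit` shows one way to tolerate the omitted argument seen with Phi-3.

```python
import json

# Stand-in tool implementation (hypothetical, not a real weather API).
# The default unit makes the call robust when a model omits that argument.
def get_current_weather(location: str, unit: str = "celsius") -> str:
    return f"Weather in {location}: 31 degrees {unit}"

# Registry mapping function names the model may emit to executable tools.
TOOLS = {"get_current_weather": get_current_weather}

def run_function_call(payload: dict) -> str:
    """Look up the named tool and call it with the model-supplied arguments."""
    func = TOOLS[payload["name"]]
    args = json.loads(payload["arguments"])  # arguments arrive as a JSON string
    return func(**args)

# Llama 3 8B returned both arguments; Phi-3 omitted the unit.
llama3_payload = {"name": "get_current_weather",
                 "arguments": '{"location": "Singapore", "unit": "celsius"}'}
phi3_payload = {"name": "get_current_weather",
                "arguments": '{"location": "Singapore"}'}

print(run_function_call(llama3_payload))
print(run_function_call(phi3_payload))
```

The key point is that the model never executes anything itself; it only emits the payload, and your code decides how (and whether) to run the named tool.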
Overall, the results position Llama 3 8B as the more dependable local choice for structured extraction and tool-call formatting, while Phi-3 still performs credibly for simpler structured tasks. The payoff is an agent-building path that can keep running locally—enabling longer autonomous workflows—while using LangChain to enforce JSON structure and orchestrate tool calls that downstream code can execute.
Cornell Notes
Local function calling is achievable with Ollama-served models using LangChain, with Llama 3 8B performing best among the tested options. By combining JSON schema prompting with LangChain’s JSON output parser, the model can return structured dictionaries instead of free-form text, though required fields can still be correct while other fields may drift. For tool use, LangChain’s Ollama functions integration lets the model emit a function-call object (function name plus arguments) that code can execute, demonstrated with a weather tool. Phi-3 can also produce structured outputs and tool calls, but it may omit arguments like units, indicating lower robustness than Llama 3. This matters because local execution reduces cloud token costs and avoids token-limit constraints for longer-running agent workflows.
- How does the setup ensure local models can be used smoothly in a notebook or iterative workflow?
- What’s the difference between using a JSON output parser versus a string output parser for structured responses?
- Why does prompting for JSON and providing a schema still require caution?
- How does tool calling work with Ollama functions in LangChain?
- What performance/robustness differences appear between Llama 3 8B and Phi-3 for tool calling?
Review Questions
- When using LangChain with Ollama for structured outputs, what roles do the JSON schema and the JSON output parser play, and what failure mode appears if you use a string parser instead?
- In the tool-calling workflow, what does the model return, and how should downstream code use that payload to execute a tool?
- Compare how Llama 3 8B and Phi-3 handle the same weather tool call—what specific argument difference was observed?
Key Points
1. Ollama plus LangChain can run function calling and structured JSON extraction locally using Llama 3 8B and Phi-3.
2. “Keep alive” helps keep models loaded across notebook cells, reducing reload overhead during prompt iteration.
3. Using a JSON output parser converts model output into a dictionary, enabling reliable downstream handling versus returning raw strings.
4. Schema-based prompting improves structured output reliability but does not guarantee every field will be correct.
5. LangChain’s Ollama functions integration can produce explicit function-call payloads (function name plus arguments) for tool execution.
6. Llama 3 8B is more complete for tool-call arguments (e.g., includes unit), while Phi-3 may omit fields under the same tool schema.