Function Calling with Local Models & LangChain - Ollama, Llama3 & Phi-3

Sam Witteveen · 4 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Ollama plus LangChain can run function calling and structured JSON extraction locally using Llama 3 8B and Phi-3.

Briefing

Running function calling and structured JSON outputs locally is practical with smaller open models—especially Llama 3 8B on Ollama—and it enables agent workflows that can run for long periods without cloud token costs or strict context ceilings. The core takeaway is that LangChain can drive local LLMs to produce schema-constrained data and even emit tool/function-call requests, letting downstream code execute actions like “get current weather” while the model stays on-device.

The walkthrough starts by shifting away from cloud-hosted proprietary models (previously used for function calling) toward Ollama serving Llama 3 8B locally, including a quantized setup. It frames model choice using the “Gorilla” function-calling leaderboard, where Llama 3 70B appears in the top tier and Llama 3 8B also ranks—just not as strongly. That leaderboard context motivates testing both Llama 3 8B and Microsoft’s Phi-3, a smaller model that’s still capable of structured extraction tasks.

For basic generation, the code uses LangChain with Ollama’s chat interface, a string output parser, and a prompt template. It demonstrates two modes: one-shot invocation that returns a full response at the end, and streaming output that begins emitting tokens immediately once the model is loaded. A practical detail is “keep alive,” which helps avoid reloading the model between notebook cells.
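
Below is a minimal sketch of this basic-generation setup, assuming Ollama is running locally with the Llama 3 8B model pulled (`ollama pull llama3`). Import paths follow the langchain_community packages current at the time of the video and may differ in newer releases.

```python
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(
    model="llama3",     # local Llama 3 8B served by Ollama
    keep_alive="1h",    # keep the model loaded between runs (value format is an assumption)
    temperature=0,
)

prompt = ChatPromptTemplate.from_template("Tell me a short fact about {topic}.")
chain = prompt | llm | StrOutputParser()

# One-shot invocation: the full response arrives at the end.
print(chain.invoke({"topic": "Singapore"}))

# Streaming: tokens are printed as soon as the model emits them.
for chunk in chain.stream({"topic": "Singapore"}):
    print(chunk, end="", flush=True)
```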

The next step targets structured outputs. Llama 3 is configured to prefer JSON output, paired with a JSON schema and a JSON output parser. The schema is injected into the prompt, and the parser converts the model’s JSON text into a native dictionary. The example shows partial correctness—required fields like name and age come back, while other fields (e.g., hobbies) may be wrong—highlighting that schema prompting improves reliability but doesn’t guarantee perfection. Switching to a plain string parser confirms why the JSON parser matters: it preserves structured types instead of returning raw text.
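
As a hedged sketch of this extraction step, the example below injects an illustrative JSON schema into the prompt, biases Ollama toward JSON output with `format="json"`, and parses the reply into a Python dict. The schema fields (name, age, hobbies) mirror the ones discussed, while the input text is invented for illustration.

```python
import json

from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate

# format="json" asks Ollama to constrain the model toward valid JSON output.
llm = ChatOllama(model="llama3", format="json", temperature=0)

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "hobbies": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}

prompt = ChatPromptTemplate.from_template(
    "Extract details about the person as JSON matching this schema:\n"
    "{schema}\n\nText: {text}"
)

chain = prompt | llm | JsonOutputParser()
result = chain.invoke(
    {"schema": json.dumps(schema), "text": "Ana is 31 and loves hiking and chess."}
)
print(type(result), result)  # <class 'dict'> {'name': 'Ana', 'age': 31, ...}
```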

Tool use and function calling come next via Ollama “functions” support integrated into LangChain (described as experimental). Tools are bound to the model so it can respond with a function-call payload rather than free-form text. With Llama 3, a weather query for Singapore yields a clear function name (“get current weather”) and arguments including location and a unit (Celsius). The same setup with Phi-3 works faster but is less complete when multiple arguments are expected—Phi-3 omits the unit—suggesting smaller models may be more brittle under richer tool schemas.
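
The sketch below follows the experimental `OllamaFunctions` API roughly as documented around the time of the video (the interface has since evolved, so treat the exact calls as assumptions). Binding an OpenAI-style function schema and forcing `function_call` makes the model answer with a tool-call payload instead of prose.

```python
from langchain_experimental.llms.ollama_functions import OllamaFunctions

model = OllamaFunctions(model="llama3", format="json")

# Bind an OpenAI-style function schema; forcing function_call makes the model
# always respond with a call to this function rather than free-form text.
model = model.bind(
    functions=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city, e.g. Singapore",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ],
    function_call={"name": "get_current_weather"},
)

response = model.invoke("What is the weather in Singapore?")
# The call request arrives in additional_kwargs, e.g.:
# {'function_call': {'name': 'get_current_weather',
#                    'arguments': '{"location": "Singapore", "unit": "celsius"}'}}
print(response.additional_kwargs)
```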

Overall, the results position Llama 3 8B as the more dependable local choice for structured extraction and tool-call formatting, while Phi-3 still performs credibly for simpler structured tasks. The payoff is an agent-building path that can keep running locally—enabling longer autonomous workflows—while using LangChain to enforce JSON structure and orchestrate tool calls that downstream code can execute.

Cornell Notes

Local function calling is achievable with Ollama-served models using LangChain, with Llama 3 8B performing best among the tested options. By combining JSON schema prompting with LangChain’s JSON output parser, the model can return structured dictionaries instead of free-form text, though required fields can still be correct while other fields may drift. For tool use, LangChain’s Ollama functions integration lets the model emit a function-call object (function name plus arguments) that code can execute, demonstrated with a weather tool. Phi-3 can also produce structured outputs and tool calls, but it may omit arguments like units, indicating lower robustness than Llama 3. This matters because local execution reduces cloud token costs and avoids token-limit constraints for longer-running agent workflows.

How does the setup ensure local models can be used smoothly in a notebook or iterative workflow?

The workflow uses Ollama’s chat interface and sets a “keep alive” option so the model stays loaded in memory between runs. That prevents repeated model reloads across notebook cells, which otherwise slows development and can be painful when testing prompts and parsers.
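
For illustration, here are a few `keep_alive` values one might pass; the duration formats follow Ollama's API conventions, and support in `ChatOllama` depends on your installed versions.

```python
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3", keep_alive="10m")  # stay loaded 10 minutes after a call
llm = ChatOllama(model="llama3", keep_alive=-1)     # stay loaded indefinitely
llm = ChatOllama(model="llama3", keep_alive=0)      # unload immediately after the call
```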

What’s the difference between using a JSON output parser versus a string output parser for structured responses?

With a JSON output parser, the model’s JSON text is parsed into a native dictionary (structured types). With a string output parser, the same JSON-like content comes back as plain text (including extra whitespace), and the resulting type is a string rather than a dictionary—making downstream extraction and validation harder.
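
A small contrast sketch (prompt text and input are invented): the same chain run through each parser shows the type difference described above.

```python
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="llama3", format="json", temperature=0)
prompt = ChatPromptTemplate.from_template(
    "Return JSON with keys name and age for: {text}"
)

structured = (prompt | llm | JsonOutputParser()).invoke({"text": "Bo is 42."})
raw = (prompt | llm | StrOutputParser()).invoke({"text": "Bo is 42."})

print(type(structured))  # <class 'dict'>: structured["age"] is usable directly
print(type(raw))         # <class 'str'>: raw JSON text that still needs parsing
```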

Why does prompting for JSON and providing a schema still require caution?

Even when the model is biased toward JSON output and given a schema, it can still make mistakes. In the example, required fields like name and age return, but a field such as hobbies may be incorrect (the model substitutes “favorite food” instead). More detailed schemas and clearer instructions improve reliability, but they don’t guarantee correctness.
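
One way to tighten the schema, sketched below with assumed field names: per-field descriptions and explicit constraints give the model more guidance, though, as noted, they still don't guarantee correctness.

```python
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "The person's full name"},
        "age": {"type": "integer", "description": "Age in whole years"},
        "hobbies": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Leisure activities only; do not list foods",
        },
    },
    "required": ["name", "age", "hobbies"],
}
```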

How does tool calling work with Ollama functions in LangChain?

Tools are bound to the model so it can respond with a function-call payload rather than normal text. For the weather example, the model returns a function name (e.g., “get current weather”) plus arguments such as location (“Singapore”) and unit (“Celsius” for Llama 3). Downstream code can then execute the tool using those arguments.
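
Continuing the earlier binding sketch, downstream dispatch might look like the following; the payload shape (`additional_kwargs["function_call"]` with a JSON-string `arguments` field) matches the experimental output at the time of the video, and `get_current_weather` here is a stand-in implementation.

```python
import json

def get_current_weather(location: str, unit: str = "celsius") -> str:
    """Stand-in tool implementation for illustration."""
    return f"24 degrees {unit} in {location}"

TOOLS = {"get_current_weather": get_current_weather}

# `model` is the tool-bound OllamaFunctions instance from the earlier sketch.
response = model.invoke("What is the weather in Singapore?")
call = response.additional_kwargs["function_call"]
args = json.loads(call["arguments"])  # e.g. {"location": "Singapore", "unit": "celsius"}
print(TOOLS[call["name"]](**args))    # "24 degrees celsius in Singapore"
```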

What performance/robustness differences appear between Llama 3 8B and Phi-3 for tool calling?

Phi-3 can successfully produce a function-call response quickly, but it may omit expected arguments. In the weather run, Phi-3 provided the location (“Singapore”) but did not include the unit, whereas Llama 3 returned both location and unit (Celsius). This suggests smaller models can be less reliable when tool schemas require multiple fields.

Review Questions

  1. When using LangChain with Ollama for structured outputs, what roles do the JSON schema and the JSON output parser play, and what failure mode appears if you use a string parser instead?
  2. In the tool-calling workflow, what does the model return, and how should downstream code use that payload to execute a tool?
  3. Compare how Llama 3 8B and Phi-3 handle the same weather tool call—what specific argument difference was observed?

Key Points

  1. Ollama plus LangChain can run function calling and structured JSON extraction locally using Llama 3 8B and Phi-3.

  2. “Keep alive” helps keep models loaded across notebook cells, reducing reload overhead during prompt iteration.

  3. Using a JSON output parser converts model output into a dictionary, enabling reliable downstream handling versus returning raw strings.

  4. Schema-based prompting improves structured output reliability but does not guarantee every field will be correct.

  5. LangChain’s Ollama functions integration can produce explicit function-call payloads (function name plus arguments) for tool execution.

  6. Llama 3 8B is more complete for tool-call arguments (e.g., includes unit), while Phi-3 may omit fields under the same tool schema.

Highlights

Local function calling works with Ollama: the model can emit a structured function-call request that downstream code can execute.
JSON schema + LangChain’s JSON output parser turns model responses into typed dictionaries, not just text.
Phi-3 can handle structured outputs and tool calls, but it may drop expected arguments like units compared with Llama 3 8B.
