FunctionGemma - Function Calling at the Edge
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Function Gemma is a specialized, fine-tunable Gemma 270M model designed to perform structured function calling on edge devices, including phones.
Briefing
Function Gemma brings customizable function calling to a compact Gemma model designed for edge deployment—so apps and games can run locally on phones (and devices like Jetson Nano) while still letting the model trigger real actions. The core shift is moving beyond “chat-only” behavior: instead of hard-coding tool logic on the client, developers can fine-tune a small model to reliably emit structured function calls for the specific tools their application needs.
At the center is a specialized model built on Gemma 270M (270 million parameters), a base model trained on 6 trillion tokens and positioned as strong for its size in edge/mobile settings. Function Gemma keeps that small-model footprint but adds training specifically for function calling, including the special tokens and message structure required to represent tool definitions, function call starts, and tool responses. That training matters because function calling doesn’t work well with generic prompting alone; the model must be tuned to produce the correct call format.
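As a hedged illustration of that message structure (the exact special tokens and field names are model-specific and not given in the source; the layout below follows common Hugging Face chat-template conventions), a tool definition and a function-calling turn might be represented like this before the chat template is applied:

```python
# Hypothetical tool schema and message layout for one function-calling turn.
# FunctionGemma's actual special tokens and role names may differ.
tool_schema = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    # The model is trained to emit a structured call, delimited by
    # special tokens, instead of a free-form chat reply:
    {"role": "assistant", "tool_calls": [
        {"name": "get_weather", "arguments": {"city": "Tokyo"}}
    ]},
    # The app runs the tool locally and feeds its output back
    # as a tool-role message for the final response:
    {"role": "tool", "name": "get_weather", "content": '{"temp_c": 21}'},
]
```

The point of the dedicated training is that the model reliably produces and terminates this structured form, rather than approximating it through generic prompting.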
The practical workflow described is: define a tool schema (the “function” the app can execute), provide a user prompt, and have the model output a function call with arguments. The app then runs the tool locally, feeds the tool’s output back into the model as a tool-role message, and the model generates the final response. This mirrors server-side function-calling patterns, but Function Gemma is optimized to make the same idea feasible on-device.
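The loop above can be sketched in plain Python. Here `generate_step` is a stub standing in for the on-device model (a real implementation would run FunctionGemma via Transformers or LiteRT), and the tool is a local function the app controls:

```python
import json

def generate_step(messages):
    """Stub for the on-device model: returns either a structured
    function call or a final text reply. Illustrative only."""
    if messages[-1]["role"] == "user":
        return {"type": "call", "name": "create_event",
                "arguments": {"title": "Standup", "date": "2025-01-10"}}
    return {"type": "text", "content": "Done: event 'Standup' created."}

def run_tool(name, arguments):
    # The app executes the tool locally; here we just echo a result.
    return json.dumps({"status": "ok", "created": arguments["title"]})

def chat(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    out = generate_step(messages)
    if out["type"] == "call":
        # Model stopped at the function-call boundary; app runs the tool,
        # then feeds the output back as a tool-role message.
        result = run_tool(out["name"], out["arguments"])
        messages.append({"role": "assistant", "tool_call": out})
        messages.append({"role": "tool", "content": result})
        out = generate_step(messages)  # final, user-facing response
    return out["content"]

print(chat("Schedule a standup for Jan 10"))
# → Done: event 'Standup' created.
```

The control flow is identical to server-side function calling; the difference is that every step, including the model forward pass, stays on the device.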
Customization is presented as the main advantage. Out of the box, Function Gemma can struggle with domain-specific tasks—for example, it may refuse or fail to schedule meetings when it hasn’t been fine-tuned for that action. Google’s released notebooks demonstrate fine-tuning using a small actions dataset (under 10k rows), where training quickly reduces validation loss and can approach overfitting on small data. The decisive test is whether the fine-tuned model correctly identifies the intended tool (e.g., “create calendar event”) and fills in structured arguments like date and title, then stops at the function call boundary.
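A row in such an actions dataset pairs a user request with the structured call the fine-tuned model should learn to emit. A minimal sketch (field names here are illustrative, not the released dataset's schema):

```python
def make_row(prompt, tool_name, arguments):
    """Build one training example mapping a user request to the
    structured call the fine-tuned model should produce."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant",
             "tool_call": {"name": tool_name, "arguments": arguments}},
        ]
    }

rows = [
    make_row("Set up a team sync next Tuesday at 10am",
             "create_calendar_event",
             {"title": "Team sync", "date": "next Tuesday 10:00"}),
]
# The source describes a small dataset (under 10k rows); at that scale
# validation loss drops quickly and overfitting is a real risk.
assert len(rows) < 10_000
```

Per the release materials, rows in this shape are fed to a Hugging Face TRL fine-tuning notebook; the evaluation check is exactly the one described above: correct tool name, correctly filled arguments, clean stop at the call boundary.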
For deployment, the transcript emphasizes edge readiness: the model is available on Hugging Face as a gated download, works with Hugging Face Transformers out of the box, and can be converted to LiteRT (the mobile/edge runtime successor to TensorFlow Lite) for running inside apps. A mobile app demo and examples using transformers.js are mentioned as ways to run function calling fully locally in a browser or on a phone.
Overall, Function Gemma is framed as a concrete path to “tool-using” LLM behavior on constrained hardware: start with a small Gemma model, fine-tune it for your app’s exact actions, and export it to LiteRT so the function-calling loop can run locally without a server round-trip. Gemma 4 has not been released yet, but this release is positioned as a meaningful step toward practical on-device agents.
Cornell Notes
Function Gemma adapts the small Gemma 270M model for structured function calling on edge devices, including phones. It relies on special tokens and a tool-call message flow: the model outputs a function call with arguments, the app executes the tool locally, then the tool output is sent back for a final response. Customization is central—out-of-the-box performance can be weak for specific actions (like scheduling meetings) until fine-tuned on an actions dataset. The release includes Hugging Face access (gated), Transformers-based inference notebooks, a fine-tuning notebook using Hugging Face TRL, and a conversion path to LiteRT for mobile deployment.
- What makes Function Gemma different from generic small-LLM prompting for “tools”?
- How does the function-calling loop work on-device, step by step?
- Why does fine-tuning matter, and what happens without it?
- What does the fine-tuning setup look like in the provided workflow?
- How is Function Gemma prepared for mobile/edge deployment after fine-tuning?
- Where can developers get and run Function Gemma weights?
Review Questions
- What specific training elements (tokens and message structure) enable Function Gemma to produce valid function calls?
- Describe the sequence of messages exchanged between the model and the app during a tool call.
- Why might Function Gemma fail on a task like scheduling meetings before fine-tuning, and how does fine-tuning change the output?
Key Points
1. Function Gemma is a specialized, fine-tunable Gemma 270M model designed to perform structured function calling on edge devices, including phones.
2. Reliable tool use depends on model training for function-calling special tokens and the tool-call message format, not just prompting.
3. A complete on-device loop is: model emits a function call → app runs the tool locally → tool output is sent back → model generates the final response.
4. Out-of-the-box Function Gemma can underperform on domain-specific actions; fine-tuning on an actions dataset improves accuracy for those tools.
5. The release provides Hugging Face Transformers notebooks for inference, TRL-based notebooks for fine-tuning, and a conversion path to LiteRT for mobile deployment.
6. Function Gemma weights are available on Hugging Face as a gated download, requiring access approval before use.