
AI AGENTS Updates From Google, OpenAI and Anthropic

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Google defines an AI agent as a goal-driven system that observes and acts using available tools, not just a text generator.

Briefing

AI agents are increasingly defined less by raw language ability and more by their ability to pursue goals through a loop of tool use—an approach Google lays out in a detailed 42-page framework and that OpenAI and Anthropic are now turning into working developer patterns.

Google’s paper frames a generative AI agent as an application that tries to achieve a goal by observing the world and acting on it using tools available to it. That “act” part is what distinguishes agents from standalone models: models are limited to what they learned during training, while agents extend knowledge and capability by connecting to external systems. Google argues that tools bridge the gap between impressive text generation and real-world interaction, enabling agents to pull in external data and trigger actions beyond the model’s native context.

The paper breaks the agent stack into an orchestration layer and a set of tool types. The orchestration layer governs the iterative process: take in information, perform internal reasoning, decide on the next action, and repeat until a goal is reached or a stopping condition triggers. Stopping can be handled by automated checks—such as an LLM judging whether an answer is good enough—or by escalation to a human for higher-stakes review. Google also separates “agents vs. models” explicitly: models rely on training-data knowledge, while agents gain extended knowledge through tool connections.
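The loop Google describes can be sketched in a few lines of Python. Note that `model_reason` and `run_tool` below are hypothetical stand-ins for an LLM call and a tool dispatcher, not any real API:

```python
# Minimal sketch of the orchestration loop: observe, reason, act, repeat
# until a stopping condition triggers. All names here are illustrative.

def model_reason(goal, observations):
    """Placeholder for an LLM call that picks the next action."""
    if observations:
        # Stand-in for an LLM-based "is this answer good enough?" check.
        return {"type": "finish", "answer": observations[-1]}
    return {"type": "tool", "name": "search", "args": {"query": goal}}

def run_tool(name, args):
    """Placeholder for dispatching to a real tool (API, database, ...)."""
    return f"result of {name}({args})"

def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):          # stopping condition: step budget
        action = model_reason(goal, observations)
        if action["type"] == "finish":  # goal reached, exit the loop
            return action["answer"]
        observations.append(run_tool(action["name"], action["args"]))
    # No answer within budget: escalate rather than loop forever.
    raise RuntimeError("escalating to a human reviewer")
```

The escalation branch mirrors the paper's point that stopping can be automated or handed to a human for higher-stakes review.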

On the tool side, Google highlights three categories: extensions, function calling, and data stores. Extensions map to selecting the right external capability—analogous to choosing an API endpoint like a flights service or a maps service based on the user’s request. Function calling lets an agent choose among predefined, reusable code modules and supply the correct arguments according to a schema. Data stores provide runtime access to structured or unstructured information—often via vector databases for retrieval-augmented generation—so agents can query relevant documents instead of stuffing large corpora into the context window.
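A minimal function-calling setup along these lines might look as follows; the `get_flights` schema and function are invented for illustration and are not taken from Google's paper:

```python
# Hypothetical function-calling setup: a JSON schema the model sees,
# and the predefined code module it maps to.
get_flights_schema = {
    "name": "get_flights",
    "description": "Search for flights between two cities.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
        },
        "required": ["origin", "destination"],
    },
}

def get_flights(origin: str, destination: str) -> list[str]:
    """Predefined, reusable module the agent can invoke."""
    return [f"{origin} -> {destination} (example flight)"]

def dispatch(call: dict) -> list[str]:
    """Validate a model-emitted call against the schema, then run it."""
    required = get_flights_schema["parameters"]["required"]
    missing = [k for k in required if k not in call["arguments"]]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return get_flights(**call["arguments"])
```

The schema is what lets the model "supply the correct arguments": the orchestrator can reject a call whose arguments don't match before any real API is touched.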

A key practical takeaway is that agent quality scales with model quality. Google’s framework implies that a strong orchestration layer and well-defined tools can’t compensate for a weak model that can’t reason, follow instructions, or select the right tools. The paper also points toward more complex “agent chaining” and multi-agent setups, where specialized agents can hand off tasks to one another and collectively solve harder problems.

The transcript then shifts from theory to implementation. OpenAI shares a reference implementation for orchestrating agentic patterns using its Realtime API, including sequential agent handoffs defined as an agent graph and escalation logic for high-stakes decisions. A live demo shows a voice-based authentication flow that spells out personal details letter by letter for confirmation.

Finally, Anthropic’s tool-use learning materials and a hands-on example demonstrate how to add tools to Claude via function calling and structured outputs. The walkthrough builds a simple “execute python file” tool by defining a JSON schema, wiring it into a Claude client, and having the model decide when to call the tool—producing deterministic results like returning the output of a test script. Together, the updates converge on the same theme: agents become useful when they can reliably choose tools, run them with correct inputs, and iterate toward a goal.

Cornell Notes

Google’s agent framework defines an AI agent as a goal-driven system that observes and acts using available tools, not just a model that generates text. The core mechanism is an orchestration layer that loops through reasoning and tool-based actions until a goal or stopping condition is reached, sometimes with LLM-based self-checks or human escalation. Google groups tools into extensions (choose the right API capability), function calling (invoke predefined functions with schema-defined arguments), and data stores (often vector databases for retrieval). The transcript emphasizes that agent performance scales with model quality: better reasoning and instruction-following improve tool selection and outcomes. OpenAI and Anthropic updates show how these ideas translate into working Realtime voice agent patterns and Claude tool-use with structured outputs.

How does Google define an “agent,” and what makes it different from a plain language model?

Google defines a generative AI agent as an application that attempts to achieve a goal by observing the world and acting upon it using tools available to it. A plain model’s knowledge is limited to training data, but an agent extends capability by connecting to external systems through tools—so it can retrieve real-time information and trigger actions beyond what the model can do natively.

What role does the orchestration layer play in agent behavior?

The orchestration layer controls the iterative loop: it ingests information, performs internal reasoning, decides on the next action, and continues until the agent reaches a goal or hits a stopping point. Stopping can be automated (e.g., an LLM judges whether an answer is good enough, or another LLM reviews and requests another attempt) or escalated to a human for confirmation in higher-stakes cases.

What are Google’s three tool categories—extensions, function calling, and data stores—and how do they differ?

Extensions are like selecting the right external capability/API endpoint for the user’s request (e.g., flights vs. maps). Function calling lets the model invoke predefined, reusable code modules and supply arguments according to a schema. Data stores provide runtime access to additional data—often via vector databases of embeddings—so the agent can retrieve relevant documents without loading massive corpora into the context window.
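As a toy illustration of the data-store idea, the sketch below ranks documents by cosine similarity to a query. A real system would use a learned embedding model and a vector database rather than bag-of-words counts, and the documents here are invented:

```python
# Toy retrieval over a "data store": embeddings are fake bag-of-words
# vectors so the example runs offline with no embedding model.
from collections import Counter
import math

DOCS = [
    "refund policy for cancelled flights",
    "baggage allowance on international routes",
    "loyalty program tier benefits",
]

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k most similar documents instead of loading all of DOCS."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

This is the mechanism behind retrieval-augmented generation: only the top-k matches enter the context window, not the whole corpus.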

Why does Google argue that agent usefulness depends on model quality, not just tooling?

Even with a strong orchestration layer and well-defined tools, an agent can’t perform well if the underlying model can’t reason, follow instructions, and choose the right tools. In the transcript’s framing, agent capability scales with the model’s ability to select tools and generate correct tool inputs.

How does OpenAI’s Realtime API reference implementation demonstrate agentic patterns in practice?

OpenAI’s reference implementation uses the Realtime API to prototype voice apps with multi-agent flows. The demo highlights sequential agent handoff using an agent graph (e.g., routing a request to a “travel agent”), background escalation to more capable models for high-stakes decisions (mentioned as “o1-mini”), and state-machine-like prompting to collect and confirm information character by character during authentication.
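The handoff pattern can be sketched as a small graph walk. The agent names and data layout below are hypothetical and do not mirror OpenAI's actual Realtime API objects:

```python
# Hedged sketch of sequential handoff along an agent graph: edges say
# who an agent may hand off to; SKILLS says what each agent can handle.
AGENT_GRAPH = {
    "greeter": {"handoffs": ["travel_agent", "auth_agent"]},
    "travel_agent": {"handoffs": []},
    "auth_agent": {"handoffs": []},
}

SKILLS = {
    "greeter": set(),
    "travel_agent": {"flight", "hotel"},
    "auth_agent": {"verify_identity"},
}

def route(start: str, topic: str) -> str:
    """Walk handoff edges until an agent that handles the topic is found."""
    current = start
    while topic not in SKILLS[current]:
        for target in AGENT_GRAPH[current]["handoffs"]:
            if topic in SKILLS[target]:
                current = target  # sequential handoff along a graph edge
                break
        else:
            raise LookupError(f"no agent in the graph handles {topic!r}")
    return current
```

Restricting handoffs to explicit graph edges is what keeps multi-agent flows predictable: an agent can only pass work to the neighbors it declares.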

What does the Anthropic tool-use example add, beyond “just calling a model”?

The example shows adding a tool via function calling with a JSON schema. It defines an “execute python file” tool that uses subprocess to run a local Python script, then wires it into a Claude client with a tool definition and a prompt that instructs the model to call the tool only when necessary. The workflow includes testing with a test.py file and observing deterministic outputs like returning 7 for a simple arithmetic script.

Review Questions

  1. What components must exist for an AI system to behave like an agent under Google’s framework (and what does each component do)?
  2. How do extensions, function calling, and data stores each contribute to an agent’s ability to act in the real world?
  3. In the Claude tool-use walkthrough, what information does the model need to call the tool correctly, and how is that represented?

Key Points

  1. Google defines an AI agent as a goal-driven system that observes and acts using available tools, not just a text generator.

  2. An orchestration layer runs an iterative loop of reasoning and action selection until a goal is reached or a stopping condition triggers.

  3. Google’s tool categories—extensions, function calling, and data stores—map to choosing external capabilities, invoking schema-defined functions, and retrieving runtime knowledge.

  4. Agent performance depends on the underlying model’s reasoning and instruction-following; strong tooling can’t fix a weak model.

  5. OpenAI’s Realtime API reference implementation demonstrates multi-agent voice flows with sequential handoffs and escalation for high-stakes decisions.

  6. Anthropic’s tool-use workflow shows how structured outputs and JSON schemas can make tool calling more deterministic and reliable.

Highlights

Google’s core agent definition centers on acting toward a goal using tools—knowledge extension and real-world interaction come from tool connections, not training alone.
The orchestration layer is the engine: it repeatedly ingests context, reasons, chooses an action, and stops via automated checks or human escalation.
Google’s three tool types—extensions, function calling, and data stores—offer a practical blueprint for building agent capabilities.
OpenAI’s Realtime demo illustrates agentic handoffs and confirmation-heavy authentication flows in a voice setting.
The Claude example demonstrates a complete tool pipeline: define a JSON schema, wire the tool into the client, and let the model decide when to call it.
