Google Whitepaper: The Complete Guide to AI Agents

Google released a comprehensive whitepaper on AI Agents, authored by Alan Blount, Antonio Gulli, Shubham Saboo, Michael Zimmermann, and Vladimir Vuskovic (November 2025). This 54-page document provides a formal guide for developers, architects, and product leaders transitioning from proofs-of-concept to robust, production-grade agentic systems.

1. From Predictive AI to Autonomous Agents

Artificial intelligence is undergoing a paradigm shift. For years, the focus has been on models that excel at passive, discrete tasks: answering questions, translating text, or generating images from prompts. This paradigm, while powerful, requires constant human direction for every step.

We’re now seeing a fundamental shift: from AI that just predicts or creates content to a new class of software capable of autonomous problem-solving and task execution.

Agents are the natural evolution of Language Models, made useful in software.

1.1 What is an AI Agent?

An AI agent is not simply an AI model in a static workflow; it’s a complete application, making plans and taking actions to achieve goals. It combines a Language Model’s (LM) ability to reason with the practical ability to act, allowing it to handle complex, multi-step tasks that a model alone cannot.

The critical capability: Agents can work on their own, figuring out the next steps needed to reach a goal without a person guiding them at every turn.

2. The Four Core Components of an AI Agent

In the simplest terms, an AI Agent can be defined as the combination of models, tools, an orchestration layer, and runtime services which uses the LM in a loop to accomplish a goal.

2.1 The Model (“The Brain”)

The core language model or foundation model that serves as the agent’s central reasoning engine to process information, evaluate options, and make decisions. The type of model (general-purpose, fine-tuned, or multimodal) dictates the agent’s cognitive capabilities.

An agentic system is the ultimate curator of the input context window of the LM.

2.2 Tools (“The Hands”)

These mechanisms connect the agent’s reasoning to the outside world, enabling actions beyond text generation. They include:

API extensions
Code functions
Data stores (like databases or vector stores) for accessing real-time, factual information

An agentic system allows an LM to plan which tools to use, executes the tool, and puts the tool results into the input context window of the next LM call.

2.3 The Orchestration Layer (“The Nervous System”)

The governing process that manages the agent’s operational loop. It handles:

Planning
Memory (state)
Reasoning strategy execution

This layer uses prompting frameworks and reasoning techniques (like Chain-of-Thought or ReAct) to break down complex goals into steps and decide when to think versus use a tool. This layer is also responsible for giving agents the ability to “remember.”

2.4 Deployment (“The Body and Legs”)

While building an agent on a laptop is effective for prototyping, production deployment is what makes it a reliable and accessible service. This involves:

Hosting the agent on a secure, scalable server
Integrating with essential production services for monitoring, logging, and management
Enabling access through a graphical interface for users
Enabling programmatic access by other agents via an Agent-to-Agent (A2A) API

3. The Agentic Problem-Solving Process: Five Steps

At its core, an agent operates on a continuous, cyclical process to achieve its objectives. While this loop can become highly complex, it can be broken down into five fundamental steps:

3.1 Get the Mission

The process is initiated by a specific, high-level goal. This mission is provided by a user (e.g., “Organize my team’s travel for the upcoming conference”) or an automated trigger (e.g., “A new high-priority customer ticket has arrived”).

3.2 Scan the Scene

The agent perceives its environment to gather context. This involves the orchestration layer accessing its available resources:

“What does the user’s request say?”
“What information is in my long-term memory? Did I already try to do this task? Did the user give me guidance last week?”
“What can I access from my tools, like calendars, databases, or APIs?”

3.3 Think It Through

This is the agent’s core “think” loop, driven by the reasoning model. The agent analyzes the Mission (Step 1) against the Scene (Step 2) and devises a plan. This isn’t a single thought, but often a chain of reasoning:

“To book travel, I first need to know who is on the team. I will use the get_team_roster tool. Then I will need to check their availability via the calendar_api.”

3.4 Take Action

The orchestration layer executes the first concrete step of the plan. It selects and invokes the appropriate tool—calling an API, running a code function, or querying a database. This is the agent acting on the world beyond its own internal reasoning.

3.5 Observe and Iterate

The agent observes the outcome of its action. The get_team_roster tool returns a list of five names. This new information is added to the agent’s context or “memory.” The loop then repeats, returning to Step 3:

“Now that I have the roster, my next step is to check the calendar for these five people. I will use the calendar_api.”

This “Think, Act, Observe” cycle continues—managed by the Orchestration Layer, reasoned by the Model, and executed by the Tools—until the agent’s internal plan is complete and the initial Mission is achieved.

4. A Taxonomy of Agentic Systems (Five Levels)

Understanding the operational loop is the first part of the puzzle. The second is recognizing that this loop can be scaled in complexity to create different classes of agents.

Level 0: The Core Reasoning System

Before we can have an agent, we must start with the “Brain” in its most basic form: the reasoning engine itself. In this configuration, a Language Model operates in isolation, responding solely based on its vast pre-trained knowledge without any tools, memory, or interaction with the live environment.

Strength: Can explain established concepts and plan how to approach solving a problem with great depth.

Trade-off: Complete lack of real-time awareness; functionally “blind” to any event or fact outside its training data.

Example: It can explain the rules of professional baseball and the complete history of the New York Yankees. But if you ask, “What was the final score of the Yankees game last night?”, it would be unable to answer—that game is a specific, real-world event that happened after its training data was collected.

Level 1: The Connected Problem-Solver

At this level, the reasoning engine becomes a functional agent by connecting to and utilizing external tools—the “Hands” component of our architecture. Its problem-solving is no longer confined to its static, pre-trained knowledge.

Using the 5-step loop, the agent can now answer our previous question:

Mission: “What was the final score of the Yankees game last night?”
Think: Recognizes this as a real-time data need
Act: Invokes a tool, like a Google Search API with the proper date and search terms
Observe: Search result (e.g., “Yankees won 5-3”)
Synthesize: Formulates that fact into a final answer

This fundamental ability to interact with the world—whether using a search tool for a score, a financial API for a live stock price, or a database via Retrieval-Augmented Generation (RAG)—is the core capability of a Level 1 agent.

Level 2: The Strategic Problem-Solver

Level 2 marks a significant expansion in capability, moving from executing simple tasks to strategically planning complex, multi-part goals. The key skill that emerges here is context engineering: the agent’s ability to actively select, package, and manage the most relevant information for each step of its plan.

An agent’s accuracy depends on a focused, high-quality context. Context engineering curates the model’s limited attention to prevent overload and ensure efficient performance.

Example Task: “Find a good coffee shop halfway between my office at 1600 Amphitheatre Parkway, Mountain View, and my client’s office at 1 Market St, San Francisco.”

Level 2 Agent’s Plan:

Think: “I must first find the halfway point.”
- Act: Call the Maps tool with both addresses
- Observe: “The halfway point is Millbrae, CA.”
Think: “Now I must find coffee shops in Millbrae. The user asked for ‘good’ ones, so I will search for places with a 4-star rating or higher.”
- Act: Call the google_places tool with query=“coffee shop in Millbrae, CA”, min_rating=4.0
- Observe: “The search returns ‘Millbrae Coffee’ and ‘The Daily Grind’.”
Think: “I will synthesize these results and present them to the user.”

This strategic planning also enables proactive assistance, like an agent that reads a long flight confirmation email, engineers the key context (flight number, date), and acts by adding it to your calendar.

Level 3: The Collaborative Multi-Agent System

At the highest level, the paradigm shifts entirely. We move away from building a single, all-powerful “super-agent” and toward a “team of specialists” working in concert, a model that directly mirrors a human organization. The system’s collective strength lies in this division of labor.

Here, agents treat other agents as tools. Imagine a “Project Manager” agent receiving a Mission: “Launch our new ‘Solaris’ headphones.”

The Project Manager agent doesn’t do the entire work itself. It Acts by creating new Missions for its team of specialized agents, much like how it works in real life:

Delegates to MarketResearchAgent: “Analyze competitor pricing for noise-canceling headphones. Return a summary document by tomorrow.”
Delegates to MarketingAgent: “Draft three versions of a press release using the ‘Solaris’ product spec sheet as context.”
Delegates to WebDevAgent: “Generate the new product page HTML based on the attached design mockups.”

This collaborative model, while currently constrained by the reasoning limitations of today’s LMs, represents the frontier of automating entire, complex business workflows from start to finish.

Level 4: The Self-Evolving System

Level 4 represents a profound leap from delegation to autonomous creation and adaptation. At this level, an agentic system can identify gaps in its own capabilities and dynamically create new tools or even new agents to fill them. It moves from using a fixed set of resources to actively expanding them.

Following our example, the “Project Manager” agent, tasked with the ‘Solaris’ launch, might realize it needs to monitor social media sentiment, but no such tool or agent exists on its team:

Think (Meta-Reasoning): “I must track social media buzz for ‘Solaris,’ but I lack the capability.”
Act (Autonomous Creation): Instead of failing, it invokes a high-level AgentCreator tool with a new mission: “Build a new agent that monitors social media for keywords ‘Solaris headphones’, performs sentiment analysis, and reports a daily summary.”
Observe: A new, specialized SentimentAnalysisAgent is created, tested, and added to the team on the fly, ready to contribute to the original mission.

This level of autonomy—where a system can dynamically expand its own capabilities—turns a team of agents into a truly learning and evolving organization.

5. Core Agent Architecture: Deep Dive

5.1 Model: The “Brain” of Your AI Agent

The LM is the reasoning core of your agent, and its selection is a critical architectural decision that dictates your agent’s cognitive capabilities, operational cost, and speed.

Common Mistake: Treating this choice as a simple matter of picking the model with the highest benchmark score. An agent’s success in a production environment is rarely determined by generic academic benchmarks.

Real-world success demands a model that excels at agentic fundamentals:

Superior reasoning: Navigate complex, multi-step problems
Reliable tool use: Interact with the world

Best Practices:

Start by defining the business problem, then test models against metrics that directly map to that outcome
If your agent needs to write code, test it on your private codebase
If it processes insurance claims, evaluate its ability to extract information from your specific document formats
Cross-reference this analysis with the practicalities of cost and latency

The “best” model is the one that sits at the optimal intersection of quality, speed, and price for your specific task.

Model Routing Strategy: A “team of specialists” approach.

Complex tasks: Use a frontier model like Gemini 2.5 Pro for initial planning and complex reasoning
Simple, high-volume tasks: Use a much faster and more cost-effective model like Gemini 2.5 Flash for classifying user intent or summarizing text

Important Reminder: The AI landscape is in a state of constant, rapid evolution. The model you choose today will be superseded in six months. A “set it and forget it” mindset is unsustainable. Build a robust CI/CD pipeline that continuously evaluates new models against your key business metrics.

5.2 Tools: The “Hands” of Your AI Agent

If the model is the agent’s brain, tools are the hands that connect its reasoning to reality. They allow the agent to move beyond its static training data to retrieve real-time information and take action in the world.

5.2.1 Retrieving Information: Grounding in Reality

Tool Type	Description	Use Case
Retrieval-Augmented Generation (RAG)	Gives the agent a “library card” to query external knowledge	Query internal company documents to web knowledge via Google Search
Natural Language to SQL (NL2SQL)	Allows the agent to query databases to answer analytic questions	“What were our top-selling products last quarter?”

By looking things up before speaking—whether in a document or a database—the agent grounds itself in fact, dramatically reducing hallucinations.

5.2.2 Executing Actions: Changing the World

The true power of agents is unleashed when they move from reading information to actively doing things:

Wrapping existing APIs and code functions as tools: Send an email, schedule a meeting, or update a customer record in ServiceNow
Write and execute code on the fly: In a secure sandbox, generate a SQL query or a Python script to solve a complex problem or perform a calculation

This transforms the agent from a knowledgeable assistant into an autonomous actor.

5.2.3 Human in the Loop (HITL) Tools

An agent can use Human in the Loop (HITL) tools to pause its workflow and:

Ask for confirmation (e.g., ask_for_confirmation())
Request specific information from a user interface (e.g., ask_for_date_input())

This ensures a person is involved in critical decisions. HITL could be implemented via SMS text messaging and a task in a database.

5.2.4 Function Calling: Connecting Tools to Your Agent

For an agent to reliably do “function calling” and use tools, it needs clear instructions, secure connections, and orchestration:

Standard	Description
OpenAPI specification	Long-standing standard providing a structured contract that describes a tool’s purpose, required parameters, and expected response
Model Context Protocol (MCP)	Open standard for simpler discovery and connection to tools, popular because of convenience
Native tools	A few models have native tools, like Gemini with native Google Search, where function invocation happens as part of the LM call itself

5.3 The Orchestration Layer: The Nervous System

If the model is the agent’s brain and the tools are its hands, the orchestration layer is the central nervous system that connects them. It is:

The engine that runs the “Think, Act, Observe” loop
The state machine that governs the agent’s behavior
The place where a developer’s carefully crafted logic comes to life

This layer is not just plumbing; it is the conductor of the entire agentic symphony, deciding when the model should reason, which tool should act, and how the results of that action should inform the next movement.

5.3.1 Core Design Choices

Choice One: Degree of Autonomy

The choice exists on a spectrum:

Spectrum Position	Description
Deterministic end	Predictable workflows that call an LM as a tool for a specific task—a sprinkle of AI to augment an existing process
Autonomous end	The LM in the driver’s seat, dynamically adapting, planning and executing tasks to achieve a goal

Choice Two: Implementation Method

Method	Best For	Advantages
No-code builders	Business users automating structured tasks	Speed and accessibility
Code-first frameworks (e.g., Google ADK)	Complex, mission-critical systems	Deep control, customizability, and integration capabilities

Key Requirements for a Production-Grade Framework:

Openness: Allow you to plug in any model or tool to prevent vendor lock-in
Precise control: Enable a hybrid approach where non-deterministic LM reasoning is governed by hard-coded business rules
Observability: When an agent behaves unexpectedly, you cannot simply put a breakpoint in the model’s “thought.” A robust framework generates detailed traces and logs, exposing the entire reasoning trajectory

5.3.2 Instruct with Domain Knowledge and Persona

Within this framework, the developer’s most powerful lever is to instruct the agent with domain knowledge and a distinct persona. This is accomplished through a system prompt or a set of core instructions.

This isn’t just a simple command; it is the agent’s constitution. Here, you tell it:

You are a helpful customer support agent for Acme Corp, ...

And provide:

Constraints
Desired output schema
Rules of engagement
A specific tone of voice
Explicit guidance on when and why it should use its tools

A few example scenarios in the instructions are usually very effective.

5.3.3 Augment with Context (Memory System)

The agent’s “memory” is orchestrated into the LM context window at runtime:

Memory Type	Description	Implementation
Short-term memory	The agent’s active “scratchpad,” maintaining the running history of the current conversation	Abstractions like state, artifacts, sessions, or threads
Long-term memory	Provides persistence across sessions	Almost always implemented as another specialized tool—a RAG system connected to a vector database or search engine

The orchestrator gives the agent the ability to pre-fetch and actively query its own history, allowing it to “remember” a user’s preferences or the outcome of a similar task from weeks ago for a truly personalized and continuous experience.

6. Multi-Agent Systems and Design Patterns

As tasks grow in complexity, building a single, all-powerful “super-agent” becomes inefficient. The more effective solution is to adopt a “team of specialists” approach, which mirrors a human organization.

Core Concept: A complex process is segmented into discrete sub-tasks, and each is assigned to a dedicated, specialized AI agent. This division of labor allows each agent to be simpler, more focused, and easier to build, test, and maintain.

6.1 Coordinator Pattern

For dynamic or non-linear tasks. Introduces a “manager” agent that:

Analyzes a complex request
Segments the primary task
Intelligently routes each sub-task to the appropriate specialist agent (like a researcher, a writer, or a coder)
Aggregates the responses from each specialist to formulate a final, comprehensive answer

6.2 Sequential Pattern

For more linear workflows, acting like a digital assembly line where the output from one agent becomes the direct input for the next.

Creates a feedback loop:

“Generator” agent: Creates content
“Critic” agent: Evaluates it against quality standards

6.4 Human-in-the-Loop (HITL) Pattern

Critical for high-stakes tasks, creating a deliberate pause in the workflow to get approval from a person before an agent takes a significant action.

7. Agent Deployment and Services

After you have built a local agent, you will want to deploy it to a server where it runs all the time and where other people and agents can use it. Continuing our analogy, deployment and services would be the body and legs for our agent.

7.1 Services an Agent Requires

Session history and memory persistence
Logging (you’re responsible for deciding what you log)
Security measures for data privacy and data residency
Regulation compliance

7.2 Deployment Options

Option	Best For	Features
Vertex AI Agent Engine	Purpose-built agent deployment	Supports runtime and everything else in one platform
Cloud Run or GKE	Software developers who want to control their application stacks more directly	Add agents and most agent services to a docker container and deploy onto industry-standard runtimes

Tip: Many agent frameworks make deployment easy with a deploy command or a dedicated platform, and these should be used for early exploration and onboarding. Ramping up to a secure and production-ready environment will usually require a bigger investment of time and application of best practices, including CI/CD and automated testing for your agents.

8. Agent Ops: A Structured Approach to the Unpredictable

The transition from traditional, deterministic software to stochastic, agentic systems requires a new operational philosophy.

8.1 Why Traditional Testing Doesn’t Work

Traditional software unit tests could simply assert output == expected; but that doesn’t work when an agent’s response is probabilistic by design. Also, because language is complicated, it usually requires an LM to evaluate “quality”—that the agent’s response does all of what it should, nothing it shouldn’t, and with proper tone.

8.2 What is Agent Ops?

Agent Ops is the disciplined, structured approach to managing this new reality. It is a natural evolution of DevOps and MLOps, tailored for the unique challenges of building, deploying, and governing AI agents, turning unpredictability from a liability into a managed, measurable, and reliable feature.

8.3 Measure What Matters: Instrumenting Success Like an A/B Experiment

Before you can improve your agent, you must define what “better” means in the context of your business. Frame your observability strategy like an A/B test and ask yourself: What are the Key Performance Indicators (KPIs) that prove the agent is delivering value?

These metrics should go beyond technical correctness and measure real-world impact:

Goal completion rates
User satisfaction scores
Task latency
Operational cost per interaction
Most importantly, the impact on business goals like revenue, conversion, or customer retention

8.4 Quality Instead of Pass/Fail: Using an LM Judge

Business metrics don’t tell you if the agent is behaving correctly. Since a simple pass/fail is impossible, we shift to evaluating for quality using an “LM as Judge.”

This involves using a powerful model to assess the agent’s output against a predefined rubric:

Did it give the right answer?
Was the response factually grounded?
Did it follow instructions?

This automated evaluation, run against a golden dataset of prompts, provides a consistent measure of quality.

Best Practices for Creating Evaluation Datasets:

Sample scenarios from existing production or development interactions with the agent
The dataset must cover the full breadth of use cases you expect your users to engage with, plus a few unexpected ones
Evaluation results should always be reviewed by a domain expert before being accepted as valid

8.5 Metrics-Driven Development: Your Go/No-Go for Deployment

Once you have automated dozens of evaluation scenarios and established trusted quality scores, you can confidently test changes to your development agent:

Run the new version against the entire evaluation dataset
Directly compare its scores to the existing production version

This robust system eliminates guesswork, ensuring you are confident in every deployment.

Don’t forget other important factors:

Latency
Cost
Task success rates

For maximum safety: Use A/B deployments to slowly roll out new versions and compare these real-world production metrics alongside your simulation scores.

8.6 Debug with OpenTelemetry Traces: Answering “Why?”

When your metrics dip or a user reports a bug, you need to understand “why.” An OpenTelemetry trace is a high-fidelity, step-by-step recording of the agent’s entire execution path (trajectory), allowing you to debug the agent’s steps.

With traces, you can see:

The exact prompt sent to the model
The model’s internal reasoning (if available)
The specific tool it chose to call
The precise parameters it generated for that tool
The raw data that came back as an observation

Trace data can be seamlessly collected in platforms like Google Cloud Trace, which visualize and search across vast quantities of traces, streamlining root cause analysis.

8.7 Cherish Human Feedback: Guiding Your Automation

Human feedback is not an annoyance to be dealt with; it is the most valuable and data-rich resource you have for improving your agent. When a user files a bug report or clicks the “thumbs down” button, they are giving you a gift: a new, real-world edge case that your automated eval scenarios missed.

An effective Agent Ops process “closes the loop”:

Capture this feedback
Replicate the issue
Convert that specific scenario into a new, permanent test case in your evaluation dataset

This ensures you not only fix the bug but also vaccinate the system against that entire class of error ever happening again.

9. Agent Interoperability

Once you build your high-quality agents, you want to be able to interconnect them with users and other agents. In our body parts analogy, this would be the face of the Agent.

9.1 Agents and Humans

User Interface Interaction

Type	Description
Chatbot	Simplest form—user types a request, agent processes and returns a block of text
Structured data	Advanced agents can provide JSON, etc., to power rich, dynamic front-end experiences
Human in the loop patterns	Intent refinement, goal expansion, confirmation, and clarification requests

Computer Use

A category of tool where the LM takes control of a user interface, often with human interaction and oversight. A computer-use-enabled agent can decide that the next best action is to navigate to a new page, highlight a specific button, or pre-fill a form with relevant information.

UI Control Protocols

Protocol	Description
MCP UI	Controlling UI via MCP tools
AG UI	Sync client state with an agent via event passing and optionally shared state
A2UI	Generation of bespoke interfaces

Real-Time Multimodal Communication

Advanced agents are breaking the text barrier and moving into real-time, multimodal communication with “live mode” creating a more natural, human-like connection.

Technologies like the Gemini Live API enable bidirectional streaming, allowing a user to speak to an agent and interrupt it, just as they would in a natural conversation. This makes the agent a more intuitive and accessible partner.

9.2 Agents and Agents

As an enterprise scales its use of AI, different teams will build different specialized agents. Without a common standard, connecting them would require building a tangled web of brittle, custom API integrations that are impossible to maintain.

Core Challenge:

Discovery: How does my agent find other agents and know what they can do?
Communication: How do we ensure they speak the same language?

Agent2Agent (A2A) Protocol

The open standard designed to solve this problem. It acts as a universal handshake for the agentic economy.

A2A allows any agent to publish a digital “business card”, known as an Agent Card. This simple JSON file advertises:

The agent’s capabilities
Its network endpoint
The security credentials required to interact with it

Difference from MCP: MCP focuses on solving transactional requests, Agent 2 Agent communication is typically for additional problem solving.

Once discovered, agents communicate using a task-oriented architecture. Instead of a simple request-response, interactions are framed as asynchronous “tasks.” A client agent sends a task request to a server agent, which can then provide streaming updates as it works on the problem over a long-running connection.

9.3 Agents and Money

As AI agents do more tasks for us, a few of those tasks involve buying or selling, negotiating or facilitating transactions. The current web is built for humans clicking “buy,” the responsibility is on the human. If an autonomous agent clicks “buy” it creates a crisis of trust—if something goes wrong, who is at fault?

These are complex issues of authorization, authenticity, and accountability. To unlock a true agentic economy, we need new standards that allow agents to transact securely and reliably on behalf of their users.

Emerging Protocols

Protocol	Description
Agent Payments Protocol (AP2)	The definitive language for agentic commerce; extends A2A by introducing cryptographically-signed digital “mandates”
x402	Open internet payment protocol using the standard HTTP 402 “Payment Required” status code; enables frictionless, machine-to-machine micropayments

10. Securing a Single Agent: The Trust Trade-Off

When you create your first AI agent, you immediately face a fundamental tension: the trade-off between utility and security.

To make an agent useful, you must give it power—the autonomy to make decisions and the tools to perform actions like sending emails or querying databases. However, every ounce of power you grant introduces a corresponding measure of risk.

Primary Security Concerns:

Rogue actions: Unintended or harmful behaviors
Sensitive data disclosure

10.1 Defense-in-Depth Approach

You cannot rely solely on the AI model’s judgment, as it can be manipulated by techniques like prompt injection. The best practice is a hybrid, defense-in-depth approach.

Layer One: Traditional Deterministic Guardrails

A set of hardcoded rules that act as a security chokepoint outside the model’s reasoning:

Block any purchase over $100
Require explicit user confirmation before the agent can interact with an external API

This layer provides predictable, auditable hard limits on the agent’s power.

Layer Two: Reasoning-Based Defenses

Using AI to help secure AI:

Adversarial training: Train the model to be more resilient to attacks
Guard models: Employ smaller, specialized models that act like security analysts, examining the agent’s proposed plan before it’s executed, flagging potentially risky or policy-violating steps for review

10.2 Agent Identity: A New Class of Principal

In the traditional security model, there are human users which might use OAuth or SSO, and there are services which use IAM or service accounts. Agents add a 3rd category of principal.

An agent is not merely a piece of code; it is an autonomous actor, a new kind of principal that requires its own verifiable identity. Just as employees are issued an ID badge, each agent on the platform must be issued a secure, verifiable “digital passport.”

Principal Entity	Authentication / Verification	Notes
Users	Authenticated with OAuth or SSO	Human actors with full autonomy and responsibility for their actions
Agents (new category)	Verified with SPIFFE	Agents have delegated authority, taking actions on behalf of users
Service accounts	Integrated into IAM	Applications and containers, fully deterministic, no responsibility for actions

Once an agent has a cryptographically verifiable identity, it can be granted its own specific, least-privilege permissions. The SalesAgent is granted read/write access to the CRM, while the HRonboardingAgent is explicitly denied. This granular control ensures that even if a single agent is compromised or behaves unexpectedly, the potential blast radius is contained.

10.3 Policies to Constrain Access

A policy is a form of authorization (AuthZ), distinct from authentication (AuthN). Typically, policies limit the capabilities of a principal.

Example: “Users in Marketing can only access these 27 API endpoints and cannot execute DELETE commands.”

As we develop agents, we need to apply permissions to:

The agents themselves
Their tools
Other internal agents
Context they can share
Remote agents

Recommended Approach: Apply the principle of least privilege while remaining contextually relevant.

10.4 Securing an ADK Agent

Securing an agent built with the Agent Development Kit (ADK) becomes a practical exercise in applying identity and policy concepts through code and configuration.

Layered Defense:

Identity definition: User account (OAuth), service account (to run code), agent identity (to use delegated authority)
Policies at the API governance layer: Constrain access to services, along with governance supporting MCP and A2A services
Guardrails in tools, models, and sub-agents: Ensure that no matter what the LM reasons or what a malicious prompt might suggest, the tool’s own logic will refuse to execute an unsafe or out-of-policy action

ADK’s Callbacks and Plugins:

before_tool_callback: Allows you to inspect the parameters of a tool call before it runs, validating them against the agent’s current state to prevent misaligned actions
Plugins: For more reusable policies. A common pattern is a “Gemini as a Judge” that uses a fast, inexpensive model like Gemini Flash-Lite to screen user inputs and agent outputs for prompt injections or harmful content in real time

Model Armor: For organizations that prefer a fully managed, enterprise-grade solution, Model Armor can be integrated as an optional service. It acts as a specialized security layer that screens prompts and responses for a wide range of threats, including prompt injection, jailbreak attempts, sensitive data (PII) leakage, and malicious URLs.

11. Scaling Up from a Single Agent to an Enterprise Fleet

The production success of a single AI agent is a triumph. Scaling to a fleet of hundreds is a challenge of architecture.

11.1 Security and Privacy: Hardening the Agentic Frontier

An enterprise-grade platform must address the unique security and privacy challenges inherent to generative AI:

Threats:

Prompt injection: Malicious actors hijacking the agent’s instructions
Data poisoning: Corrupting the information it uses for training or RAG
Data leakage: A poorly constrained agent inadvertently leaking sensitive customer data or proprietary information in its responses

Defense-in-Depth Strategy:

Data protection: Ensure that an enterprise’s proprietary information is never used to train base models and is protected by controls like VPC Service Controls
Input and output filtering: Act like a firewall for prompts and responses
Contractual protections: Provide intellectual property indemnity for both the training data and the generated output

11.2 Agent Governance: A Control Plane instead of Sprawl

As agents and their tools proliferate across an organization, they create a new, complex network of interactions and potential vulnerabilities—a challenge often called “agent sprawl.”

Managing this requires implementing a higher-order architectural approach: a central gateway that serves as a control plane for all agentic activity.

The Gateway Approach creates a control system, establishing a mandatory entry point for all agentic traffic:

User-to-agent prompts or UI interactions
Agent-to-tool calls (via MCP)
Agent-to-agent collaborations (via A2A)
Direct inference requests to LMs

Two Primary Functions of the Control Plane:

Function	Description
Runtime Policy Enforcement	Acts as the architectural chokepoint for implementing security, handles authentication and authorization, creates common logs, metrics, and traces for every transaction
Centralized Governance	A central registry—an enterprise app store for agents and tools—allows developers to discover and reuse existing assets and gives administrators a complete inventory

11.3 Cost and Reliability: The Infrastructure Foundation

Enterprise-grade agents must be both reliable and cost-effective:

An agent that frequently fails or provides slow results has a negative ROI
An agent that is prohibitively expensive cannot scale to meet business demands

Infrastructure Options Spectrum:

Need	Solution
Irregular traffic	Scale-to-zero feature
Mission-critical, latency-sensitive workloads	Dedicated, guaranteed capacity, such as Provisioned Throughput for LM services or 99.9% SLAs for runtimes like Cloud Run

12. How Agents Evolve and Learn

Agents deployed in the real world operate in dynamic environments where policies, technologies, and data formats are constantly changing. Without the ability to adapt, an agent’s performance will degrade over time—a process often called “aging”—leading to a loss of utility and trust.

Manually updating a large fleet of agents to keep pace with these changes is uneconomical and slow. A more scalable solution is to design agents that can learn and evolve autonomously, improving their quality on the job with minimal engineering effort.

12.1 How Agents Learn

Learning Sources:

Source	Description
Runtime Experience	Session logs, traces, and memory that capture successes, failures, tool interactions, and decision trajectories
Human-in-the-Loop Feedback	Provides authoritative corrections and guidance
External Signals	New external documents, such as updated enterprise policies, public regulatory guidelines, or critiques from other agents

Most Successful Adaptation Techniques:

Technique	Description
Enhanced Context Engineering	The system continuously refines its prompts, few-shot examples, and the information it retrieves from memory
Tool Optimization and Creation	The agent’s reasoning can identify gaps in its capabilities and act to fill them—gaining access to a new tool, creating one on the fly, or modifying an existing tool

12.2 Example: Learning New Compliance Guidelines

Consider an enterprise agent operating in a heavily regulated industry like finance or life sciences. Its task is to generate reports that must comply with privacy and regulatory rules (e.g., GDPR).

Multi-Agent Workflow Implementation:

Querying Agent: Retrieves raw data in response to a user request
Reporting Agent: Synthesizes this data into a draft report
Critiquing Agent: Armed with known compliance guidelines, reviews the report. If it encounters ambiguity or requires final sign-off, it escalates to a human domain expert
Learning Agent: Observes the entire interaction, paying special attention to the corrective feedback from the human expert, then generalizes this feedback into a new, reusable guideline

Example: If a human expert flags that certain household statistics must be anonymized, the Learning Agent records this correction. The next time a similar report is generated, the Critiquing Agent will automatically apply this new rule, reducing the need for human intervention.

12.3 Simulation and Agent Gym: The Next Frontier

The design pattern we presented can be categorized as in-line learning, where agents need to learn with the resources and design pattern they were engineered with.

More advanced approaches are now being researched: a dedicated platform engineered to optimize the multi-agent system in offline processes with advanced tooling and capabilities, which are not part of the multi-agent run-time environment.

Key Attributes of an Agent Gym:

Not in the execution path: A standalone off-production platform that can have the assistance of any LM model, offline tools, and more
Simulation environment: The agent can ’exercise’ on new data and learn; excellent for ’trial-and-error’ with many optimization pathways
Advanced synthetic data generators: Guide the simulation to be as real as possible and pressure test the agent (including red-teaming, dynamic evaluation, and critiquing agents)
Non-fixed optimization tools arsenal: Can adopt new tools through open protocols such as MCP or A2A, or in a more advanced setting—learn new concepts and craft tools around them
Connection to human expert fabric: When unable to overcome certain edge cases (due to the well-known problem of ’tribal knowledge’ in the enterprise), the Agent Gym can connect to domain experts and consult with them on the right set of outcomes to guide the next set of optimizations

13. Examples of Advanced Agents

13.1 Google Co-Scientist

Co-Scientist is an advanced AI agent designed to function as a virtual research collaborator, accelerating scientific discovery by systematically exploring complex problem spaces.

Capabilities:

Enables researchers to define a goal
Ground the agent in specified public and proprietary knowledge sources
Generate and evaluate a landscape of novel hypotheses

How It Works:

Co-Scientist spawns an entire ecosystem of agents collaborating with each other:

Supervisor Agent: Acts as the manager, delegating tasks to a team of specialized agents and distributing resources like computing power
Generation Agent: Literature exploration, simulated scientific debate
Reflection Agent: Full review with web search, simulation review, tournament review, deep verification
Evolution Agent: Inspiration from other ideas, simplification, research extension
Proximity Check Agent: Checks similarity between hypotheses
Meta-review Agent: Research overview formulation

The various agents work for hours, or even days, and keep improving the generated hypotheses, running loops and meta loops that improve not only the generated ideas, but also the way that we judge and create new ideas.

13.2 AlphaEvolve Agent

AlphaEvolve is an AI agent that discovers and optimizes algorithms for complex problems in mathematics and computer science.

How It Works:

Combines the creative code generation of Gemini language models with an automated evaluation system, using an evolutionary process:

The AI generates potential solutions
An evaluator scores them
The most promising ideas are used as inspiration for the next generation of code

Breakthroughs Achieved:

Improving the efficiency of Google’s data centers, chip design, and AI training
Discovering faster matrix multiplication algorithms
Finding new solutions to open mathematical problems

Problems AlphaEvolve Excels At: Problems where verifying the quality of a solution is far easier than finding it in the first place.

Human-AI Collaboration:

Transparent Solutions: The AI generates solutions as human-readable code, allowing users to understand the logic, gain insights, trust the results, and directly modify the code for their needs
Expert Guidance: Human expertise is essential for defining the problem; users guide the AI by refining evaluation metrics and steering the exploration

14. Conclusion

Generative AI agents mark a pivotal evolution, shifting artificial intelligence from a passive tool for content creation to an active, autonomous partner in problem-solving.

14.1 Core Architecture Recap

We have deconstructed the agent into its three essential components:

The reasoning Model (“the Brain”)
The actionable Tools (“the Hands”)
The governing Orchestration Layer (“the Nervous System”)

It is the seamless integration of these parts, operating in a continuous “Think, Act, Observe” loop, that unlocks an agent’s true potential.

14.2 The New Developer Paradigm

The central challenge, and opportunity, lies in a new developer paradigm:

We are no longer simply “bricklayers” defining explicit logic; we are “architects” and “directors” who must guide, constrain, and debug an autonomous entity.

The flexibility that makes LMs so powerful is also the source of their unreliability. Success, therefore, is not found in the initial prompt alone, but in the engineering rigor applied to the entire system:

Robust tool contracts
Resilient error handling
Sophisticated context management
Comprehensive evaluation

14.3 Key Principles

Agent = Model + Tools + Orchestration Layer + Deployment Services
Five-Step Loop: Get Mission → Scan Scene → Think It Through → Take Action → Observe & Iterate
Taxonomy guides design: From Level 0 Core Reasoning to Level 4 Self-Evolving
Context engineering is key: Managing the LM’s attention to achieve accurate results
Agent Ops is essential: Metrics-driven development, LM-as-Judge, OpenTelemetry traces
Security requires defense-in-depth: Combination of identity, policies, and guardrails
Interoperability enables scale: A2A protocol, central governance, control plane

This article summarizes the core insights from Google’s “Introduction to Agents” whitepaper (Alan Blount, Antonio Gulli, Shubham Saboo, Michael Zimmermann, Vladimir Vuskovic, November 2025).

1. From Predictive AI to Autonomous Agents#

1.1 What is an AI Agent?#

2. The Four Core Components of an AI Agent#

2.1 The Model (“The Brain”)#

2.2 Tools (“The Hands”)#

2.3 The Orchestration Layer (“The Nervous System”)#

2.4 Deployment (“The Body and Legs”)#

3. The Agentic Problem-Solving Process: Five Steps#

3.1 Get the Mission#

3.2 Scan the Scene#

3.3 Think It Through#

3.4 Take Action#

3.5 Observe and Iterate#

4. A Taxonomy of Agentic Systems (Five Levels)#

Level 0: The Core Reasoning System#

Level 1: The Connected Problem-Solver#

Level 2: The Strategic Problem-Solver#

Level 3: The Collaborative Multi-Agent System#

Level 4: The Self-Evolving System#

5. Core Agent Architecture: Deep Dive#

5.1 Model: The “Brain” of Your AI Agent#

5.2 Tools: The “Hands” of Your AI Agent#

5.2.1 Retrieving Information: Grounding in Reality#

5.2.2 Executing Actions: Changing the World#

5.2.3 Human in the Loop (HITL) Tools#

5.2.4 Function Calling: Connecting Tools to Your Agent#

5.3 The Orchestration Layer: The Nervous System#

5.3.1 Core Design Choices#

5.3.2 Instruct with Domain Knowledge and Persona#

5.3.3 Augment with Context (Memory System)#

6. Multi-Agent Systems and Design Patterns#

6.1 Coordinator Pattern#

6.2 Sequential Pattern#

6.3 Iterative Refinement Pattern#

6.4 Human-in-the-Loop (HITL) Pattern#

7. Agent Deployment and Services#

7.1 Services an Agent Requires#

7.2 Deployment Options#

8. Agent Ops: A Structured Approach to the Unpredictable#

8.1 Why Traditional Testing Doesn’t Work#

8.2 What is Agent Ops?#

8.3 Measure What Matters: Instrumenting Success Like an A/B Experiment#

8.4 Quality Instead of Pass/Fail: Using an LM Judge#

8.5 Metrics-Driven Development: Your Go/No-Go for Deployment#

8.6 Debug with OpenTelemetry Traces: Answering “Why?”#

8.7 Cherish Human Feedback: Guiding Your Automation#

9. Agent Interoperability#

9.1 Agents and Humans#

User Interface Interaction#

Computer Use#

UI Control Protocols#

Real-Time Multimodal Communication#

9.2 Agents and Agents#

Agent2Agent (A2A) Protocol#

9.3 Agents and Money#

Emerging Protocols#

10. Securing a Single Agent: The Trust Trade-Off#

10.1 Defense-in-Depth Approach#

10.2 Agent Identity: A New Class of Principal#

10.3 Policies to Constrain Access#

10.4 Securing an ADK Agent#

11. Scaling Up from a Single Agent to an Enterprise Fleet#

11.1 Security and Privacy: Hardening the Agentic Frontier#

11.2 Agent Governance: A Control Plane instead of Sprawl#

11.3 Cost and Reliability: The Infrastructure Foundation#

12. How Agents Evolve and Learn#

12.1 How Agents Learn#

12.2 Example: Learning New Compliance Guidelines#

12.3 Simulation and Agent Gym: The Next Frontier#

13. Examples of Advanced Agents#

13.1 Google Co-Scientist#

13.2 AlphaEvolve Agent#

14. Conclusion#

14.1 Core Architecture Recap#

14.2 The New Developer Paradigm#

14.3 Key Principles#