LLM Agents

The LlmAgent (often aliased simply as Agent) is a core component in ADK, acting as the "thinking" part of your application. It leverages the power of a Large Language Model (LLM) for reasoning, understanding natural language, making decisions, generating responses, and interacting with tools.

Unlike deterministic workflow agents that follow predefined execution paths, an LlmAgent's behavior is non-deterministic. It uses the LLM to interpret instructions and context, deciding dynamically how to proceed, which tools to use (if any), or whether to transfer control to another agent.

Building an effective LlmAgent involves defining its identity, clearly guiding its behavior through instructions, and equipping it with the necessary tools and capabilities.

Defining the Agent's Identity and Purpose

First, you need to establish what the agent is and what it's for.

  • name (required): Every agent needs a unique string identifier. This name is crucial for internal operations, especially in multi-agent systems where agents need to refer to or delegate tasks to each other. Choose a descriptive name that reflects the agent's function (e.g., customer_support_router, billing_inquiry_agent). Avoid reserved names like user.

  • description (optional, recommended for multi-agent systems): Provide a concise summary of the agent's capabilities. This description is primarily used by other LLM agents to determine whether they should route a task to this agent. Make it specific enough to differentiate it from peers (e.g., "Handles inquiries about current billing statements", not just "Billing agent").

  • model (required): Specify the underlying LLM that will power this agent's reasoning. This is a string identifier such as "gemini-2.0-flash". The choice of model impacts the agent's capabilities, cost, and performance. See the Models page for available options and considerations.

# Example: Defining the basic identity
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country."
    # instruction and tools will be added next
)

Guiding the Agent: Instructions (instruction)

The instruction parameter is arguably the most critical one for shaping an LlmAgent's behavior. This string (or a function returning a string) tells the agent:

  • Its core task or goal
  • Its personality or persona (e.g., "You are a helpful assistant", "You are a witty pirate")
  • Constraints on its behavior (e.g., "Only answer questions about X", "Never reveal Y")
  • How to use its tools. Explain the purpose of each tool and when to call it, supplementing any descriptions within the tools themselves
  • The desired format for its output (e.g., "Respond in JSON", "Provide a bulleted list")

Tips for effective instructions:

  • Be clear and specific: Avoid ambiguity; state the desired behavior and outcome explicitly
  • Use Markdown: Improve the readability of complex instructions with headings, lists, etc.
  • Provide examples (few-shot): For complex tasks or specific output formats, include examples directly in the instruction
  • Guide tool use: Don't just list the tools; explain when and why the agent should use them

# Example: Adding instructions
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country.
When a user asks for the capital of a country:
1. Identify the country name from the user's query.
2. Use the `get_capital_city` tool to find the capital.
3. Respond clearly to the user, stating the capital city.
Example Query: "What's the capital of France?"
Example Response: "The capital of France is Paris."
""",
    # tools will be added next
)

(Note: For instructions that apply to all agents in a system, consider using global_instruction on the root agent, detailed further in the Multi-Agents section.)
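As a sketch of that pattern: global_instruction and sub_agents are LlmAgent parameters, but the agent names and wiring below are purely illustrative, so verify the details against your installed ADK version.

```python
from google.adk.agents import LlmAgent

# Hypothetical two-agent tree; agent names are illustrative.
greeter = LlmAgent(
    name="greeter",
    model="gemini-2.0-flash",
    instruction="Greet the user warmly.",
)

root_agent = LlmAgent(
    name="root_agent",
    model="gemini-2.0-flash",
    # Applies to every agent in this tree, on top of each agent's own instruction.
    global_instruction="Always respond in formal English.",
    instruction="Delegate greetings to the greeter agent.",
    sub_agents=[greeter],
)
```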

Equipping the Agent: Tools (tools)

Tools give your LlmAgent capabilities beyond the LLM's built-in knowledge and reasoning. They allow the agent to interact with the outside world, perform calculations, fetch real-time data, or execute specific actions.

  • tools (optional): Provide a list of tools the agent can use. Each item in the list can be:
    • A Python function (automatically wrapped as a FunctionTool)
    • An instance of a class inheriting from BaseTool
    • An instance of another agent (AgentTool, enabling agent-to-agent delegation; see Multi-Agents)

The LLM uses the function/tool names, descriptions (from docstrings or the description field), and parameter schemas to decide which tool to call, based on the conversation and its instructions.

# Define a tool function
def get_capital_city(country: str) -> str:
  """Retrieves the capital city for a given country."""
  # Replace with actual logic (e.g., API call, database lookup)
  capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}
  return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")

# Add the tool to the agent
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country... (previous instruction text)""",
    tools=[get_capital_city] # Provide the function directly
)

Learn more about tools in the Tools section.
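The automatic wrapping of plain functions works because a Python function already carries everything the model needs: a name, a docstring, and typed parameters. A framework-free sketch of that introspection (this is not ADK's actual implementation, just the raw material it draws on):

```python
import inspect

def get_capital_city(country: str) -> str:
    """Retrieves the capital city for a given country."""
    capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}
    return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")

# What a wrapper can read off the function itself:
sig = inspect.signature(get_capital_city)
tool_name = get_capital_city.__name__             # the tool's name
tool_description = inspect.getdoc(get_capital_city)  # the tool's description
param_types = {name: p.annotation.__name__ for name, p in sig.parameters.items()}

print(tool_name)         # get_capital_city
print(tool_description)  # Retrieves the capital city for a given country.
print(param_types)       # {'country': 'str'}
```

This is why clear docstrings and accurate type hints matter: they are what the LLM sees when choosing a tool.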

Advanced Configuration and Control

Beyond the core parameters, LlmAgent offers several options for finer control:

Fine-Tuning LLM Generation (generate_content_config)

You can adjust how the underlying LLM generates responses using generate_content_config.

  • generate_content_config (optional): Pass an instance of google.genai.types.GenerateContentConfig to control parameters such as temperature (randomness), max_output_tokens (response length), top_p, top_k, and safety settings.

    from google.genai import types
    
    agent = LlmAgent(
        # ... other params
        generate_content_config=types.GenerateContentConfig(
            temperature=0.2, # More deterministic output
            max_output_tokens=250
        )
    )
    

Structured Data (input_schema, output_schema, output_key)

For scenarios requiring structured data exchange, you can use Pydantic models.

  • input_schema (optional): Define a Pydantic BaseModel class representing the expected input structure. If set, the user message content passed to this agent must be a JSON string conforming to this schema. Your instructions should guide the user or the preceding agent accordingly.

  • output_schema (optional): Define a Pydantic BaseModel class representing the desired output structure. If set, the agent's final response must be a JSON string conforming to this schema.

    • Limitation: Using output_schema enables controlled generation within the LLM but disables the agent's ability to use tools or transfer control to other agents. Your instructions must guide the LLM to produce schema-conforming JSON directly.
  • output_key (optional): Provide a string key. If set, the text content of the agent's final response is automatically saved to the session's state dictionary under this key (e.g., session.state[output_key] = agent_response_text). This is useful for passing results between agents or between steps in a workflow.

from pydantic import BaseModel, Field

class CapitalOutput(BaseModel):
    capital: str = Field(description="The capital of the country.")

structured_capital_agent = LlmAgent(
    # ... name, model, description
    instruction="""You are a Capital Information Agent. Given a country, respond ONLY with a JSON object containing the capital. Format: {"capital": "capital_name"}""",
    output_schema=CapitalOutput, # Enforce JSON output
    output_key="found_capital"  # Store result in state['found_capital']
    # Cannot use tools=[get_capital_city] effectively here
)
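On the receiving side, the same Pydantic model can verify the contract that output_schema is meant to guarantee. A small sketch, assuming Pydantic v2 (model_validate_json):

```python
from pydantic import BaseModel, Field, ValidationError

class CapitalOutput(BaseModel):
    capital: str = Field(description="The capital of the country.")

# A schema-conforming final response parses cleanly...
good = CapitalOutput.model_validate_json('{"capital": "Paris"}')
print(good.capital)  # Paris

# ...while a non-conforming one (missing the required field) is rejected.
try:
    CapitalOutput.model_validate_json('{"city": "Paris"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```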

Managing Context (include_contents)

Control whether the agent receives the prior conversation history.

  • include_contents (optional, default: 'default'): Determines whether the contents (history) are sent to the LLM.

    • 'default': The agent receives the relevant conversation history.
    • 'none': The agent receives no prior contents. It operates based solely on its current instruction and any input provided in the current turn (useful for stateless tasks or enforcing specific contexts).

    stateless_agent = LlmAgent(
        # ... other params
        include_contents='none'
    )
    

Planning and Code Execution

For more complex scenarios involving multi-step reasoning or executing code:

  • planner (optional): Assign a BasePlanner instance to enable multi-step reasoning and planning before execution. (See Multi-Agent patterns.)
  • code_executor (optional): Provide a BaseCodeExecutor instance to allow the agent to execute code blocks (e.g., Python) found in the LLM's response. (See Tools/Built-in tools.)
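A hedged sketch of wiring both in. The concrete class names (PlanReActPlanner, BuiltInCodeExecutor) and import paths are assumptions about your ADK version; check which planners and code executors your installation actually provides.

```python
from google.adk.agents import LlmAgent
from google.adk.planners import PlanReActPlanner           # assumed import path
from google.adk.code_executors import BuiltInCodeExecutor  # assumed import path

coding_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="planning_coder_agent",
    instruction="Plan your approach step by step, then write and run Python code to produce the answer.",
    planner=PlanReActPlanner(),          # plan-then-act reasoning before acting
    code_executor=BuiltInCodeExecutor(), # executes code blocks emitted by the model
)
```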

Full Example

Code

Here is the complete capital_agent example:

# --- Full example code demonstrating LlmAgent with Tools vs. Output Schema ---
import json # Needed for pretty printing dicts

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from pydantic import BaseModel, Field

# --- 1. Define Constants ---
APP_NAME = "agent_comparison_app"
USER_ID = "test_user_456"
SESSION_ID_TOOL_AGENT = "session_tool_agent_xyz"
SESSION_ID_SCHEMA_AGENT = "session_schema_agent_xyz"
MODEL_NAME = "gemini-2.0-flash"

# --- 2. Define Schemas ---

# Input schema used by both agents
class CountryInput(BaseModel):
    country: str = Field(description="The country to get information about.")

# Output schema ONLY for the second agent
class CapitalInfoOutput(BaseModel):
    capital: str = Field(description="The capital city of the country.")
    # Note: Population is illustrative; the LLM will infer or estimate this
    # as it cannot use tools when output_schema is set.
    population_estimate: str = Field(description="An estimated population of the capital city.")

# --- 3. Define the Tool (Only for the first agent) ---
def get_capital_city(country: str) -> str:
    """Retrieves the capital city of a given country."""
    print(f"\n-- Tool Call: get_capital_city(country='{country}') --")
    country_capitals = {
        "united states": "Washington, D.C.",
        "canada": "Ottawa",
        "france": "Paris",
        "japan": "Tokyo",
    }
    result = country_capitals.get(country.lower(), f"Sorry, I couldn't find the capital for {country}.")
    print(f"-- Tool Result: '{result}' --")
    return result

# --- 4. Configure Agents ---

# Agent 1: Uses a tool and output_key
capital_agent_with_tool = LlmAgent(
    model=MODEL_NAME,
    name="capital_agent_tool",
    description="Retrieves the capital city using a specific tool.",
    instruction="""You are a helpful agent that provides the capital city of a country using a tool.
The user will provide the country name in a JSON format like {"country": "country_name"}.
1. Extract the country name.
2. Use the `get_capital_city` tool to find the capital.
3. Respond clearly to the user, stating the capital city found by the tool.
""",
    tools=[get_capital_city],
    input_schema=CountryInput,
    output_key="capital_tool_result", # Store final text response
)

# Agent 2: Uses output_schema (NO tools possible)
structured_info_agent_schema = LlmAgent(
    model=MODEL_NAME,
    name="structured_info_agent_schema",
    description="Provides capital and estimated population in a specific JSON format.",
    instruction=f"""You are an agent that provides country information.
The user will provide the country name in a JSON format like {{"country": "country_name"}}.
Respond ONLY with a JSON object matching this exact schema:
{json.dumps(CapitalInfoOutput.model_json_schema(), indent=2)}
Use your knowledge to determine the capital and estimate the population. Do not use any tools.
""",
    # *** NO tools parameter here - using output_schema prevents tool use ***
    input_schema=CountryInput,
    output_schema=CapitalInfoOutput, # Enforce JSON output structure
    output_key="structured_info_result", # Store final JSON response
)

# --- 5. Set up Session Management and Runners ---
session_service = InMemorySessionService()

# Create separate sessions for clarity, though not strictly necessary if context is managed
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_TOOL_AGENT)
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_SCHEMA_AGENT)

# Create a runner for EACH agent
capital_runner = Runner(
    agent=capital_agent_with_tool,
    app_name=APP_NAME,
    session_service=session_service
)
structured_runner = Runner(
    agent=structured_info_agent_schema,
    app_name=APP_NAME,
    session_service=session_service
)

# --- 6. Define Agent Interaction Logic ---
async def call_agent_and_print(
    runner_instance: Runner,
    agent_instance: LlmAgent,
    session_id: str,
    query_json: str
):
    """Sends a query to the specified agent/runner and prints results."""
    print(f"\n>>> Calling Agent: '{agent_instance.name}' | Query: {query_json}")

    user_content = types.Content(role='user', parts=[types.Part(text=query_json)])

    final_response_content = "No final response received."
    async for event in runner_instance.run_async(user_id=USER_ID, session_id=session_id, new_message=user_content):
        # print(f"Event: {event.type}, Author: {event.author}") # Uncomment for detailed logging
        if event.is_final_response() and event.content and event.content.parts:
            # For output_schema, the content is the JSON string itself
            final_response_content = event.content.parts[0].text

    print(f"<<< Agent '{agent_instance.name}' Response: {final_response_content}")

    current_session = session_service.get_session(app_name=APP_NAME,
                                                  user_id=USER_ID,
                                                  session_id=session_id)
    stored_output = current_session.state.get(agent_instance.output_key)

    # Pretty print if the stored output looks like JSON (likely from output_schema)
    print(f"--- Session State ['{agent_instance.output_key}']: ", end="")
    try:
        # Attempt to parse and pretty print if it's JSON
        parsed_output = json.loads(stored_output)
        print(json.dumps(parsed_output, indent=2))
    except (json.JSONDecodeError, TypeError):
         # Otherwise, print as string
        print(stored_output)
    print("-" * 30)


# --- 7. Run Interactions ---
async def main():
    print("--- Testing Agent with Tool ---")
    await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "France"}')
    await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "Canada"}')

    print("\n\n--- Testing Agent with Output Schema (No Tool Use) ---")
    await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "France"}')
    await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "Japan"}')

if __name__ == "__main__":
    import asyncio  # imported here so this entry point stays self-contained
    asyncio.run(main())  # `await` is invalid at module level; run the coroutine instead

(This example demonstrates the core concepts. More complex agents might combine several of these features: schemas, context control, planning, etc.)

Related Concepts (Advanced Topics)

While this page covers the core configuration of LlmAgent, several related concepts provide more advanced control and are detailed elsewhere:

  • Callbacks: Intercept execution points (before/after model calls, before/after tool calls) using before_model_callback, after_model_callback, etc. See Callbacks.
  • Multi-Agent Control: Advanced strategies for agent interaction, including planning (planner), controlling agent transfer (disallow_transfer_to_parent, disallow_transfer_to_peers), and system-wide instructions (global_instruction). See Multi-Agents.