Overview

This guide shows how to use tool calling in Spongecake. Tool calling lets your agent invoke custom functions from within the desktop container, which can improve its capabilities and accuracy. This guide covers:

  • Defining custom tools and function mappings.
  • Using tools effectively with desktop.action().
  • Extending tool calling for various advanced workflows.

We’ll demonstrate how to fetch targeted HTML content from a webpage, focusing on Wikipedia searches.

Key Concepts

  • Defining Tools: Tools are custom functions agents can invoke during their tasks to access specific data or perform targeted actions.
  • Function Map: Links tool definitions to actual Python functions, allowing the agent to call them directly.
  • Agent Tool Calls: While running, an agent can call a tool to request specific information and use the returned data in its next step, which makes targeted actions more reliable.
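
As a minimal sketch of how a tool definition and a function map relate, the snippet below pairs one schema entry with one Python function. The schema layout follows the OpenAI function-calling format, which is an assumption here rather than something this guide specifies. The names get_page_title and missing_handlers are hypothetical and used only for illustration.

```python
def get_page_title(url):
    """Hypothetical tool: derive a readable title from the last URL segment."""
    return url.rstrip("/").rsplit("/", 1)[-1].replace("_", " ")


# Tool definition: describes the tool to the agent (name, purpose, arguments).
# Assumption: an OpenAI-style function schema.
tools = [
    {
        "type": "function",
        "name": "get_page_title",
        "description": "Return a readable page title for a wiki-style URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Full page URL."}
            },
            "required": ["url"],
        },
    }
]

# Function map: links each tool name to the Python callable that implements it.
function_map = {"get_page_title": get_page_title}


def missing_handlers(tools, function_map):
    """Return the names of tools that have no entry in the function map."""
    return [tool["name"] for tool in tools if tool["name"] not in function_map]
```

Checking missing_handlers(tools, function_map) before starting a task is a cheap way to catch a tool the agent could request but nothing could execute.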

Example: Fetching Wikipedia Content

This example shows how to define a custom tool for fetching specific HTML content from a Wikipedia page.
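
One way such a tool could look, as a minimal sketch using only the Python standard library. The signature get_wikipedia_elements(title, tag, limit), the helper extract_elements, and the User-Agent string are assumptions made for illustration and are not taken from Spongecake itself.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen


class _ElementTextCollector(HTMLParser):
    """Collects the text content of every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # nesting depth inside the target tag
        self.current = []   # text chunks of the element being read
        self.results = []   # finished element texts

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Normalize whitespace and keep only non-empty elements.
                text = " ".join("".join(self.current).split())
                if text:
                    self.results.append(text)
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_elements(html, tag="p", limit=5):
    """Return the text of up to `limit` elements named `tag` in `html`."""
    parser = _ElementTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results[:limit]


def get_wikipedia_elements(title, tag="p", limit=5):
    """Fetch an English Wikipedia article and return text from matching elements."""
    url = "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))
    request = Request(url, headers={"User-Agent": "tool-calling-example/0.1"})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_elements(html, tag, limit)
```

Keeping the parsing in extract_elements, separate from the network request, makes the tool easy to test against static HTML and keeps the returned payload small enough for the agent to read.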

Explanation

  1. Custom Tool Definition:
    • The example defines a custom tool (get_wikipedia_elements) that targets Wikipedia page content.
  2. Function Map:
    • The function map connects the tool definition to the corresponding Python function so the agent can invoke it.
  3. Agent Interaction:
    • The agent calls the tool on its own during the task to retrieve the page content it needs, avoiding unnecessary actions like scrolling.
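
To make the interaction step concrete, the sketch below simulates what happens when an agent emits a tool call: the tool name is resolved through the function map, the JSON arguments are decoded, and the result is serialized back for the agent. This is not Spongecake's internal implementation. run_tool_call and fake_get_wikipedia_elements are hypothetical names, and the stub stands in for a real fetch so the snippet runs offline.

```python
import json


def run_tool_call(function_map, name, arguments_json):
    """Resolve a tool call by name and run it with JSON-decoded arguments."""
    if name not in function_map:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        result = function_map[name](**kwargs)
    except (TypeError, ValueError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


def fake_get_wikipedia_elements(title, tag="p"):
    """Offline stand-in for a real page fetch (hypothetical)."""
    return [f"<{tag}> content for {title}"]


function_map = {"get_wikipedia_elements": fake_get_wikipedia_elements}

# Simulated tool call, shaped like the name and JSON arguments an agent might emit.
output = run_tool_call(
    function_map,
    "get_wikipedia_elements",
    '{"title": "Sponge cake", "tag": "h2"}',
)
print(output)
```

Returning errors as data rather than raising lets the agent see what went wrong and retry with corrected arguments.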

Extending Tool Calling

Tool calling in Spongecake can be extended to cover:

  • Advanced Web Searching: Dynamically fetch specific page sections to quickly answer questions.
  • Contextual Data Fetching: Provide agents with relevant contextual data from your application during execution.
  • Interactive Automation: Allow agents to trigger specific functions within your application, passing dynamic arguments based on runtime context.

Tool calling lets your agent retrieve exactly the data it needs instead of navigating the interface to find it, which makes automations more precise and context-aware.