Map Website Tool

Usage

Portia offers both open source tools and a cloud-hosted library of tools to save you development time. You can dig into the specs of those tools in our open source repo (SDK repo ↗).

You can import our open source tools into your project using from portia.open_source_tools.registry import open_source_tool_registry and load them into an InMemoryToolRegistry object. You can also combine their use with cloud or custom tools as explained in the docs (Add custom tools ↗).
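As a minimal sketch, the open source registry (which includes map_tool) can be handed to a Portia client. This assumes the Portia class accepts a registry through its tools argument; check the SDK repo for the exact constructor signature and configuration options.

from portia import Portia
from portia.open_source_tools.registry import open_source_tool_registry

# Load the open source tool registry (includes map_tool) into a Portia client.
portia = Portia(tools=open_source_tool_registry)

# Ask the agent to map a site; the planner can route this request to map_tool.
plan_run = portia.run("Map all documentation pages under docs.tavily.com")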

Tool details

Tool ID: map_tool

Tool description: Maps websites using graph-based traversal that can explore hundreds of paths in parallel with intelligent discovery to generate comprehensive site maps. Provide a URL and the tool will discover and return all accessible pages on that website. Supports depth control, domain filtering, path selection, and various mapping options for comprehensive site reconnaissance and URL discovery.

Usage notes:

This tool uses the Tavily API. You can sign up to obtain a Tavily API key and set it in the environment variable TAVILY_API_KEY.
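For example, assuming you already have a key, you can set the variable from Python before constructing your client (setting it in your shell works equally well):

import os

# The map tool reads the Tavily key from this environment variable.
os.environ["TAVILY_API_KEY"] = "<your-tavily-api-key>"  # placeholder; use your own key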

Args schema:

{
  "description": "Input for MapTool.",
  "properties": {
    "url": {
      "description": "The root URL to begin the mapping (e.g., 'docs.tavily.com')",
      "title": "Url",
      "type": "string"
    },
    "max_depth": {
      "default": 1,
      "description": "Max depth of the mapping. Defines how far from the base URL the crawler can explore",
      "title": "Max Depth",
      "type": "integer"
    },
    "max_breadth": {
      "default": 20,
      "description": "Max number of links to follow per level of the tree (i.e., per page)",
      "title": "Max Breadth",
      "type": "integer"
    },
    "limit": {
      "default": 50,
      "description": "Total number of links the crawler will process before stopping",
      "title": "Limit",
      "type": "integer"
    },
    "instructions": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Natural language instructions for the crawler (e.g., 'Python SDK')",
      "title": "Instructions"
    },
    "select_paths": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Regex patterns to select only URLs with specific path patterns (e.g., ['/docs/.*', '/api/v1.*'])",
      "title": "Select Paths"
    },
    "select_domains": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Regex patterns to select crawling to specific domains or subdomains (e.g., ['^docs\\.example\\.com$'])",
      "title": "Select Domains"
    },
    "exclude_paths": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Regex patterns to exclude URLs with specific path patterns (e.g., ['/private/.*', '/admin/.*'])",
      "title": "Exclude Paths"
    },
    "exclude_domains": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Regex patterns to exclude specific domains or subdomains from crawling (e.g., ['^private\\.example\\.com$'])",
      "title": "Exclude Domains"
    },
    "allow_external": {
      "default": false,
      "description": "Whether to allow following links that go to external domains",
      "title": "Allow External",
      "type": "boolean"
    },
    "categories": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Filter URLs using predefined categories like 'Documentation', 'Blog', 'API', etc.",
      "title": "Categories"
    }
  },
  "required": [
    "url"
  ],
  "title": "MapToolSchema",
  "type": "object"
}
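
To make the schema concrete, below is a hypothetical argument set that conforms to MapToolSchema. Only url is required; the remaining fields mirror the defaults and examples given in the property descriptions above. How these arguments reach the tool depends on how you invoke it (for example, through a Portia plan step).

# Hypothetical MapToolSchema arguments; field names match the schema above.
map_args = {
    "url": "docs.tavily.com",        # required: root URL to begin the mapping
    "max_depth": 2,                  # explore two levels from the base URL
    "max_breadth": 20,               # follow at most 20 links per page
    "limit": 100,                    # stop after processing 100 links
    "instructions": "Python SDK",    # natural language hint for the crawler
    "select_paths": ["/docs/.*"],    # regex filters on URL paths
    "exclude_paths": ["/private/.*"],
    "allow_external": False,         # stay on the starting domain
}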

Output schema:

('str', 'str: list of discovered URLs on the website')