Using browser tools

Browser tools (SDK ↗) can deploy an agent to browse the internet and retrieve data or enact actions on your behalf. Portia will use Browser tools when it recognises there is a web-based task to be performed. We use the Browser Use (↗) library to offer a multi-modal web agent that will visually and textually analyse a website in order to navigate it and carry out a task.

Our browser tool can be used in two modes:

  • Remote mode: Runs on a remote Chromium instance using Browserbase (↗) as the underlying infrastructure. Browserbase provides hosted infrastructure for headless browsers. We spin up remote sessions for your end-users, and these sessions persist through clarifications.
  • Local mode (DEFAULT): Runs on a Chrome instance on your own computer. This mode only works if the agent starts a fresh Chrome instance itself.

The underlying library for navigating the page is provided by Browser Use (↗). It uses a number of LLM calls to navigate the page and complete the action.

Setting up the browser based tools

To use Browserbase infrastructure, set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID in your .env file (or equivalent). You can obtain both by creating an account on Browserbase (↗). Currently, this requires a paid Browserbase plan.
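
A quick pre-flight check can catch a missing variable before a remote session is attempted. The sketch below is illustrative only: `missing_browserbase_vars` is a hypothetical helper, not part of the Portia SDK, and it assumes the variables have already been loaded into the process environment (for example with `load_dotenv`).

```python
import os

# The two variable names come from the setup instructions above.
REQUIRED_BROWSERBASE_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def missing_browserbase_vars(env=None):
    """Return the names of required Browserbase variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BROWSERBASE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_browserbase_vars()
    if missing:
        print("Missing Browserbase settings: " + ", ".join(missing))
    else:
        print("Browserbase settings found.")
```

Running this before constructing the tool gives a clearer message than a failure partway through a plan run.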

Using browser based tools in Portia

The BrowserTool is located in our open source tools folder (SDK ↗). There are two ways to use the tool:

  • BrowserTool(): This is a general browser tool and it will be used when a URL is provided as part of the query.
BrowserTool example
from portia import Config, Portia
from portia.open_source_tools.browser_tool import BrowserTool

task = "Find my connections called 'Bob' on LinkedIn (https://www.linkedin.com)"

# Needs BrowserBase API Key and project_id
portia = Portia(config=Config.from_default(),
                tools=[BrowserTool()])
  • BrowserToolForUrl(url): Restricts the browser tool to a specific URL. This is particularly useful for ensuring that the planner is limited to the domains you want it to support.
BrowserToolForUrl example
from portia import Config, Portia
from portia.open_source_tools.browser_tool import BrowserToolForUrl

task = "Find my connections called 'Bob' on LinkedIn"

# Needs BrowserBase API Key and project_id
portia = Portia(config=Config.from_default(),
                tools=[BrowserToolForUrl("https://www.linkedin.com")])

A Simple E2E Example

Full example
from dotenv import load_dotenv

from portia import (
    ActionClarification,
    Config,
    PlanRunState,
    Portia,
)
from portia.open_source_tools.browser_tool import BrowserTool

load_dotenv(override=True)

task = "Get the top news headline from the BBC news website (https://www.bbc.co.uk/news)"

portia = Portia(Config.from_default(), tools=[BrowserTool()])

plan_run = portia.run(task)

while plan_run.state == PlanRunState.NEED_CLARIFICATION:
    # If clarifications are needed, resolve them before resuming the workflow
    print("\nPlease resolve the following clarifications to continue")
    for clarification in plan_run.get_outstanding_clarifications():
        # Handling of Action clarifications
        if isinstance(clarification, ActionClarification):
            print(f"{clarification.user_guidance} -- Please click on the link below to proceed.")
            print(clarification.action_url)
            input("Press Enter to continue...")

    # Once clarifications are resolved, resume the workflow
    plan_run = portia.resume(plan_run)

Authentication with browser based tools

Recap: Portia Authentication

Portia uses Clarifications to handle human-in-the-loop authentication (full explanation here ↗). In our OAuth-based tools, the user clicks on a link, authenticates, and their token is used when the agent resumes.

In the browser tool case, whenever a browser tool encounters a page that requires authentication, it raises a clarification request to the user, just like API-based Portia tools. The user then enters the necessary credentials or authentication information on the website to proceed. The cookies from that authentication are used for the rest of the plan run.

Browser authentication with clarifications

In the case of Browserbase Authentication, the end-user will be provided with a URL starting with browserbase.com/devtools-fullscreen/.... When the end-user visits this page, they will see the authentication screen to enter their credentials (and any required 2FA or similar checks).

Once the end-user has authenticated, they should indicate to your application that they have completed the flow, and you should then call portia.resume(plan_run) to resume the agent. Note that the CLIClarificationHandler does not work this way; if you are using it, you will need to override it to get this behaviour.

The authentication credentials will persist until the agent completes the flow.
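
For an application that is not driven from the command line, the flow above can be sketched as: collect the links to show the end-user, wait for them to confirm, then resume. The helper below is a hypothetical illustration, not part of the Portia SDK. It relies only on `get_outstanding_clarifications()` and `action_url`, both of which appear in the full example above, and it uses an attribute check rather than `isinstance(clarification, ActionClarification)` purely so the sketch runs without importing the SDK.

```python
def pending_action_urls(plan_run):
    """Return the action URLs of all outstanding clarifications that have one.

    Hypothetical helper: `plan_run` is expected to expose
    get_outstanding_clarifications(), as in the full example above.
    """
    urls = []
    for clarification in plan_run.get_outstanding_clarifications():
        # Assumption: only action-style clarifications carry an action_url.
        action_url = getattr(clarification, "action_url", None)
        if action_url is not None:
            urls.append(str(action_url))
    return urls
```

Your application would display these URLs to the end-user and call portia.resume(plan_run) once they confirm that they have finished authenticating.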

When to use API vs browser based tools

Browser-based tools are very flexible in what they can do. However, they do not have the same tight permissioning as OAuth tools, and they require more LLM calls. We therefore recommend balancing the two and using browser tools only when APIs are not available.

Known issues and caveats

Popups and authentication

When using Browserbase as the underlying browser infrastructure, if authentication requires a popup, the popup will not be shown to the user and they will not be able to log in. We are currently investigating solutions for this.

Local chrome failing to connect

If Chrome opens but then immediately closes and restarts, the likely cause is that it can't find the user data directory, so the debug server does not start. You can fix this by setting the environment variable PORTIA_BROWSER_LOCAL_EXTRA_CHROMIUM_ARGS="--user-data-dir='path/to/dir'". There is more information about this on the browser-use issue link ↗.
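
As a rough sketch, the variable can also be set from Python rather than from a .env file. This assumes that Portia reads PORTIA_BROWSER_LOCAL_EXTRA_CHROMIUM_ARGS from the process environment when the browser tool starts, so it must be set before then. `chromium_user_data_arg` is a hypothetical helper, and the directory name used is an arbitrary example.

```python
import os
import tempfile
from pathlib import Path


def chromium_user_data_arg(directory):
    """Create the directory if needed and return the --user-data-dir argument.

    Hypothetical helper; the quoting mirrors the format shown above.
    """
    path = Path(directory).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return f"--user-data-dir='{path}'"


if __name__ == "__main__":
    # Arbitrary example location; choose any directory Chrome can write to.
    profile_dir = Path(tempfile.gettempdir()) / "portia-chrome-profile"
    os.environ["PORTIA_BROWSER_LOCAL_EXTRA_CHROMIUM_ARGS"] = chromium_user_data_arg(profile_dir)
    print(os.environ["PORTIA_BROWSER_LOCAL_EXTRA_CHROMIUM_ARGS"])
```

Creating the directory up front avoids pointing Chrome at a path that does not exist.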

LLM moderation

We have occasionally observed that LLMs might get moderated on tasks that look like authentication requests to websites. These issues are typically transient, but you may want to adjust the task or plan to avoid direct requests for the agent to log in to a website.