Stagehand gets even better – The AI Web Agent SDK

The Stagehand Philosophy: Control and Flexibility

Let’s be honest: traditional browser automation frameworks such as Playwright, Puppeteer, or Selenium are fantastic for executing explicit commands (click this here, type that there), but they crumble when a website changes even slightly. A tiny UI tweak can break an entire script, leaving you scrambling to fix brittle code. On the other end of the spectrum, fully agent-based solutions like OpenAI Operator or Anthropic Computer Use promise end-to-end automation from a single prompt: you instruct the agent in natural language, and it takes over. That level of abstraction often comes at the cost of control, though, and developers can end up with outcomes they cannot predict or reproduce.

Stagehand was born out of the need for a middle ground. Instead of forcing you to choose between writing fragile code or handing everything over to an opaque agent, Stagehand gives you the best of both worlds:

Atomic Instructions: The framework provides three key primitives—act(), extract(), and observe()—that let you define precise, one-to-one browser interactions. Think of these as the building blocks of reliable automation: you tell Stagehand to "click this button" or "type this text," and it maps that command directly to a browser action. In the new Stagehand, act() no longer recursively loops. Complex, multi-step actions are now handled by agent().
await page.act("click on the contributors selection");

Dynamic Agent: For tasks that require higher-level decision making, Stagehand introduces agent(). This component breaks down complex workflows into a sequence of browser commands. It lets you delegate high-level instructions, like “retrieve the top 5 contributors to the stagehand repo”, while still maintaining granular control over how each step is executed. agent() caches the previous steps it took, reducing LLM calls and improving performance. It makes new calls when the cached actions fail.
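
The caching behavior described above (replay the steps that worked last time, and go back to the model when they stop working) can be illustrated with a small sketch. This is not Stagehand's implementation: the names Planner, Executor, and CachingAgent are hypothetical and exist only to show the general pattern, with the planner standing in for an LLM call.

```typescript
// Illustrative only: a generic "replay cached steps, re-plan on failure" loop.
// Planner, Executor, and CachingAgent are hypothetical names, not Stagehand APIs.

type Step = string; // e.g. "click the Contributors tab"

interface Planner {
  // Stands in for an LLM call that turns an instruction into browser steps.
  plan(instruction: string): Promise<Step[]>;
}

interface Executor {
  // Runs one step; resolves to false if the step could not be performed.
  run(step: Step): Promise<boolean>;
}

class CachingAgent {
  private cache = new Map<string, Step[]>();
  private planner: Planner;
  private executor: Executor;
  plannerCalls = 0; // counts how often the (expensive) planner was consulted

  constructor(planner: Planner, executor: Executor) {
    this.planner = planner;
    this.executor = executor;
  }

  async execute(instruction: string): Promise<void> {
    // Cache hit: replay the previously successful steps without planning.
    const cached = this.cache.get(instruction);
    if (cached && (await this.runAll(cached))) return;

    // Cache miss, or a cached step failed: ask the planner for fresh steps.
    // Simplification: the fresh plan restarts from the first step.
    this.plannerCalls++;
    const fresh = await this.planner.plan(instruction);
    if (!(await this.runAll(fresh))) {
      this.cache.delete(instruction);
      throw new Error(`could not complete: ${instruction}`);
    }
    this.cache.set(instruction, fresh);
  }

  // Runs steps in order and stops at the first one that fails.
  private async runAll(steps: Step[]): Promise<boolean> {
    for (const step of steps) {
      if (!(await this.executor.run(step))) return false;
    }
    return true;
  }
}
```

In this sketch, repeated runs of the same instruction cost one planner call in total as long as the page stays the same. A second call happens only after a cached step fails.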

A High-Level Use Case Comparison

Imagine you need to automate the process of filling out a web form and then extracting specific lead data:

Using Playwright (Traditional Automation): You’d write explicit code to navigate to the form, identify each field, input data, and submit the form. While this offers precision, if the website’s structure changes even slightly (say, the form’s layout is updated), your code might break, leaving you with brittle automation.
Using a Fully Agent-Based Product (e.g., OpenAI Operator): You’d simply instruct the agent in natural language: “Fill out this form and extract the lead information.” The agent would decide the best approach, but you’d lose control over which elements are interacted with and how. The process might work in some cases but could produce unpredictable results when the agent’s reasoning doesn’t match the precision you need.
Using Stagehand: You can combine the strengths of both approaches. With Stagehand, you can use act(), extract(), and observe() to issue precise commands (e.g., “click the submit button,” “extract the email field”) while also employing agent() for high-level orchestration when necessary. This gives you an automation script that’s both resilient to UI changes and flexible enough to handle complex decision-making processes.
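
To make the Stagehand variant of this comparison concrete, here is a rough sketch of the form workflow written against the three primitives. The AutomationPage interface is a hypothetical stand-in that only borrows the method names act, extract, and observe from the text; the real Stagehand signatures and return types may differ. FakePage is an in-memory fake with canned results so the sketch runs without a browser or an LLM.

```typescript
// Hypothetical stand-in for the page methods named in the text.
// Real Stagehand signatures and return types may differ.
interface AutomationPage {
  observe(instruction: string): Promise<string[]>;
  act(instruction: string): Promise<void>;
  extract(instruction: string): Promise<Record<string, string>>;
}

interface Lead {
  name: string;
  email: string;
}

async function submitLead(
  page: AutomationPage,
  lead: Lead
): Promise<Record<string, string>> {
  // observe(): check what is on the page before acting on it.
  const fields = await page.observe("find the input fields of the lead form");
  if (fields.length === 0) throw new Error("no form fields found");

  // act(): one instruction per browser action, so each step stays explicit.
  await page.act(`type "${lead.name}" into the name field`);
  await page.act(`type "${lead.email}" into the email field`);
  await page.act("click the submit button");

  // extract(): read structured data back from the resulting page.
  return page.extract("extract the name and email shown on the confirmation page");
}

// In-memory fake (illustration only): records each call, returns canned data.
class FakePage implements AutomationPage {
  log: string[] = [];

  async observe(instruction: string): Promise<string[]> {
    this.log.push(`observe: ${instruction}`);
    return ["name", "email"];
  }

  async act(instruction: string): Promise<void> {
    this.log.push(`act: ${instruction}`);
  }

  async extract(instruction: string): Promise<Record<string, string>> {
    this.log.push(`extract: ${instruction}`);
    return { name: "Ada Lovelace", email: "ada@example.com" };
  }
}
```

Because the instructions describe intent ("the submit button") rather than selectors, a layout change does not require rewriting the script, while the order and number of actions stay fully under your control.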