
How do browser agents recover from errors?

By Lucas Giordano · Co-founder, Notte
TL;DR

Production browser agents recover from errors with a layered strategy: retry the same action, retry with a re-planned action, fall back from DOM grounding to vision, and escalate to a human-in-the-loop or a manually-bootstrapped browser profile. Each layer is cheaper than the next; the trick is wiring them in the right order so simple flakiness doesn't pay the cost of a full re-plan.

How do browser agents recover from errors?

The naked observe-think-act loop succeeds maybe 60–70% of the time on real workflows. The other 30–40% is where production agents earn their keep. Recovery is what separates a 70%-success demo from a 99%-success product — and it's almost always a layered strategy rather than a single fix. The goal isn't to never fail; it's to fail in cheap ways before falling back to expensive ones.

Where errors actually happen

A browser agent can fail at three distinct levels, each with its own recovery shape:

  • Action-level. The action emitted by the LLM didn't execute — the target element wasn't found, the click didn't register, the page navigated to an error state. Cheap to detect; the action layer returns a failure boolean.
  • Step-level. The action ran, but produced an unexpected page state — wrong page, modal dialog blocking the next step, stale data. Detectable with a quick comparison against the goal.
  • Run-level. The agent reported "completed" but the goal wasn't actually reached. Caught by a verifier. The most expensive failure to detect; the most damaging one to silently miss.

A real recovery strategy needs an answer for all three.

The four-layer recovery stack

Each layer addresses a class of failure and costs more than the previous one. Production agents move down the stack only when the cheaper layer can't handle the case:

  1. Retry the same action. Pure transient flakiness — network blip, animation not finished, JavaScript still hydrating. The cheapest fix: wait briefly and retry. Don't re-invoke the LLM; just re-execute. Most layouts settle within 200–500 ms.
  2. Retry with a re-planned action. If the same action keeps failing, the model needs new input. The next iteration includes the failure message in the context, so the LLM sees what didn't work and tries a different approach. Cost: one more LLM call.
  3. Fall back from DOM to vision. When the failure is "I can't find the target element in the DOM" but the element is visibly there (image-rendered button, canvas overlay, shadow DOM), switching to a vision-based observation for that step usually fixes it. Cost: one vision-tier inference.
  4. Escalate. When all of the above fail, the run hits a wall the agent can't navigate alone. Production options: alert a human-in-the-loop to intervene live, or fall back to a browser profile bootstrapped manually for that target. The user-facing experience varies by product; the underlying primitive is "agent gives up cleanly with a useful state dump, not a silent loop."

The order matters. Skip to layer 4 too quickly and you're paying human time on flakiness; skip layers 2–3 and you'll never benefit from the LLM's ability to re-plan.
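The four layers can be wired together as a single function. This is a sketch under stated assumptions: `execute`, `replan`, `vision_fallback`, and `escalate` are hypothetical callables standing in for the real agent machinery, not a library API.

```python
import time

def recover_step(
    execute,           # () -> bool: run the current action, True on success
    replan,            # (failure_msg: str) -> None: ask the LLM for a new action
    vision_fallback,   # () -> bool: re-observe with vision, then retry
    escalate,          # () -> None: hand off to a human / bootstrapped profile
    same_action_retries: int = 2,
    settle_s: float = 0.3,     # most layouts settle within 200-500 ms
    replans: int = 2,
) -> str:
    """Layered recovery: cheapest layer first, escalate only at the end."""
    # Layer 1: re-execute the same action without re-invoking the LLM.
    for attempt in range(1 + same_action_retries):
        if execute():
            return "ok"
        if attempt < same_action_retries:
            time.sleep(settle_s)  # let animations / hydration finish
    # Layer 2: feed the failure back so the LLM can try a different approach.
    for _ in range(replans):
        replan("action failed after retries")  # cost: one more LLM call
        if execute():
            return "ok"
    # Layer 3: DOM grounding failed; try one vision-tier observation.
    if vision_fallback():
        return "ok"
    # Layer 4: give up cleanly with a useful state dump, not a silent loop.
    escalate()
    return "escalated"
```

Note how the cost gradient is encoded in the control flow: plain re-execution is tried several times before a single LLM call is spent, and the vision tier and human escalation each run at most once.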

What makes recovery actually work

Three properties separate working recovery from a chain of bandaids:

  • The agent has memory of what just failed. Without agent memory, the next iteration loops back into the same broken plan. Including the failure message and the recent action history in the next prompt is what enables productive re-planning.
  • Backtracking to a known-good state. If the agent dug itself into a modal dialog, an unexpected page, or a partially-completed form, recovery often needs to undo — close the modal, navigate back, refresh. Notte sessions support this through standard navigation primitives plus session-state snapshots.
  • A step budget that bounds the worst case. max_steps is the safety belt. Without it, a confused agent loops forever; with it, you cap cost and surface "incomplete" cleanly.
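Two of these properties — failure memory and the step budget — can be sketched in one minimal loop. `step` and `verify` are hypothetical callables (the agent's iteration and the verifier), and the status strings are illustrative:

```python
def run_agent(step, verify, max_steps: int = 20) -> str:
    """Observe-think-act loop with failure memory and a bounded step budget."""
    history: list[str] = []            # failure notes fed into the next prompt
    for _ in range(max_steps):
        status, note = step(history)   # the note records what just happened
        history.append(note)           # memory: next iteration sees past failures
        if status == "done":
            # "done" only means the model decided to stop;
            # the verifier decides whether the goal was actually reached.
            return "completed" if verify() else "failed_verification"
    return "incomplete"                # budget exhausted: surfaced, not silent
```

The budget turns a confused, looping agent into a bounded cost and a clean `"incomplete"` status instead of an infinite run.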

Common pitfalls

  • Retrying the same action with no backoff. A transient network blip turns into the agent hammering the target in a tight loop — a self-inflicted DDoS. Always include a small, growing wait before each retry.
  • Skipping the verifier. A "completed" status without one just means the agent decided to stop, not that the goal was reached. The verifier catches the run-level failures the action layer misses.
  • Catching all errors silently. Recovery shouldn't paper over problems; it should retry where retry helps and surface clear failure where it doesn't. Quiet recovery hides the signal you need to fix the underlying flakiness.
  • No state dump on escalation. When the agent escalates, the human (or the next run) needs to see what failed and where. Production agents capture per-step screenshots and action logs so observability is preserved.
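The first pitfall has a standard fix worth spelling out. A minimal backoff sketch — `retry_with_backoff` and its parameters are illustrative, not a Notte primitive:

```python
import time

def retry_with_backoff(attempt, tries: int = 3,
                       base_delay: float = 0.2, factor: float = 2.0) -> bool:
    """Retry with exponential backoff so flakiness isn't hammered in a tight loop."""
    delay = base_delay
    for i in range(tries):
        if attempt():          # () -> bool: True on success
            return True
        if i < tries - 1:
            time.sleep(delay)  # never retry back-to-back
            delay *= factor    # 0.2 s, 0.4 s, 0.8 s, ...
    return False               # exhausted: surface the failure, don't hide it
```

Returning `False` rather than raising keeps the failure visible to the caller, in line with the "don't catch errors silently" pitfall above.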

Key takeaways

  • Recovery is what turns a 70%-success agent into a production-grade one — a layered strategy, not a single fix.
  • The four layers, cheapest to most expensive: same-action retry, re-planned retry, vision fallback, human/profile escalation.
  • Order matters: cheap layers first absorb flakiness without paying for re-plans; later layers handle real model errors and structural blockers.
  • Pair with agent memory for productive re-planning, verifiers for run-level error detection, and observability to debug what went wrong.

Build your AI agent on the open web with Notte

Cloud browsers, agent identities, and the Anything API — everything you need to ship reliable browser agents in production.