What is parallel browser execution?
By Lucas Giordano · Co-founder, Notte
TL;DR

Parallel browser execution is running many isolated browser sessions concurrently — across processes, containers, or microVMs — to multiply automation throughput. The constraints are almost never your CPU; they're per-session memory (~150–400 MB each), the target site's rate limits, the IP pool you have access to, and the orchestration needed to detect cross-session coupling. The right concurrency model is workload-shaped, not platform-shaped.

What is parallel browser execution?

Sequential browser automation is fine until it isn't. Running 100 daily report-pulls at one second each takes a hundred seconds back-to-back; running 100 thousand-step agent workflows back-to-back takes hours. Parallel execution is the scale-out: spin up many isolated browser sessions, run them concurrently, collect the results. The model is simple. The constraints aren't the model — they're the parts of the system that don't scale linearly with concurrency.

What actually bottlenecks parallel runs

Almost no production parallel-browser deployment hits a CPU ceiling first. The real ceilings, in roughly the order they bind:

  • Per-session memory. A real Chromium session uses 150–400 MB depending on what's loaded. A 16 GB container holds 40–100 sessions before swapping; a serverless microVM is provisioned per-session. Memory is the cheapest constraint to relax — get a bigger box — but it's usually the first ceiling you hit.
  • Target site rate limits. Hitting one target with 50 concurrent sessions can get you flagged even when each session exits through its own IP. Some targets cap you at 5 requests per second per IP regardless of session count. The site, not your platform, decides.
  • IP pool size. Every concurrent session ideally exits through a distinct residential IP. A pool of 500 IPs caps you at 500 concurrent sessions to a single target before the pool starts re-using IPs (which the target sees as one IP making many requests).
  • Cross-session coupling. Sessions sharing the same digital identity, the same vault, or the same scheduled crawl frontier serialize on those resources whether you parallelize the browsers or not.

Raising the ceiling means solving the constraint that actually binds, not the most obvious one — and a rough capacity estimate, like the sketch below, usually tells you which one that is.
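A back-of-the-envelope calculation is usually enough to see which ceiling you hit first. The sketch below is illustrative only — the function name, the default numbers, and the simple min() model are assumptions standing in for measurements from your own stack, not part of any SDK:

capacity_estimate.py
# Rough capacity planner: which of memory, IP pool, or target rate limit binds first?
# All defaults are illustrative assumptions — substitute your own measurements.

def max_concurrent_sessions(
    host_memory_mb: int = 16_384,        # a 16 GB container
    per_session_memory_mb: int = 300,    # mid-range of the 150–400 MB figure above
    ip_pool_size: int = 500,             # distinct residential exit IPs
    target_rps_per_ip: float = 5.0,      # the target's per-IP rate limit
    session_rps: float = 1.0,            # how fast one session actually hits the target
) -> int:
    """Return the smallest of the three ceilings: memory, IP pool, target rate limit."""
    memory_ceiling = host_memory_mb // per_session_memory_mb
    ip_ceiling = ip_pool_size
    # With one distinct IP per session, the rate limit only binds once sessions
    # start sharing IPs; until then the IP pool itself is the cap.
    rate_ceiling = ip_pool_size * int(target_rps_per_ip // max(session_rps, 0.1))
    return min(memory_ceiling, ip_ceiling, rate_ceiling)

print(max_concurrent_sessions())  # 54 — memory binds first on a single 16 GB host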

Three parallelism models

Production stacks land on one of three shapes:

  1. Process-level on one box. Several Playwright instances driven from one machine. Cheap, simple, the right answer for tens of concurrent sessions on a friendly target — a minimal sketch follows the comparison table below.
  2. Container or microVM per session, autoscaled across hosts. Cloud-browser products run this model by default. Hundreds-to-thousands of sessions, isolated by hardware, billed per-session-second. The right shape for serverless browser workloads.
  3. Distributed worker pool. A queue of tasks, a fleet of workers, each worker running one or a few sessions, results written back to a shared store. The right shape when the workload is too large for any single host to schedule.
                         Process-level         MicroVM-per-session           Distributed pool
Concurrency ceiling      Tens (per host)       Hundreds-to-thousands         Tens of thousands+
Per-session isolation    Process               Hardware (microVM)            Hardware + network
Setup complexity         Lowest                Managed by the platform       Highest (you build the queue)
Best for                 Dev, small batches    Most production agent work    Web-scale crawls
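For the process-level model, plain Playwright plus asyncio is enough. A minimal sketch, assuming Playwright is installed and the target tolerates a handful of concurrent sessions; it uses one Chromium instance with isolated contexts rather than separate browser processes, which trades some isolation for lower memory, and the URL list and semaphore size are placeholders:

process_parallel.py
# Model 1: several isolated browser contexts on one box, bounded by a semaphore.
import asyncio
from playwright.async_api import async_playwright

MAX_CONCURRENT = 10  # placeholder: tens of sessions is the realistic ceiling per host

async def fetch_title(browser, sem: asyncio.Semaphore, url: str) -> str:
    async with sem:                        # bound concurrency so memory stays predictable
        context = await browser.new_context()
        page = await context.new_page()
        try:
            await page.goto(url)
            return await page.title()
        finally:
            await context.close()          # release the ~150–400 MB the session holds

async def main(urls: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            return await asyncio.gather(*(fetch_title(browser, sem, u) for u in urls))
        finally:
            await browser.close()

# asyncio.run(main(["https://example.com", ...]))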

Notte SDK shape

The platform handles autoscaling. From the developer's side, parallelism is just opening multiple sessions concurrently:

main.py
from concurrent.futures import ThreadPoolExecutor
from notte_sdk import NotteClient

client = NotteClient()

def fetch_one(url: str) -> str:
    # Each call opens an isolated cloud session, navigates, and returns the page title.
    with client.Session(proxies=True) as session:
        session.execute(type="goto", url=url)
        return session.observe().metadata.title

urls = [...]  # several hundred URLs

with ThreadPoolExecutor(max_workers=50) as pool:
    titles = list(pool.map(fetch_one, urls))

Each session is an independent cloud browser; the platform spins them up and tears them down on demand. There's no warm pool to provision and no autoscaling group to configure. For agent runs, the same pattern uses agent.start() and agent.wait() instead of synchronous SDK calls, which lets a single thread orchestrate many parallel agents — a rough sketch follows.
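As a hedged illustration of that async pattern: the agent.start() and agent.wait() calls come from the paragraph above, but the client.Agent() constructor, the task= keyword, and the result handling are assumptions — verify them against the SDK reference before relying on this:

agents.py
# Hedged sketch: fan out many agents from one thread, then collect their results.
from notte_sdk import NotteClient

client = NotteClient()

tasks = [...]  # several hundred natural-language tasks

# Assumption: client.Agent() and task= mirror the Session pattern above;
# only start() and wait() are taken from the text. Check the SDK docs.
agents = [client.Agent() for _ in tasks]
for agent, task in zip(agents, tasks):
    agent.start(task=task)                       # returns immediately; the agent runs in the cloud
results = [agent.wait() for agent in agents]     # one thread collects every result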

Common pitfalls

  • Adding concurrency to fix latency on a single workflow. If one workflow takes ten steps, parallelism doesn't help — the steps depend on each other. Concurrency speeds up many workflows, not one slow one.
  • Going wider than your IP pool. Many concurrent sessions funneled through one IP look like a single user with superhuman reaction time, and they get flagged.
  • Forgetting target-side rate limits. The target enforces a limit per IP / per account / per device. Concurrency that ignores this gets you blocked across all sessions.
  • No backpressure on results. Workers that finish faster than the downstream system can ingest lead to memory blow-ups. Bound the pool and bound the queue — see the sketch after this list.
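A minimal sketch of that last point, using only the standard library. It reuses fetch_one() and urls from the Notte example above; persist(), the queue size, and the worker count are placeholders for whatever your downstream system actually is:

backpressure.py
# Bound both the worker pool and the results queue so neither side runs away.
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

results: queue.Queue = queue.Queue(maxsize=100)   # bounded: workers block when the consumer lags

def persist(title: str) -> None:
    print(title)                       # stand-in for your real downstream write

def worker(url: str) -> None:
    results.put(fetch_one(url))        # put() blocks once the queue is full -> backpressure

def consumer() -> None:
    while True:
        item = results.get()
        if item is None:               # sentinel: all work has been submitted and drained
            break
        persist(item)

consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
with ThreadPoolExecutor(max_workers=50) as pool:  # bounded pool
    list(pool.map(worker, urls))                  # shutdown on exit waits for every worker
results.put(None)                                 # tell the consumer to stop
consumer_thread.join()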

Key takeaways

  • Parallel browser execution scales throughput by running many isolated sessions concurrently; it's the standard answer for batch and high-volume agent work.
  • The bottlenecks are memory, target rate limits, IP pool size, and cross-session coupling — almost never CPU.
  • Three models cover most cases: process-level (small scale), microVM-per-session (most production), distributed pool (web scale).
  • Notte sessions are platform-autoscaled — concurrent.futures over client.Session(...) is the SDK pattern; no warm pools or autoscaling groups to configure.

Build your AI agent on the open web with Notte

Cloud browsers, agent identities, and the Anything API — everything you need to ship reliable browser agents in production.