
What are scheduled / cron-based browser tasks?

By Lucas Giordano · Co-founder, Notte
TL;DR

Scheduled browser tasks are automation runs triggered by a cron expression — 'every Monday at 9 AM, scrape competitor pricing,' 'every hour, sync inventory from the supplier portal.' The platform owns the schedule, the warm-up, the retries, and the alerting. You supply the workflow.

What are scheduled / cron-based browser tasks?

The class of browser-driven work that should run repeatedly is enormous: daily competitor-price snapshots, hourly inventory syncs, weekly compliance scans, monthly metric pulls. None of it should require a human to click a button. Cron is the universal answer to "run this on a schedule" — but cron alone is a half-built system. Real scheduled browser work needs the cron plus a way to spin up a browser cleanly, retry on failure, surface alerts on issues, and capture a per-run audit trail. Scheduled / cron-based browser tasks are the bundled primitive: cron + browser + ops.

What needs to be in the bundle

Five properties separate "I wrote a cron job" from production-grade scheduled browser work:

  • The schedule itself. A cron expression (0 9 * * 1) or higher-level shorthand (@daily, every 1h). Stored, edited, audited.
  • A clean browser per run. Each invocation gets its own session — fresh state, fresh fingerprint, no carryover from the previous run unless you explicitly want it (via a profile).
  • Retry and timeout policy. A run that hangs blocks the next one if you don't bound it. A run that fails on a transient error should retry. Both have to be configured.
  • Alerting on failure. "The 9 AM scrape silently failed for three weeks" is a real failure mode if no one's watching. Production schedulers ship to PagerDuty / Slack / email on failure; some on success too.
  • Per-run observability. A scheduled run two weeks ago that produced wrong data should be debuggable. Per-run logs, screenshots, action traces.
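The first bullet hides more logic than it looks. Even a single fixed expression like 0 9 * * 1 needs next-fire-time computation; as a toy illustration (a sketch for this one expression, not a general cron parser):

```python
# Toy next-fire-time computation for the cron expression `0 9 * * 1`
# (Mondays at 09:00) — a sketch, not a general cron parser.
from datetime import datetime, timedelta

def next_monday_9am(now: datetime) -> datetime:
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(0 - now.weekday()) % 7)  # roll forward to Monday
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's tick already passed
    return candidate
```

A real scheduler generalizes this across all five cron fields, which is exactly the kind of code you'd rather not maintain per workflow.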

You can self-host all five with a beefy cron + a queue + Sentry + screenshots-to-S3, but it's not the work most teams want to own.
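If you do self-host, the timeout and retry halves of the bundle are the easy-to-get-wrong parts. A minimal sketch of a bounded, retrying run wrapper (illustrative names, not a Notte API; note that a thread-based timeout abandons a hung run rather than killing it, so real deployments still want process-level limits):

```python
# Sketch: bound each run with a timeout and retry transient failures.
# Caveat: result(timeout=...) stops waiting but cannot kill the worker
# thread — a truly hung browser run also needs process-level limits.
import time
from concurrent.futures import ThreadPoolExecutor

def run_with_policy(task, *, timeout_s=300.0, retries=2, backoff_s=30.0):
    for attempt in range(retries + 1):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(task).result(timeout=timeout_s)
        except Exception:
            if attempt == retries:
                raise  # retries exhausted — surface to alerting
            time.sleep(backoff_s)  # back off before the next attempt
        finally:
            pool.shutdown(wait=False)  # don't block on a hung run; see caveat
```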

The Notte pattern

A scheduled task is a Notte Function plus a schedule. The Function does the work; the schedule decides when it runs:

competitor_pricing.py
# Function handler: snapshot competitor prices on each cron tick
from notte_sdk import NotteClient
from pydantic import BaseModel

client = NotteClient()

class PriceSnapshot(BaseModel):
    competitor: str
    sku: str
    price_usd: float

def run() -> list[PriceSnapshot]:
    with client.Session(proxies=True) as session:
        agent = client.Agent(session=session, max_steps=20)
        return agent.run(
            task="Visit competitor sites in /tracked-list and capture each SKU's price.",
            response_format=list[PriceSnapshot],
        ).output

Deploy the Function, then attach a schedule (typically via the Notte dashboard or a scheduling integration). Every cron tick spins up a fresh cloud browser, runs the handler, captures the result, and on failure alerts wherever you've routed alerting.
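The hosted platform captures the per-run record for you; while developing the handler locally, you can approximate that audit trail by writing each tick's output under a timestamped directory (a sketch — paths and names here are illustrative, not part of the SDK):

```python
# Local approximation of a per-run audit trail: each tick's output lands
# under a timestamped directory so any past run stays inspectable.
import json
from datetime import datetime, timezone
from pathlib import Path

def persist_run(snapshots: list[dict], root: Path) -> Path:
    run_dir = root / datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "snapshots.json").write_text(json.dumps(snapshots, indent=2))
    return run_dir
```

In the handler above, that would look like persist_run([s.model_dump() for s in run()], Path("runs")) — again, purely a local development convenience.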

When to reach for scheduling vs. event-driven

Two distinct shapes that get conflated:

  • Cron-based scheduling — "every Monday at 9 AM." Best for steady-cadence work where the timing isn't event-driven: snapshots, pulls, syncs.
  • Event-driven invocation — "when a customer signs up, run the KYC flow." A webhook or a queue triggers the function on demand. Different primitive — you'd use the Anything API endpoint shape with an external trigger, not cron.

Many production setups use both: cron for steady-state ingestion, events for user-triggered flows.

Patterns that work in practice

Three architectural patterns most teams converge on:

  • Single Function per workflow, daily/hourly cron. Simplest. The handler does one thing. Easy to monitor, easy to back off when one source rate-limits.
  • Fan-out: scheduler triggers a Function that spawns N parallel sub-functions. When the work is "do this for 500 sources," cron triggers the orchestrator and the orchestrator parallelizes (see parallel browser execution).
  • Idempotent + checkpointed. A run that crashes halfway should resume, not re-do work. Often combined with durable execution for runs that span hours.
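The third pattern is the least obvious to implement. A minimal checkpoint file that lets a crashed run resume where it stopped might look like this (a sketch under simple assumptions — durable-execution engines replace this with proper state machines):

```python
# Sketch of idempotent + checkpointed: each completed source is recorded
# after its unit of work, so a re-run after a crash skips what's done.
import json
from pathlib import Path

def run_checkpointed(sources: list[str], process, state_path: Path) -> set[str]:
    done = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    for source in sources:
        if source in done:
            continue  # finished in a previous attempt — don't redo
        process(source)
        done.add(source)
        state_path.write_text(json.dumps(sorted(done)))  # persist per unit
    return done
```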

Common pitfalls

  • No timeout per run. A hung run on Monday is still hanging Tuesday morning, blocking the next scheduled run. Always bound it.
  • Tight cron intervals on rate-limited sources. "Every 5 minutes" against a target with a 12-request/hour limit gets your IP flagged. Match the schedule to the source's tolerance.
  • No alerting. Silent failures go undetected for weeks. Always wire a notification channel.
  • No backfill story. When the schedule was off for two days because of a bug, the missing data is just gone. Decide upfront whether missed runs get backfilled or accepted.
  • Same scheduled time as everyone else. Hitting a target at exactly 09:00 with thousands of agents is a self-inflicted DDoS. Jitter the schedule.
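For the last pitfall, a deterministic per-agent offset spreads the herd while keeping each agent's schedule stable. A sketch (the 10-minute window is an assumed parameter, not a recommendation):

```python
# Stable jitter: derive a per-agent delay from its id, so 09:00 becomes
# 09:00 + offset — the same offset every run, but different per agent.
import hashlib

def stable_jitter_seconds(agent_id: str, window_s: int = 600) -> int:
    digest = hashlib.sha256(agent_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % window_s
```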

Key takeaways

  • Scheduled / cron-based browser tasks bundle a schedule, a fresh browser per run, retry/timeout policy, alerting, and per-run observability into one primitive.
  • The Notte shape: deploy a Function, attach a schedule. The platform owns everything between cron tick and result.
  • Distinct from event-driven invocation — cron is for steady-cadence work, events for user-triggered flows; production setups commonly use both.
  • Pitfalls cluster around timing: no timeouts, tight intervals on rate-limited sources, no jitter, no alerting on silent failures.

Build your AI agent on the open web with Notte

Cloud browsers, agent identities, and the Anything API — everything you need to ship reliable browser agents in production.