What are scheduled / cron-based browser tasks?

Scheduled browser tasks are automation runs triggered by a cron expression — 'every Monday at 9 AM, scrape competitor pricing,' 'every hour, sync inventory from the supplier portal.' The platform owns the schedule, the warm-up, the retries, and the alerting. You supply the workflow.
The class of browser-driven work that should run repeatedly is enormous: daily competitor-price snapshots, hourly inventory syncs, weekly compliance scans, monthly metric pulls. None of it should require a human to click a button. Cron is the universal answer to "run this on a schedule" — but cron alone is a half-built system. Real scheduled browser work needs the cron plus a way to spin up a browser cleanly, retry on failure, surface alerts on issues, and capture a per-run audit trail. Scheduled / cron-based browser tasks are the bundled primitive: cron + browser + ops.
What needs to be in the bundle
Five properties separate "I wrote a cron job" from production-grade scheduled browser work:
- The schedule itself. A cron expression (0 9 * * 1) or a higher-level shorthand (@daily, every 1h). Stored, edited, audited.
- A clean browser per run. Each invocation gets its own session: fresh state, fresh fingerprint, no carryover from the previous run unless you explicitly want it (via a profile).
- Retry and timeout policy. A run that hangs blocks the next one if you don't bound it. A run that fails on a transient error should retry. Both have to be configured.
- Alerting on failure. "The 9 AM scrape silently failed for three weeks" is a real failure mode if no one's watching. Production schedulers send alerts to PagerDuty, Slack, or email on failure; some notify on success too.
- Per-run observability. A scheduled run two weeks ago that produced wrong data should be debuggable. Per-run logs, screenshots, action traces.
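The "schedule itself" bullet hides a small algorithm: turning an expression like 0 9 * * 1 into the next time the job should fire. Below is a minimal standard-library sketch, assuming the classic five-field syntax (minute, hour, day-of-month, month, day-of-week) and naive datetimes with no time-zone or DST handling. The shorthand table is an assumed subset of the @-aliases, and "every 1h"-style strings are not handled; CronSchedule and parse_field are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumed subset of the @-aliases most cron implementations accept.
SHORTHANDS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
}


def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one field ('*', '9', '1-5', '*/15', '1,15') into its allowed values."""
    values: set[int] = set()
    for part in field.split(","):
        step, has_step = 1, "/" in part
        if has_step:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            # '5/15' means "from 5, every 15"; a bare '5' means just 5.
            start = int(part)
            end = hi if has_step else start
        if start < lo or end > hi:
            raise ValueError(f"{field!r} outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


class CronSchedule:
    def __init__(self, expr: str):
        fields = SHORTHANDS.get(expr, expr).split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
        minute, hour, dom, month, dow = fields
        self.minutes = parse_field(minute, 0, 59)
        self.hours = parse_field(hour, 0, 23)
        self.days = parse_field(dom, 1, 31)
        self.months = parse_field(month, 1, 12)
        # Day-of-week: 0 = Sunday; 7 is accepted as a second Sunday.
        self.weekdays = {d % 7 for d in parse_field(dow, 0, 7)}
        # Classic cron rule: if both day fields are restricted, either may match.
        self.day_or = not dom.startswith("*") and not dow.startswith("*")

    def _day_matches(self, t: datetime) -> bool:
        in_dom = t.day in self.days
        in_dow = (t.weekday() + 1) % 7 in self.weekdays  # Python: Monday = 0
        return (in_dom or in_dow) if self.day_or else (in_dom and in_dow)

    def next_after(self, after: datetime) -> datetime:
        """First matching minute strictly after `after` (naive datetimes, no DST)."""
        t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
        limit = t + timedelta(days=5 * 366)
        while t < limit:
            if t.month not in self.months or not self._day_matches(t):
                # Wrong day: jump straight to midnight of the next day.
                t = (t + timedelta(days=1)).replace(hour=0, minute=0)
            elif t.hour not in self.hours:
                # Right day, wrong hour: jump to the top of the next hour.
                t = (t + timedelta(hours=1)).replace(minute=0)
            elif t.minute not in self.minutes:
                t += timedelta(minutes=1)
            else:
                return t
        raise ValueError("no matching time in the next five years")
```

The day-of-month/day-of-week rule is the least intuitive part of cron: 0 0 13 * 5 fires on the 13th of every month and on every Friday, not only on Friday the 13th.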
You can self-host all five with a beefy cron + a queue + Sentry + screenshots-to-S3, but it's not the work most teams want to own.
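If you do self-host, the retry, timeout, alerting, and per-run record pieces fit in one wrapper around the job. Below is a minimal standard-library sketch; run_with_policy, RunRecord, and the alert callback are hypothetical names. The thread-based timeout is a simplification: a Python thread cannot be killed, so a hung job keeps running in the background, and a production runner would isolate each run in its own process or container.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as RunTimeout
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class RunRecord:
    """Hypothetical per-run audit entry: one per scheduled invocation."""
    started_at: str
    attempts: int = 0
    ok: bool = False
    result: Any = None
    errors: list[str] = field(default_factory=list)


def run_with_policy(
    job: Callable[[], Any],
    timeout_s: float,
    max_attempts: int,
    alert: Callable[[RunRecord], None],
    base_delay_s: float = 1.0,
) -> RunRecord:
    """Run `job` with a per-attempt timeout, exponential-backoff retries,
    and a single alert if every attempt fails."""
    record = RunRecord(started_at=datetime.now(timezone.utc).isoformat())
    for attempt in range(1, max_attempts + 1):
        record.attempts = attempt
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(job)
        try:
            record.result = future.result(timeout=timeout_s)
            record.ok = True
            return record
        except RunTimeout:
            record.errors.append(f"attempt {attempt}: timed out after {timeout_s}s")
        except Exception as exc:  # record the failure, then retry
            record.errors.append(f"attempt {attempt}: {exc!r}")
        finally:
            # Stop waiting on this attempt. The thread itself can't be killed,
            # so a hung job keeps running until it returns on its own.
            pool.shutdown(wait=False, cancel_futures=True)
        if attempt < max_attempts:
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    alert(record)
    return record
```

With the defaults, retries back off 1 s, then 2 s, then 4 s, and so on, so a flapping source isn't hammered. The alert fires once per scheduled run, after the last attempt, rather than on every transient failure, and the returned RunRecord is the per-run audit entry you would persist.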
The Notte pattern
A scheduled task is a Notte Function plus a schedule. The Function does the work; the schedule decides when it runs:
```python
# competitor_pricing.py: the Function handler
from notte_sdk import NotteClient
from pydantic import BaseModel

client = NotteClient()


class PriceSnapshot(BaseModel):
    competitor: str
    sku: str
    price_usd: float


def run() -> list[PriceSnapshot]:
    # One fresh cloud browser session per invocation, closed when the block exits.
    with client.Session(proxies=True) as session:
        # max_steps bounds how long the agent can wander before giving up.
        agent = client.Agent(session=session, max_steps=20)
        return agent.run(
            task="Visit competitor sites in /tracked-list and capture each SKU's price.",
            # Request structured output matching the PriceSnapshot schema.
            response_format=list[PriceSnapshot],
        ).output
```

Deploy the Function, then attach a schedule (typically via the Notte dashboard or a scheduling integration). Every cron tick spins up a fresh cloud browser, runs the handler, captures the result, and, on failure, sends an alert to whichever channel you've configured.
When to reach for scheduling vs. event-driven
Two distinct shapes that get conflated:
- Cron-based scheduling — "every Monday at 9 AM." Best for steady-cadence work where the timing isn't event-driven: snapshots, pulls, syncs.
- Event-driven invocation — "when a customer signs up, run the KYC flow." A webhook or a queue triggers the function on demand. Different primitive — you'd use the Anything API endpoint shape with an external trigger, not cron.
Many production setups use both: cron for steady-state ingestion, events for user-triggered flows.
Patterns that work in practice
Three architectural patterns most teams converge on:
- Single Function per workflow, daily/hourly cron. Simplest. The handler does one thing. Easy to monitor, easy to back off when one source rate-limits.
- Fan-out: scheduler triggers a Function that spawns N parallel sub-functions. When the work is "do this for 500 sources," cron triggers the orchestrator and the orchestrator parallelizes (see parallel browser execution).
- Idempotent + checkpointed. A run that crashes halfway should resume, not re-do work. Often combined with durable execution for runs that span hours.
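The fan-out and checkpointed patterns compose well: the orchestrator skips sources that already succeeded in this run and records each new success as it lands. Below is a minimal local sketch, assuming a JSON file as the checkpoint store and a thread pool standing in for parallel sub-functions. fan_out_with_checkpoint and its parameters are hypothetical; in a real deployment each work call would more likely be its own remotely executed function, with the checkpoint in durable storage.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable


def fan_out_with_checkpoint(
    sources: list[str],
    work: Callable[[str], dict],
    checkpoint: Path,
    max_parallel: int = 8,
) -> dict[str, dict]:
    """Run `work` once per source, in parallel, skipping sources already checkpointed."""
    done: dict[str, dict] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )
    pending = [s for s in sources if s not in done]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(work, s): s for s in pending}
        # Results are handled here on the calling thread, so `done` and the
        # checkpoint file are only ever written from one place.
        for future in as_completed(futures):
            source = futures[future]
            try:
                done[source] = future.result()
            except Exception:
                # Leave failures out of the checkpoint so a rerun retries them.
                continue
            # Write-then-rename so a crash mid-write can't corrupt the checkpoint.
            tmp = checkpoint.with_suffix(".tmp")
            tmp.write_text(json.dumps(done))
            tmp.replace(checkpoint)
    return done
```

Scope the checkpoint file to a single scheduled run (for example, one file per tick date) so a crash-and-restart resumes that run, while the next tick starts from scratch. Because failed sources never enter the checkpoint, a rerun retries exactly those and nothing else.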
Common pitfalls
- No timeout per run. A hung run on Monday is still hanging Tuesday morning, blocking the next scheduled run. Always bound.
- Tight cron intervals on rate-limited sources. "Every 5 minutes" against a target with a 12-request/hour limit gets your IP flagged. Match the schedule to the source's tolerance.
- No alerting. Silent failures go undetected for weeks. Always wire a notification channel.
- No backfill story. When the schedule was off for two days because of a bug, the missing data is just gone. Decide upfront whether failures need to be re-run or accepted.
- Same scheduled time as everyone else. Hitting a target at exactly 09:00 with thousands of agents is a self-inflicted DDoS. Jitter the schedule.
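Jitter, the interval check, and a backfill list are each a few lines. Below is a minimal sketch, assuming a fixed-interval schedule and a known hourly request budget per source; jitter_offset, interval_fits_limit, and missed_ticks are hypothetical names. Hashing the job name, rather than drawing a random number each run, gives every job a stable offset: its runs stay evenly spaced while different jobs spread out across the window.

```python
import hashlib
from datetime import datetime, timedelta


def jitter_offset(job_name: str, max_jitter: timedelta) -> timedelta:
    """Stable per-job offset in [0, max_jitter): same job, same offset every run."""
    digest = hashlib.sha256(job_name.encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # in [0, 1)
    return timedelta(seconds=int(fraction * max_jitter.total_seconds()))


def interval_fits_limit(
    interval: timedelta, requests_per_run: int, limit_per_hour: int
) -> bool:
    """True if this cadence stays within the source's hourly request budget."""
    runs_per_hour = timedelta(hours=1) / interval
    return runs_per_hour * requests_per_run <= limit_per_hour


def missed_ticks(
    last_success: datetime, now: datetime, interval: timedelta
) -> list[datetime]:
    """Scheduled times after the last successful run that never ran (backfill candidates)."""
    ticks = []
    t = last_success + interval
    while t <= now:
        ticks.append(t)
        t += interval
    return ticks
```

A job scheduled for 09:00 with a ten-minute window would fire at 09:00 plus its offset every day. The requests_per_run argument matters because one browser run usually loads several pages, so a cadence can look safe per run and still exceed the source's hourly budget. missed_ticks produces the concrete list you then decide to re-run or accept as lost.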
Key takeaways
- Scheduled / cron-based browser tasks bundle a schedule, a fresh browser per run, retry/timeout policy, alerting, and per-run observability into one primitive.
- The Notte shape: deploy a Function, attach a schedule. The platform owns everything between cron tick and result.
- Distinguishable from event-driven invocations — cron is for steady-cadence, events for user-triggered flows; production setups commonly use both.
- Pitfalls cluster around timing and visibility: no timeouts, tight intervals on rate-limited sources, no jitter, no alerting on silent failures, no backfill plan.