
The web AI-agent space doesn't have defined layers yet. Some vendors use "full-stack" to mean "we have a UI and an API." Others mean "we do everything." Most mean "we connected an LLM to Playwright." None of that is a stack.
Without defined layers, every implementation is glue code. When something breaks, you can't tell if it's the parsing, the planning, or the execution. You're debugging a black box.
A layer is a conceptual responsibility with a clear input/output contract. It's not necessarily a separate service. When boundaries are clear, you can isolate failure. Without them, you're lost.
The stack exists to eliminate that ambiguity. Here are the five layers that divide the work.

1. Sessions: The Foundation for Everything Else
Before an agent can perceive, decide, or act, it needs somewhere to exist. That's what a session is: a browser instance with state.
Sessions maintain:
- Browser state: cookies, local storage
- Identity: logged-in status, auth tokens
- Context: what page you're on, what you've done
Every operation in Notte (agents, scraping, interactions) runs inside a session. Without sessions, nothing else works. You can't perceive a page that doesn't exist. You can't execute actions without a browser to run them in.
Sessions are managed browser instances with timeout controls. They're the runtime environment where all other layers operate.
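To make "a browser instance with state" and "timeout controls" concrete, here is a minimal Python sketch. The `Session` class, its field names, and the idle-timeout behaviour are assumptions made for illustration; this is not Notte's actual API.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Hypothetical session record: browser state plus an idle timeout."""
    timeout_s: float                                         # assumed: idle seconds before expiry
    cookies: dict[str, str] = field(default_factory=dict)    # browser state
    logged_in: bool = False                                  # identity
    current_url: str = "about:blank"                         # context: current page
    history: list[str] = field(default_factory=list)         # context: pages visited
    last_used: float = field(default_factory=time.monotonic)

    def expired(self, now: Optional[float] = None) -> bool:
        """True once the session has been idle longer than timeout_s."""
        now = time.monotonic() if now is None else now
        return now - self.last_used > self.timeout_s

    def visit(self, url: str) -> None:
        """Record a navigation and reset the idle timer."""
        if self.expired():
            raise RuntimeError("session timed out")
        self.history.append(url)
        self.current_url = url
        self.last_used = time.monotonic()


session = Session(timeout_s=300.0)
session.visit("https://example.com")
```

Every other layer would receive a session like this rather than creating its own browser, which is what makes the session the shared runtime.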
2. Perception: Defining What Exists
Perception converts browser noise into structure: a list of elements the agent can interact with.
On a flight booking site, Perception outputs a semantic map with stable action IDs:
- Search form: origin, destination, date
- Button: search_flights
- Filters: price_low_to_high, duration
- Results list with clickable items
The output is not raw DOM but a clean interface, referred to below as the Scene, that defines what exists so the next layer can decide what to do.
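One way to picture that output is as a small, immutable data structure. The sketch below is illustrative only: the `Action` and `Scene` classes, the `kind` values, the placeholder URL, and the `result_0` ID are assumptions, while the other action IDs come from the flight-booking example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    id: str     # stable action ID the planner can reference
    kind: str   # "fill" or "click" in this sketch
    label: str  # human-readable description


@dataclass(frozen=True)
class Scene:
    url: str
    actions: tuple[Action, ...]

    def action_ids(self) -> set[str]:
        """The complete set of IDs a plan is allowed to use."""
        return {a.id for a in self.actions}


# Hypothetical Scene for the flight-search page described above.
scene = Scene(
    url="https://flights.example/search",  # placeholder URL
    actions=(
        Action("origin", "fill", "Search form: origin"),
        Action("destination", "fill", "Search form: destination"),
        Action("date", "fill", "Search form: date"),
        Action("search_flights", "click", "Button: search flights"),
        Action("price_low_to_high", "click", "Filter: price, low to high"),
        Action("duration", "click", "Filter: duration"),
        Action("result_0", "click", "Results list: first item"),  # assumed ID
    ),
)
```

Freezing the dataclasses is a design choice for the sketch: a Scene is a snapshot, so a changed page should produce a new Scene, not a mutated one.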
3. Cognition: Decide What to Do Next
Input: The Scene from Perception + your goal
Output: A plan using those action IDs
Cognition takes "Book the cheapest flight to NYC for next Friday" and turns it into executable steps. It's constrained: it can only reference action IDs that exist in the Scene. It can't hallucinate buttons.
When a plan breaks, you know it's not because Cognition invented a fake button. The problem is either:
- Perception missed something (bad Scene)
- Execution failed (couldn't click it)
- The page changed (Scene is stale)
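The constraint that a plan may only reference IDs present in the Scene can be enforced with a simple check before anything executes. This is a minimal sketch: the plan format (a list of dicts with an `action_id` key) and the `book_now` ID are assumptions made for illustration.

```python
def unknown_actions(scene_ids: set[str], plan: list[dict]) -> list[str]:
    """Return action IDs used by the plan that are absent from the Scene."""
    return [s["action_id"] for s in plan if s["action_id"] not in scene_ids]


# IDs that Perception reported for the current page (from the example above).
scene_ids = {"origin", "destination", "date", "search_flights", "price_low_to_high"}

# A plan for "Book the cheapest flight to NYC for next Friday".
plan = [
    {"action_id": "destination", "value": "NYC"},
    {"action_id": "date", "value": "next Friday"},
    {"action_id": "search_flights"},
    {"action_id": "price_low_to_high"},
]

# The same plan with one step that Perception never reported.
bad_plan = plan + [{"action_id": "book_now"}]

valid = unknown_actions(scene_ids, plan) == []
rejected = unknown_actions(scene_ids, bad_plan)
```

A plan that passes this check can still fail, but only for the three reasons listed above.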
Why Separate Perception and Cognition?
Caching. Same page = same Scene. Parse once, run multiple plans against it.
Constraints. Cognition can't hallucinate actions Perception didn't find.
Different failure modes. Separate layers let you pinpoint the fault.
Swappable implementations. Use DOM parsing now, switch to vision-language models later. Cognition doesn't change. It still gets a Scene and outputs a Plan.
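The caching and swappability arguments can be sketched together. In the code below, a perceiver is any function from page content to a Scene, and a wrapper caches results by content hash. The simplified `Scene` type (a tuple of action IDs), the regex-based `dom_perceiver`, and keying the cache on a hash of the HTML are all assumptions for illustration, not a description of how any real product does it.

```python
import hashlib
import re
from typing import Callable

Scene = tuple[str, ...]              # simplified: just the action IDs
Perceiver = Callable[[str], Scene]   # any implementation with this shape fits


def cached(perceive: Perceiver) -> Perceiver:
    """Wrap a perceiver so identical page content is parsed only once."""
    cache: dict[str, Scene] = {}

    def wrapper(page_html: str) -> Scene:
        key = hashlib.sha256(page_html.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = perceive(page_html)
        return cache[key]

    return wrapper


calls = {"n": 0}  # counts how often the underlying parser actually runs


def dom_perceiver(page_html: str) -> Scene:
    """Toy DOM-based perceiver: treats every id attribute as an action ID."""
    calls["n"] += 1
    return tuple(re.findall(r'id="([\w-]+)"', page_html))


perceive = cached(dom_perceiver)
html = '<button id="search_flights">Search</button><a id="price_low_to_high">Sort</a>'
first = perceive(html)
second = perceive(html)  # served from the cache; dom_perceiver is not called again
```

Swapping in a vision-based perceiver would mean passing a different function to `cached`; nothing downstream changes as long as it still returns a Scene.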
4. Execution: Do It and Check It Worked
Execution runs the plan. For each step:
- Perform the action (click, fill, submit)
- Wait for the page to update
- Observe the new state
- Verify the expected outcome occurred
- If not, retry. If it still fails, return control to Cognition to re-plan
This feedback loop separates action from blind clicking. The agent verifies cause and effect.
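The loop above can be sketched in a few lines of Python. The `Step` structure and `run_plan` function are hypothetical; `verify` stands in for the wait, observe, and check steps, and a plain dict stands in for the page so the example runs without a browser.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    action_id: str
    perform: Callable[[], None]   # click, fill, submit
    verify: Callable[[], bool]    # did the expected outcome occur?


def run_plan(steps: list[Step], max_retries: int = 1) -> Optional[str]:
    """Run steps in order. Return None on success, or the failing action ID."""
    for step in steps:
        for _attempt in range(1 + max_retries):
            step.perform()
            if step.verify():
                break                 # outcome confirmed, move to the next step
        else:
            return step.action_id     # retries exhausted: hand back to Cognition
    return None


# Fake page state: the sort only takes effect on the second click.
page = {"clicks": 0, "sorted": False}


def flaky_sort() -> None:
    page["clicks"] += 1
    if page["clicks"] >= 2:
        page["sorted"] = True


ok_result = run_plan([Step("sort_price_asc", flaky_sort, lambda: page["sorted"])])
failed_result = run_plan([Step("book", lambda: None, lambda: False)])
```

Returning the failing action ID, instead of raising, is one way to give Cognition enough context to re-plan from that point.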
Most implementations use Playwright today. In the future, browsers might provide agent-aware APIs, such as a hypothetical browser.agent.submitForm(), designed for automation that doesn't break when the DOM changes.
5. Memory: Don't Start from Zero Every Time
Agents don't learn like models do (no weight updates). They remember through stored state:
- Sessions: Login tokens, cookies, auth state
- Successful paths: Reuse flows that worked
- Artifacts: CSVs downloaded, PDFs created
- Traces: Full history for debugging
Here's why this matters: Your agent tries to book a flight on Expedia. The first time, it explores the page, finds filters, and discovers that sort_price_asc sorts by cheapest first. Memory stores that successful path. Next time, before Cognition even plans, it sees: "On Expedia, cheapest flights = action ID sort_price_asc." The agent skips the discovery phase entirely. Fewer LLM calls, faster execution, lower cost.
The synergy is simple: Memory narrows the search space, and Cognition operates on that smaller space. The outcome is the same, at a lower token cost.
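Here is a minimal sketch of that lookup-before-planning idea. The `PathMemory` class, the `(site, goal)` key, and the `plan` helper are hypothetical, and the sketch assumes a discovered path is only stored once it has been verified to work.

```python
from typing import Callable, Optional


class PathMemory:
    """Hypothetical store of action-ID sequences that worked before."""

    def __init__(self) -> None:
        self._paths: dict[tuple[str, str], list[str]] = {}

    def remember(self, site: str, goal: str, action_ids: list[str]) -> None:
        self._paths[(site, goal)] = list(action_ids)

    def recall(self, site: str, goal: str) -> Optional[list[str]]:
        path = self._paths.get((site, goal))
        return list(path) if path is not None else None


def plan(memory: PathMemory, site: str, goal: str,
         discover: Callable[[], list[str]]) -> list[str]:
    """Reuse a remembered path if one exists; otherwise run discovery."""
    known = memory.recall(site, goal)
    if known is not None:
        return known                  # skip discovery: no extra LLM calls
    path = discover()                 # stands in for the expensive exploration
    memory.remember(site, goal, path) # assumes the path was verified to work
    return path


discoveries = {"n": 0}  # counts how often exploration actually runs


def explore() -> list[str]:
    discoveries["n"] += 1
    return ["sort_price_asc"]


memory = PathMemory()
first_run = plan(memory, "expedia.com", "cheapest flights", explore)
second_run = plan(memory, "expedia.com", "cheapest flights", explore)
```

On the second run the discovery function is never called, which is the saving the Expedia example describes.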
Five layers defined. Now the hard part: making them actually work together.
Why Integration Matters
A stack only works if contracts between layers align. The output of one must be the input of the next. Otherwise, you're debugging glue code between mismatched systems.
You could build each layer yourself using different tools, but now you're maintaining the glue. When something breaks across boundaries, whose fault is it?
Two concerns get conflated: custody and operations.
- Custody is who controls your credentials, sessions, and data
- Operations is who runs the browser infrastructure
They don't have to be the same. Ideal setup: you keep custody (control of sessions, data, replays), but offload operations (browser management, scaling).
Reliability comes from aligned contracts: outputs matching inputs, trace IDs flowing end-to-end, and errors handled across boundaries. That's why getting all five layers from one coherent stack matters. Not necessarily one vendor, but one design philosophy. Someone has to make the pieces fit.
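As an illustration of a trace ID flowing end-to-end, the sketch below runs a sequence of layers under a single ID and records which layer failed. The `run_traced` function, the log format, and the toy layer functions are assumptions for illustration only.

```python
import uuid
from typing import Any, Callable


def run_traced(layers: list[tuple[str, Callable[[Any], Any]]],
               payload: Any) -> tuple[Any, list[dict]]:
    """Pass payload through each layer, tagging every log entry with one trace ID."""
    trace_id = uuid.uuid4().hex
    log: list[dict] = []
    for name, fn in layers:
        try:
            payload = fn(payload)     # one layer's output is the next layer's input
            log.append({"trace_id": trace_id, "layer": name, "ok": True})
        except Exception as exc:
            log.append({"trace_id": trace_id, "layer": name,
                        "ok": False, "error": str(exc)})
            break                     # stop at the boundary where it failed
    return payload, log


def perceive(page: str) -> set[str]:
    return {"search_flights"}         # toy Scene: a set of action IDs


def decide(scene_ids: set[str]) -> list[str]:
    if "book_now" not in scene_ids:
        raise ValueError("goal needs an action the Scene does not contain")
    return ["book_now"]


_, trace = run_traced([("perception", perceive), ("cognition", decide)], "<html>")
```

With one ID on every entry, the last log record answers "whose fault is it?" directly: here it names the cognition layer and the reason.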
You can't bolt on integration after the fact:
- Add verification later? Rewrite how actions report state
- Add memory later? Rewire how traces and artifacts flow
- Add sessions later? Redesign how agents persist and resume
A real stack is designed together. Boundaries are enforced, contracts are clear, and every layer speaks the same language.
Platform Infrastructure vs. Agent Capability Infrastructure
After layers align, you need infrastructure underneath them. There are two kinds:
Platform infrastructure: Docker, Kubernetes, deployment pipelines. Commoditised. Everyone has it.
Agent capability infrastructure: The five layers above. Not commoditised. This is the hard part: making browser agents run reliably in production.
"Full-stack" in this context means agent capability infrastructure. Not a UI and an API, but standardised layers that make the system debuggable and production-ready.
What's Next
We're building toward that standard at Notte. If you want to see a full-stack agent run end-to-end, try the console at notte.cc.
