
What is browser fingerprinting?

By Lucas Giordano · Co-founder, Notte
TL;DR

Browser fingerprinting is the technique of identifying a browser by combining dozens of small signals — user-agent string, screen size, fonts, canvas-rendering hash, GPU, timezone, audio-context output, JavaScript quirks — into a near-unique identifier. Anti-bot systems use it to flag automation; ad-tech uses it to track users across sessions. The signals leak whether you opt in or not, and a default headless Chromium gets fingerprinted as a bot in seconds.

What is browser fingerprinting?

Every browser leaks a constellation of small, individually uninteresting signals — fonts available, screen size, the hash of a tiny rendered canvas, the timing of a hardware clock, the way a specific JavaScript API responds. None of these is identifying on its own. Combined, they're enough to recognize the same browser across sessions with surprising accuracy. That combination is the fingerprint, and it's the primary signal anti-bot systems use to decide whether a visitor is a real user or automation. The whole field of stealth automation is about reverse-engineering what gets fingerprinted and how to avoid sticking out.

What gets measured

A modern fingerprinting library (FingerprintJS, or the proprietary versions inside Cloudflare, Akamai, and DataDome) collects 50–200 signals. The most significant ones cluster into a few categories:

  • Static identifiers. User-agent string, OS, screen resolution, color depth, hardware concurrency, device memory, language, timezone.
  • Rendering fingerprints. A canvas element rendered with specific text and shapes hashes differently across GPUs and font configurations. Same for WebGL, audio context, video.
  • Font enumeration. Which fonts the browser can render. Different OSes, browsers, and locales install different sets.
  • JavaScript environment quirks. Properties of navigator, chrome (presence/absence), webdriver indicators, prototype chains. Headless Chromium leaves dozens of these visible.
  • Network / TLS fingerprints. TLS handshake signature (JA3/JA4), HTTP/2 frame ordering. Each browser and automation tool handshakes in a recognizably different way.
  • Behavioral signals. Mouse movement entropy, scroll cadence, key-press timing. Real users move; bots don't.

The combination of these is what's stored. Two visits with even slightly different signals can be matched as the same actor by similarity, not exact equality.
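
To make the categories concrete, here is a minimal sketch of the collection step: Playwright for Python evaluating a handful of static and rendering signals in-page, then hashing them into one identifier. The signal set and the hash are illustrative, not any vendor's actual probe.

```python
# Minimal signal-collection sketch. Real libraries gather 50-200 signals
# and match by similarity rather than a single exact hash.
import hashlib
import json

from playwright.sync_api import sync_playwright

COLLECT_SIGNALS = """
() => {
  // Static identifiers
  const signals = {
    userAgent: navigator.userAgent,
    language: navigator.language,
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    screen: [screen.width, screen.height, screen.colorDepth],
    cores: navigator.hardwareConcurrency,
  };
  // Rendering fingerprint: draw text on a canvas and serialize the pixels.
  // The result varies across GPUs, drivers, and font configurations.
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  ctx.font = '14px Arial';
  ctx.fillText('fingerprint probe', 2, 20);
  signals.canvas = canvas.toDataURL();
  return signals;
}
"""

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    signals = page.evaluate(COLLECT_SIGNALS)
    # Collapse the signal bundle into a near-unique identifier.
    fingerprint = hashlib.sha256(
        json.dumps(signals, sort_keys=True).encode()
    ).hexdigest()
    print(fingerprint)
    browser.close()
```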

Why default automation gets caught

Headless Chromium leaks bot-tells almost everywhere:

  • navigator.webdriver is true.
  • The user-agent string contains HeadlessChrome unless overridden.
  • The canvas rendering quirks of headless mode differ subtly from headful.
  • Default fonts may not match what a real OS install would have.
  • Mouse and keyboard events are programmatically generated with superhuman cadence.
  • Screen size, GPU info, timezone may not match the proxy's geographic exit point.

Modern detection scores these signals together. Default Playwright with no stealth lands in the suspicious bucket on most major sites within the first request. Vanilla Puppeteer fares worse. The whole reason platforms ship "stealth mode" or "anti-detection" is to align all of these signals to look like a real, plausible user.
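
You can observe the tells directly. Below is a quick check against a stock headless launch; exact values vary by Chromium version (newer headless builds fix some of these), so treat the comments as typical, not guaranteed.

```python
# Inspect the default bot-tells in a stock headless Chromium launch.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # all defaults, no stealth
    page = browser.new_page()
    tells = page.evaluate("""
    () => ({
      webdriver: navigator.webdriver,     // true under automation
      userAgent: navigator.userAgent,     // typically contains "HeadlessChrome"
      plugins: navigator.plugins.length,  // historically 0 in headless
      chromeObject: 'chrome' in window,   // sparse or absent in old headless
    })
    """)
    print(tells)
    browser.close()
```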

How stealth actually works

Stealth isn't a single trick — it's coherent alignment of every signal. The work is in three layers:

  • Mask the obvious. Hide navigator.webdriver, override the user-agent, install a real font set, fix the canvas-rendering quirks of headless mode.
  • Internally consistent profile. A timezone of America/New_York should match an IP exit in the US, an Accept-Language of en-US, a screen resolution common on US laptops, and fonts typical of macOS or Windows. Inconsistencies are the strongest detection signal.
  • Behavioral plausibility. Mouse movements with real-world entropy, scroll events with realistic acceleration, type-into-input pauses that look human. Some stealth platforms ship pre-built behavioral models.
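
Here is a sketch of the first two layers using plain Playwright context options. The profile values are an illustrative US story; production stealth stacks patch far more surfaces than this, canvas and WebGL included.

```python
# Sketch of the "mask" and "consistent profile" layers with plain Playwright.
from playwright.sync_api import sync_playwright

US_PROFILE = {
    "locale": "en-US",                           # matches Accept-Language
    "timezone_id": "America/New_York",           # matches a US proxy exit
    "viewport": {"width": 1440, "height": 900},  # common US laptop size
}

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(**US_PROFILE)
    # Mask the most obvious tell before any page script runs.
    context.add_init_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
    )
    page = context.new_page()
    page.goto("https://example.com")
    # Every claim below should tell the same story as the proxy IP.
    print(page.evaluate(
        "() => [navigator.language, "
        "Intl.DateTimeFormat().resolvedOptions().timeZone, "
        "navigator.webdriver]"
    ))
    browser.close()
```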

Notte's session layer aligns these by default — the surface is exposed as client.Session(proxies=True, ...) and the platform handles the fingerprint coherence behind the scenes. The honest take: no stealth holds against every detection, every quarter, on every site. It's an arms race; the goal is staying ahead of the average detection threshold for the targets you care about.
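
In code, that surface looks roughly like the sketch below. Everything except Session(proxies=True) is a hypothetical placeholder; consult the Notte docs for the actual client setup.

```python
import notte  # hypothetical import name for the Notte SDK client

client = notte.Client()  # hypothetical constructor; see the Notte docs
# proxies=True is the surface quoted above; fingerprint coherence
# (profile, headers, behavior) is handled by the platform.
with client.Session(proxies=True) as session:
    ...
```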

Common pitfalls

  • Spoofing the user-agent without changing other signals. Saying you're Safari while using a Chromium-only API is a high-signal flag.
  • Datacenter IP + residential fingerprint. The IP and the fingerprint should tell the same story. Mismatched stories are easier to detect than either signal alone.
  • Reusing one fingerprint across many sessions. A real user's browser is stable; a bot fleet using one identical fingerprint across thousands of IPs is suspicious. Vary the surface, but coherently (see the sketch after this list).
  • Treating headless as the bot signal. It used to be — these days, modern anti-detection makes headless invisible. The signal that remains is incoherence between layers.
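
A common way to vary the surface coherently is to sample whole profiles rather than mixing individual fields, so every signal in a session still tells one story. A sketch, with illustrative profile values:

```python
# Vary the fingerprint surface coherently: sample whole profiles, never
# mix-and-match individual fields. The profiles here are illustrative.
import random

PROFILES = [
    {   # US / macOS story
        "locale": "en-US",
        "timezone_id": "America/New_York",
        "viewport": {"width": 1440, "height": 900},
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
    },
    {   # German / Windows story
        "locale": "de-DE",
        "timezone_id": "Europe/Berlin",
        "viewport": {"width": 1920, "height": 1080},
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    },
]

def pick_profile() -> dict:
    # Every field in the chosen profile tells the same story; the proxy
    # exit for the session should then be chosen to match it.
    return random.choice(PROFILES)

profile = pick_profile()
```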

Key takeaways

  • Browser fingerprinting combines dozens of signals (user-agent, canvas, fonts, GPU, timezone, behavioral cadence, TLS) into a near-unique identifier used for bot detection and user tracking.
  • A default headless Chromium gets flagged on most major sites within the first request — every layer of automation has to be aligned, not just the user-agent.
  • Stealth is signal coherence: a fingerprint that makes consistent claims about its own provenance, IP, hardware, and behavior.
  • Pair stealth fingerprinting with residential proxies and persistent digital identities for a coherent multi-layer story.

Build your AI agent on the open web with Notte

Cloud browsers, agent identities, and the Anything API — everything you need to ship reliable browser agents in production.