
Find datasets on Data.gov

Search Data.gov, visit dataset detail pages, and return ranked metadata for public datasets.
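The flow in that one-line summary (search, visit detail pages, return ranked metadata) can be pictured with the sketch below. It assumes that "ranked" means the order in which results appear on the search page, and it takes a hypothetical fetch_detail callable as a stand-in for the Notte detail-page scrape. RankedDataset, enrich_ranked_results, and the title/url keys are illustrative names, not the template's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RankedDataset:
    """Hypothetical record for one ranked, enriched search result."""

    rank: int
    title: str
    url: str
    detail: dict[str, str] = field(default_factory=dict)


def enrich_ranked_results(
    search_results: list[dict[str, str]],
    limit: int,
    fetch_detail: Callable[[str], dict[str, str]],
) -> list[RankedDataset]:
    """Keep search-page order as the rank (an assumption) and enrich the top `limit` hits."""
    ranked: list[RankedDataset] = []
    for position, hit in enumerate(search_results[:limit], start=1):
        # One detail-page visit per kept result; fetch_detail stands in for a Notte scrape.
        detail = fetch_detail(hit["url"])
        ranked.append(
            RankedDataset(rank=position, title=hit["title"], url=hit["url"], detail=detail)
        )
    return ranked
```

Injecting fetch_detail keeps the ranking and truncation logic testable without a browser session. Only the first limit results trigger a detail-page visit.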

Run this template

Clone just this template, configure Notte, and start the run.

Before running

Extraction path

  • Defaults to the query "climate", but accepts any query and result limit from the CLI or the .env file.
  • Uses Notte browser sessions and structured scraping against catalog.data.gov search and dataset detail pages.
  • Uses inline uv script metadata (PEP 723), so the template needs no pyproject.toml of its own.
  • Captures the search query, search URL, result-count text, and sort option from the search page.
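Below is a minimal sketch of how inline uv metadata and the search-page bookkeeping might fit together. The "# /// script" header follows the standard PEP 723 format that uv run reads. Several details are assumptions, not copied from the template: the dependency name notte-sdk, the ?q= search URL pattern on catalog.data.gov, and the SearchPageSummary fields.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "notte-sdk",  # assumed PyPI name for the Notte Python SDK
# ]
# ///
"""Sketch: inline uv metadata plus a hypothetical schema for search-page fields."""

from dataclasses import dataclass
from urllib.parse import urlencode

# Assumed CKAN-style keyword search URL on catalog.data.gov.
SEARCH_BASE = "https://catalog.data.gov/dataset"


def build_search_url(query: str) -> str:
    """Return a catalog.data.gov search URL with the query safely URL-encoded."""
    return f"{SEARCH_BASE}?{urlencode({'q': query})}"


@dataclass
class SearchPageSummary:
    """Hypothetical container for the search-page metadata listed above."""

    query: str
    search_url: str
    result_count_text: str  # kept as the raw text shown on the page
    sort_option: str
```

Because the dependencies live in the script header, uv run on the file resolves them into a temporary environment. That is why no separate pyproject.toml is needed. Keeping result_count_text as raw text avoids guessing at the page's number format.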

Query controls

  • NOTTE_API_KEY: required by the Python SDK.
  • DATA_GOV_QUERY: default query when --query is omitted.
  • DATA_GOV_RESULT_LIMIT: default number of ranked results to enrich with detail pages.