Structured scraping
Find datasets on Data.gov
Search Data.gov, visit dataset detail pages, and return ranked metadata for public datasets.
Run this template
Clone just this template, configure Notte, and start the run.
Before running
- Have NOTTE_API_KEY ready. Generate an API key.
Need help? Join the Notte Slack.
Extraction path
- Defaults to the query "climate", but accepts any query and result limit from the CLI or .env.
- Uses Notte browser sessions and structured scraping against catalog.data.gov search and dataset detail pages.
- Uses inline uv script metadata, so no template-specific pyproject.toml is required.
- Output includes the search query, search URL, result-count text, and sort option.
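As a rough illustration of the points above, the sketch below shows what an inline uv script header (PEP 723) can look like, together with a helper that builds a catalog.data.gov search URL. The function name build_search_url, the /dataset?q=... URL shape, and the sort parameter are assumptions made for illustration. They are not taken from the template itself, which may construct its URLs differently.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []  # assumption: the real template lists the Notte SDK here
# ///
"""Hypothetical sketch: inline script metadata plus a search-URL builder."""
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL for the catalog search page.
CATALOG_SEARCH_URL = "https://catalog.data.gov/dataset"


def build_search_url(query: str, sort: Optional[str] = None) -> str:
    """Return a search URL for the given query (hypothetical helper)."""
    params = {"q": query}
    if sort:
        # The sort parameter name and its values are assumptions.
        params["sort"] = sort
    return f"{CATALOG_SEARCH_URL}?{urlencode(params)}"
```

With a header like this, uv run can resolve dependencies from the script file itself, which is why no separate pyproject.toml is needed.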
Query controls
- NOTTE_API_KEY: required by the Python SDK.
- DATA_GOV_QUERY: default query when --query is omitted.
- DATA_GOV_RESULT_LIMIT: default number of ranked results to enrich with detail pages.
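One plausible way to wire these controls together is sketched below: CLI flags take precedence, then environment variables, then built-in defaults. The --query flag and the three variable names come from the list above. The --limit flag name, the fallback limit of 5, and the resolve_settings function are hypothetical, and the sketch assumes any .env file has already been loaded into the environment.

```python
import argparse
import os


def resolve_settings(argv=None, env=None):
    """Resolve query settings: CLI flag > environment variable > default."""
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser(description="Find datasets on Data.gov")
    # Environment variables supply the defaults when a flag is omitted.
    parser.add_argument("--query", default=env.get("DATA_GOV_QUERY", "climate"))
    # "--limit" and the fallback of 5 are assumptions for illustration.
    parser.add_argument(
        "--limit",
        type=int,
        default=int(env.get("DATA_GOV_RESULT_LIMIT", "5")),
    )
    args = parser.parse_args(argv)

    api_key = env.get("NOTTE_API_KEY")
    if not api_key:
        # Fail early, since the key is required by the Python SDK.
        raise SystemExit("NOTTE_API_KEY is required")

    return {"query": args.query, "limit": args.limit, "api_key": api_key}
```

Calling resolve_settings(["--query", "water"]) would override DATA_GOV_QUERY for a single run while leaving the result limit at its environment or default value.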