Data Gov UK Scraper

by parseforge

Automate your UK open data research. This scraper collects structured dataset info from Data.gov.uk for daily updates, analytics, and streamlined workflows.

34 runs
2 users

About Data Gov UK Scraper

Need to pull data from the UK's official open data portal, but tired of manual exports and inconsistent formats? I built this scraper because I was in the same spot. It automates the tedious work of collecting dataset details from Data.gov.uk, turning a messy research task into a scheduled, hands-off process. You get clean, structured data on everything from dataset titles and descriptions to publishers and update frequencies, ready to drop into a spreadsheet or database.

I use it primarily for two things: keeping a local repository of UK public data automatically updated, and feeding fresh dataset metadata into analytics dashboards. It saves a ton of time if you're in research, policy analysis, or building data-driven applications that rely on current UK government statistics, geospatial data, or transport info.

You can set it to run daily, so you're always working with the latest information without having to manually check the portal. The results come out formatted (I typically use JSON or CSV), making integration into your existing workflows straightforward. It's basically a dedicated assistant for UK open data.

What does this actor do?

Data Gov UK Scraper is a web scraping and automation tool available on the Apify platform. It extracts dataset metadata from Data.gov.uk and runs in the cloud, so collection can be scheduled and automated rather than done by hand.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Open the actor's page on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

Data.gov.uk Scraper

An Apify actor for scraping dataset metadata from the UK government's open data portal, data.gov.uk. It automates the collection of detailed information, supporting both direct URL scraping and search-based discovery.

Overview

This tool extracts structured data from data.gov.uk, eliminating the need for manual research. It's designed for developers, data analysts, and researchers who need to systematically gather UK open data intelligence. You can scrape specific dataset pages directly or use search filters to find relevant datasets based on keywords, publishers, topics, and formats.

Key Features

The scraper collects comprehensive metadata for each dataset, including:

  • Dataset Titles & Descriptions: Full names and detailed summaries.
  • Publisher Information: The originating government department or organization.
  • Temporal Data: Last updated dates.
  • Categorization: Topics (e.g., Business and economy, Transport, Health).
  • Technical Details: Available file formats (CSV, JSON, XML, PDF) and licensing info (primarily the Open Government Licence).
  • Access Links: Direct URLs to dataset pages and download links for the actual data files.
  • Contact Information: Provided enquiry links for datasets.

How to Use

Configure the actor run via input JSON. You can use it in two primary ways.

Option 1: Direct URL Scraping

Provide the exact URLs of the dataset pages you want to scrape.

{
  "startUrl": [
    "https://www.data.gov.uk/dataset/economic-review",
    "https://www.data.gov.uk/dataset/regional-economic-indicators"
  ],
  "maxItems": 10
}
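
Before submitting a run, it can help to check that each URL really points at a dataset page. The following is a minimal sketch, not part of the actor itself: `is_dataset_url` and `build_direct_input` are hypothetical helper names, and the check assumes dataset pages follow the `https://www.data.gov.uk/dataset/<slug>` pattern seen in the example above.

```python
from urllib.parse import urlparse

# Assumption: dataset pages follow the pattern shown in the example input,
# i.e. https://www.data.gov.uk/dataset/<slug>.
ALLOWED_HOSTS = {"data.gov.uk", "www.data.gov.uk"}


def is_dataset_url(url: str) -> bool:
    """Return True if the URL looks like a data.gov.uk dataset page."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        return False
    # Keep only non-empty path segments, e.g. ["dataset", "economic-review"].
    parts = [p for p in parsed.path.split("/") if p]
    return len(parts) >= 2 and parts[0] == "dataset"


def build_direct_input(urls: list[str], max_items: int = 10) -> dict:
    """Build an input dict shaped like the example above (hypothetical helper)."""
    bad = [u for u in urls if not is_dataset_url(u)]
    if bad:
        raise ValueError(f"Not data.gov.uk dataset URLs: {bad}")
    return {"startUrl": list(urls), "maxItems": max_items}
```

Serialising the returned dict with `json.dumps` gives input in the same shape as the example; whether the actor itself rejects other URLs is not stated in this documentation.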

Option 2: Search with Filters

Use search parameters to find and scrape datasets dynamically.

  • searchQuery: Keywords (e.g., "transport", "health").
  • publisher: Specific government department.
  • topic: Category like Transport or Environment.
  • format: File format (CSV, JSON, etc.).
  • oglOnly: Set to true to return only datasets published under the Open Government Licence.
  • sort: Order results by "best" match or "recent" updates.
  • maxItems: Limit the number of datasets scraped (required for free users).

Example: Basic Search

{
  "searchQuery": "economics",
  "sort": "best",
  "maxItems": 50
}

Example: Advanced Filtered Search

{
  "searchQuery": "transport",
  "publisher": "Department for Transport",
  "topic": "Transport",
  "format": "CSV",
  "oglOnly": true,
  "sort": "recent",
  "maxItems": 100
}
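
The search inputs above can also be assembled programmatically. This is a rough sketch that assumes only the parameter names and the two `sort` values listed in this section; `build_search_input` is a hypothetical helper, and the actor's real input schema may accept more options or validate them differently.

```python
from typing import Optional

# Sort values named in the parameter list above.
VALID_SORTS = {"best", "recent"}


def build_search_input(
    search_query: str,
    max_items: int,
    publisher: Optional[str] = None,
    topic: Optional[str] = None,
    file_format: Optional[str] = None,
    ogl_only: bool = False,
    sort: str = "best",
) -> dict:
    """Assemble a search input dict, omitting filters that were not set."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    run_input = {"searchQuery": search_query, "sort": sort, "maxItems": max_items}
    # Only include optional filters that the caller actually provided.
    optional = {"publisher": publisher, "topic": topic, "format": file_format}
    run_input.update({k: v for k, v in optional.items() if v is not None})
    if ogl_only:
        run_input["oglOnly"] = True
    return run_input
```

For instance, `build_search_input("economics", 50)` yields the same keys and values as the basic search example, and passing `publisher`, `topic`, `file_format`, `ogl_only=True` and `sort="recent"` yields the same keys and values as the advanced one.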

Input/Output

Input: Configure the scraper using the JSON input schema as shown in the examples above.

Output: The actor stores the scraped dataset metadata in the Apify dataset associated with the run. You can then download this structured data in multiple formats including JSON, CSV, Excel, XML, or HTML for further processing and analysis.
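
For scripted downloads, the run's dataset can be fetched through Apify's dataset items endpoint, which takes a `format` query parameter. The sketch below only builds the URL and sends no request. `dataset_items_url` is a hypothetical helper, the dataset ID is a placeholder you would take from your own run, and depending on your account settings the request may also need an API token.

```python
from typing import Optional
from urllib.parse import urlencode

# Export formats named above, mapped to the values used by the
# `format` query parameter of the Apify dataset items endpoint.
FORMATS = {"json": "json", "csv": "csv", "excel": "xlsx", "xml": "xml", "html": "html"}


def dataset_items_url(dataset_id: str, fmt: str = "json", limit: Optional[int] = None) -> str:
    """Build a download URL for a run's dataset in the requested format."""
    key = fmt.lower()
    if key not in FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; choose from {sorted(FORMATS)}")
    params = {"format": FORMATS[key]}
    if limit is not None:
        # Optional cap on the number of items returned.
        params["limit"] = limit
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

The resulting URL can be passed to any HTTP client, or pasted into a browser while logged in to Apify.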

Common Use Cases

  • Market Research: Gather competitive intelligence and market data.
  • Lead Generation: Extract contact information for sales outreach.
  • Price Monitoring: Track competitor pricing and product changes.
  • Content Aggregation: Collect and organize content from multiple sources.

Actor Information

Developer: parseforge
Pricing: Paid
Total Runs: 34
Active Users: 2