SEO Data Extractor

by nocodeventure

Automatically extract SEO metadata, headings, links, and technical data from any website. Perfect for automating audits, competitor research, and content analysis.

33 runs
3 users

About SEO Data Extractor

Need to see what's under the hood of a website? I built this actor to pull back the curtain on a page's SEO setup. It fetches the essentials: page titles, meta descriptions, every heading (H1 through H6), and all internal and external links. It also grabs image data, Open Graph tags for social sharing, and Twitter Cards. On the technical side, it collects the details you'd check in an audit, such as HTTP status, response time, charset, language, viewport, and JSON-LD structured data.

I run it on Apify all the time, and it outputs everything as clean, structured JSON, ready to pipe into your own analysis scripts or spreadsheets. I mainly use it for three things: running quick SEO health checks, seeing what my competitors are doing with their tags and content structure, and gathering data to optimize my own pages. It saves me hours of manual digging. If you want to automate the data-collection part of your SEO workflow, this is the actor I'd point you to first.

What does this actor do?

SEO Data Extractor is a web scraping actor available on the Apify platform. It runs in the cloud, visits the URLs you give it, and returns each page's SEO and technical metadata as structured JSON.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

SEO Data Extractor

An Apify actor that extracts structured SEO and technical metadata from webpages. It outputs comprehensive JSON data for audits, analysis, and monitoring.

Overview

This tool crawls the URLs you specify and returns a detailed breakdown of on-page SEO elements, Open Graph tags, Twitter Cards, technical headers, and content structure. It is designed for batch processing, handles errors gracefully, and can optionally extract sitemap data.

Key Features

  • Meta & Content Data: Extracts title, description, keywords, robots directives, canonical URLs, and heading structure (H1-H6) with counts.
  • Technical Analysis: Captures HTTP status, response time, charset, language, viewport, and structured data (JSON-LD).
  • Social Metadata: Parses complete Open Graph and Twitter Card tags.
  • Asset Audit: Identifies images (with alt text audit), favicons, and brand colors.
  • Link Analysis: Counts total, internal, and external links.
  • Sitemap Integration: Can optionally fetch and include all URLs from a domain's sitemap.xml.
  • Scalable Execution: Built on the Apify platform for reliable, high-volume crawling.
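
As a rough illustration of what the heading and link counts above involve, here is a minimal Python sketch using only the standard library. It is not the actor's actual implementation: the `SeoParser` class and `analyze` function are hypothetical names, and treating a link as internal only when its host exactly matches the page's host is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class SeoParser(HTMLParser):
    """Collects the title, heading counts, and link counts from one HTML page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).netloc
        self.title = ""
        self.heading_counts = {tag: 0 for tag in sorted(self.HEADINGS)}
        self.links = {"total": 0, "internal": 0, "external": 0}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.HEADINGS:
            self.heading_counts[tag] += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._count_link(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def _count_link(self, href):
        # Resolve relative links against the page URL before comparing hosts.
        parsed = urlparse(urljoin(self.page_url, href))
        if parsed.scheme not in ("http", "https"):
            return  # skip mailto:, tel:, javascript:, and similar schemes
        self.links["total"] += 1
        # Assumption: "internal" means the exact same host as the page.
        key = "internal" if parsed.netloc == self.page_host else "external"
        self.links[key] += 1


def analyze(html, page_url):
    """Parse an HTML string and return the title, heading counts, and link counts."""
    parser = SeoParser(page_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "headings": parser.heading_counts,
        "links": parser.links,
    }
```

Calling `analyze(html, "https://example.com/")` on a fetched page returns a small dict with the same kinds of counts the actor reports. The field names here are illustrative and do not mirror the actor's output schema.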

How to Use

Configure the actor via input settings, typically through the Apify console, API, or SDK. The main parameters are:

  • startUrls: (Array) The list of target URLs. Default is ["https://nocodeventure.com"].
  • extractSitemapUrls: (Boolean) Set to true to fetch sitemap data for each domain.
  • maxRequestsPerCrawl: (Integer) Limit total pages scraped (0 for unlimited). Default is 100.
  • maxConcurrency: (Integer) Control parallel requests (1-50). Default is 10.
  • proxyConfiguration: (Object) Use Apify Proxy or custom proxies to avoid blocks.
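
The parameters above can be assembled into an input object before a run. The sketch below is one way to do that in Python, with range checks that follow the documented limits. The `build_input` helper is a hypothetical name, and the `useApifyProxy` key is an assumption based on the usual Apify proxy input shape, so check it against the actor's input schema.

```python
import json


def build_input(start_urls, extract_sitemap_urls=False,
                max_requests_per_crawl=100, max_concurrency=10,
                use_apify_proxy=False):
    """Return an input dict using the parameter names documented above."""
    if not start_urls:
        raise ValueError("startUrls needs at least one URL")
    if max_requests_per_crawl < 0:
        raise ValueError("maxRequestsPerCrawl must be 0 (unlimited) or positive")
    if not 1 <= max_concurrency <= 50:
        raise ValueError("maxConcurrency must be between 1 and 50")
    return {
        "startUrls": list(start_urls),
        "extractSitemapUrls": extract_sitemap_urls,
        "maxRequestsPerCrawl": max_requests_per_crawl,
        "maxConcurrency": max_concurrency,
        # Assumption: the usual Apify proxy input shape; verify in the input schema.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    run_input = build_input(["https://example.com"], max_concurrency=5)
    print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the input editor in the Apify console or sent through the API.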

Input & Output

Input Configuration

The actor accepts a JSON input object. Key configuration fields are:

Field               | Type    | Description                          | Default
--------------------+---------+--------------------------------------+------------------------------
startUrls           | Array   | URLs to extract data from.           | ["https://nocodeventure.com"]
extractSitemapUrls  | Boolean | Fetch sitemap data for each domain.  | false
sitemapUrl          | String  | Custom sitemap path.                 | /sitemap.xml
maxRequestsPerCrawl | Integer | Max pages to scrape (0 = unlimited). | 100
maxConcurrency      | Integer | Parallel requests (1-50).            | 10
proxyConfiguration  | Object  | Proxy settings for anti-blocking.    | Apify Proxy disabled
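
For running the actor from code rather than the console, the following is a minimal sketch that assumes the `apify-client` Python package. The `run_and_collect` helper is a hypothetical name. It relies on the client's `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods as I understand them, so confirm them against the client version you install. The actor ID and token in the usage comment are placeholders, not values taken from this page.

```python
def run_and_collect(client, actor_id, run_input):
    """Start the actor, wait for the run to finish, and return all dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return any run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage (requires `pip install apify-client` and a valid API token):
#
#   from apify_client import ApifyClient
#
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   items = run_and_collect(
#       client,
#       "<username>/<actor-name>",  # placeholder: copy the real ID from the actor page
#       {
#           "startUrls": ["https://example.com"],
#           "extractSitemapUrls": True,
#           "sitemapUrl": "/sitemap.xml",
#           "maxRequestsPerCrawl": 100,
#           "maxConcurrency": 10,
#       },
#   )
#   print(len(items), "pages scraped")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without network access.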

Output Schema

The result is a dataset of items, each a JSON object for a scraped page. The structure includes:

  • Page Context: url, scrapedAt (ISO timestamp), and optional error/errorMessage fields.
  • Meta Information: meta object with title, description, keywords, canonical, robots, and their character lengths.
  • Headings: headings object with combined text and count for H1 through H6 tags.
  • Content Data: wordCount, linkCount (broken into internal/external), and imageCount (including those missing alt text).
  • Social & Technical Data: openGraph, twitterCard, structuredData, and technical (status code, response time, viewport, etc.) objects.
  • Sitemap URLs: sitemapUrls array (present if extractSitemapUrls is enabled).
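
Because each item is nested JSON, it usually needs flattening before it goes into a spreadsheet. The sketch below shows one way to do that in Python. The `flatten` and `to_csv` helpers and the CSV column names are hypothetical, and only fields named in the schema above are read; missing fields fall back to empty values.

```python
import csv
import io


def flatten(item):
    """Reduce one dataset item to a flat dict of spreadsheet-friendly columns."""
    meta = item.get("meta") or {}
    h1 = (item.get("headings") or {}).get("h1") or {}
    return {
        "url": item.get("url", ""),
        "scrapedAt": item.get("scrapedAt", ""),
        "title": meta.get("title", ""),
        "titleLength": meta.get("titleLength", ""),
        "description": meta.get("description", ""),
        "descriptionLength": meta.get("descriptionLength", ""),
        "h1Count": h1.get("count", 0),
        "wordCount": item.get("wordCount", ""),
        "errorMessage": item.get("errorMessage", ""),
    }


def to_csv(items):
    """Return the flattened items as CSV text with a header row."""
    rows = [flatten(item) for item in items]
    buffer = io.StringIO()
    if rows:
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()
```

The returned text can be written to a `.csv` file and opened in any spreadsheet tool.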

Example output snippet:

{
  "url": "https://example.com",
  "scrapedAt": "2023-10-26T10:30:00.000Z",
  "meta": {
    "title": "Example Page",
    "titleLength": 12,
    "description": "This is an example.",
    "descriptionLength": 19
  },
  "headings": {
    "h1": { "text": "Main Title", "count": 1 },
    "h2": { "text": "Subtitle One", "count": 1 }
  }
}
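
For a quick health check, an item shaped like the snippet above can be run through a few simple rules. The following Python sketch is one possible set of checks, not part of the actor. The `audit_item` function is a hypothetical name, the 60 and 160 character limits are common rules of thumb rather than values from this documentation, and treating any truthy `error` field as a failed scrape is an assumption.

```python
MAX_TITLE_LENGTH = 60         # assumption: common rule of thumb
MAX_DESCRIPTION_LENGTH = 160  # assumption: common rule of thumb


def audit_item(item):
    """Return a list of human-readable issues for one dataset item."""
    # Assumption: a truthy `error` field marks a page that could not be scraped.
    if item.get("error"):
        return ["scrape failed: " + str(item.get("errorMessage", "unknown error"))]

    issues = []
    meta = item.get("meta") or {}
    title = meta.get("title") or ""
    description = meta.get("description") or ""
    # Prefer the reported lengths; fall back to measuring the strings.
    title_length = meta.get("titleLength", len(title))
    description_length = meta.get("descriptionLength", len(description))

    if not title:
        issues.append("missing title")
    elif title_length > MAX_TITLE_LENGTH:
        issues.append(f"title is {title_length} characters (over {MAX_TITLE_LENGTH})")

    if not description:
        issues.append("missing description")
    elif description_length > MAX_DESCRIPTION_LENGTH:
        issues.append(
            f"description is {description_length} characters (over {MAX_DESCRIPTION_LENGTH})"
        )

    h1_count = ((item.get("headings") or {}).get("h1") or {}).get("count", 0)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")

    return issues
```

Run over every item in a dataset, this produces a per-URL list of issues that can be adjusted to match your own audit rules.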

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try SEO Data Extractor now on Apify. Free tier available with no credit card required.

Actor Information

Developer: nocodeventure
Pricing: Paid
Total Runs: 33
Active Users: 3

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
