SEO Data Extractor

Author: nocodeventure

Automatically extract SEO metadata, headings, links, and technical data from any website. Perfect for automating audits, competitor research, and content analysis.

33 runs

3 users

About SEO Data Extractor

Need to see what's under the hood of a website? I built this actor to expose any page's SEO setup. It fetches the essentials: page titles, meta descriptions, every heading (H1 through H6), and all internal and external links. It also grabs image data, Open Graph tags for social sharing, and Twitter Cards. On the technical side, it collects the details you'd check in an audit, such as HTTP status, response time, charset, language, viewport, and JSON-LD structured data. I run it on Apify regularly, and it outputs everything as clean, structured JSON that you can pipe into your own analysis scripts or spreadsheets.

I mainly use it for three things: running quick SEO health checks, seeing what my competitors are doing with their tags and content structure, and gathering data to optimize my own pages. It saves me hours of manual digging. If you want to automate the data-collection part of your SEO workflow, this is the actor I'd point you to first.

What does this actor do?

SEO Data Extractor is a web scraping actor that runs on the Apify platform. It extracts SEO metadata, headings, links, and technical data from the pages you specify, with no local setup.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Optional proxy support (Apify Proxy or custom proxies) to reduce blocking
Scheduled runs and webhooks for automation

How to Use

Open SEO Data Extractor on Apify.com
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

SEO Data Extractor

An Apify actor that extracts structured SEO and technical metadata from webpages. It outputs comprehensive JSON data for audits, analysis, and monitoring.

Overview

This tool crawls the URLs you specify and returns a detailed breakdown of on-page SEO elements, Open Graph tags, Twitter Cards, technical headers, and content structure. It is designed for batch processing, handles errors gracefully, and can optionally extract sitemap data.

Key Features

Meta & Content Data: Extracts title, description, keywords, robots directives, canonical URLs, and heading structure (H1-H6) with counts.
Technical Analysis: Captures HTTP status, response time, charset, language, viewport, and structured data (JSON-LD).
Social Metadata: Parses complete Open Graph and Twitter Card tags.
Asset Audit: Identifies images (with alt text audit), favicons, and brand colors.
Link Analysis: Counts total, internal, and external links.
Sitemap Integration: Can optionally fetch and include all URLs from a domain's sitemap.xml.
Scalable Execution: Built on the Apify platform for reliable, high-volume crawling.

How to Use

Configure the actor via input settings, typically through the Apify console, API, or SDK. The main parameters are:

startUrls: (Array) The list of target URLs. Default is ["https://nocodeventure.com"].
extractSitemapUrls: (Boolean) Set to true to fetch sitemap data for each domain. Default is false.
maxRequestsPerCrawl: (Integer) Limit on the total number of pages scraped (0 for unlimited). Default is 100.
maxConcurrency: (Integer) Number of parallel requests (1-50). Default is 10.
proxyConfiguration: (Object) Apify Proxy or custom proxy settings to avoid blocks. Apify Proxy is disabled by default.
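
As a rough sketch of the API route, the snippet below starts a run and reads back the resulting dataset using the `apify-client` Python package. The actor's ID is not stated here, so `"<ACTOR_ID>"` is a placeholder to copy from the actor's Apify page. `run_and_collect` and `RUN_INPUT` are hypothetical names chosen for illustration, and the input values are simply the documented defaults.

```python
from typing import Any


def run_and_collect(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the actor, wait for the run to finish, and return its dataset items.

    `client` is expected to be an `apify_client.ApifyClient` instance.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return a result")
    # Each run stores its results in a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Field names and values follow the parameter list above.
RUN_INPUT: dict[str, Any] = {
    "startUrls": ["https://nocodeventure.com"],
    "extractSitemapUrls": False,
    "maxRequestsPerCrawl": 100,
    "maxConcurrency": 10,
}

# Usage (requires `pip install apify-client` and an Apify API token):
#   from apify_client import ApifyClient
#   items = run_and_collect(ApifyClient("<YOUR_APIFY_TOKEN>"), "<ACTOR_ID>", RUN_INPUT)
```

Passing the client in as an argument keeps the function easy to exercise with a stub before spending any platform credits.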

Input & Output

Input Configuration

The actor accepts a JSON input object. Key configuration fields are:

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| `startUrls` | Array | URLs to extract data from. | `["https://nocodeventure.com"]` |
| `extractSitemapUrls` | Boolean | Fetch sitemap data for each domain. | `false` |
| `sitemapUrl` | String | Custom sitemap path. | `/sitemap.xml` |
| `maxRequestsPerCrawl` | Integer | Max pages to scrape (0 = unlimited). | `100` |
| `maxConcurrency` | Integer | Parallel requests (1-50). | `10` |
| `proxyConfiguration` | Object | Proxy settings for anti-blocking. | Apify Proxy disabled |
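
The constraints in this table can be checked locally before a run is started. The following is a minimal sketch, not the actor's own validation logic: `validate_input` and `DEFAULTS` are hypothetical names, and both the fallback to documented defaults and the requirement that every URL start with `http://` or `https://` are assumptions.

```python
from typing import Any

# Defaults as listed in the input configuration table.
DEFAULTS: dict[str, Any] = {
    "startUrls": ["https://nocodeventure.com"],
    "extractSitemapUrls": False,
    "sitemapUrl": "/sitemap.xml",
    "maxRequestsPerCrawl": 100,
    "maxConcurrency": 10,
}


def validate_input(user_input: dict[str, Any]) -> dict[str, Any]:
    """Merge user settings over the documented defaults and check the ranges."""
    merged = {**DEFAULTS, **user_input}

    urls = merged["startUrls"]
    if not isinstance(urls, list) or not urls:
        raise ValueError("startUrls must be a non-empty list")
    for url in urls:
        # Assumption: only absolute http(s) URLs are meaningful targets.
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {url!r}")

    if not isinstance(merged["extractSitemapUrls"], bool):
        raise ValueError("extractSitemapUrls must be true or false")
    if not 1 <= merged["maxConcurrency"] <= 50:
        raise ValueError("maxConcurrency must be between 1 and 50")
    if merged["maxRequestsPerCrawl"] < 0:
        raise ValueError("maxRequestsPerCrawl must be 0 (unlimited) or a positive integer")
    return merged
```

For example, `validate_input({"maxConcurrency": 51})` raises a `ValueError`, while `validate_input({"maxConcurrency": 5})` returns the full configuration with the remaining defaults filled in.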

Output Schema

The result is a dataset of items, each a JSON object for a scraped page. The structure includes:

Page Context: url, scrapedAt (ISO timestamp), and optional error/errorMessage fields.
Meta Information: meta object with title, description, keywords, canonical, robots, and their character lengths.
Headings: headings object with combined text and count for H1 through H6 tags.
Content Data: wordCount, linkCount (broken into internal/external), and imageCount (including those missing alt text).
Social & Technical Data: openGraph, twitterCard, structuredData, and technical (status code, response time, viewport, etc.) objects.
Sitemap URLs: sitemapUrls array (present if extractSitemapUrls is enabled).
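
Because failed pages are reported in the same dataset as successful ones, it can help to separate them before analysis. This is a small sketch under one assumption: that a truthy `error` field marks a page that could not be scraped (the exact type of `error` is not specified here). `split_by_error` is a hypothetical helper, and the sample items are made-up illustrations, not real output.

```python
from typing import Any


def split_by_error(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (successful, failed) items from a dataset."""
    ok: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        # Assumption: a truthy `error` field means the page was not scraped.
        (failed if item.get("error") else ok).append(item)
    return ok, failed


# Illustrative items only; the field values are invented for this sketch.
SAMPLE_ITEMS = [
    {"url": "https://example.com", "scrapedAt": "2023-10-26T10:30:00.000Z"},
    {"url": "https://example.com/missing", "error": True, "errorMessage": "Request failed"},
]

ok_items, failed_items = split_by_error(SAMPLE_ITEMS)
for item in failed_items:
    print(f"{item['url']}: {item.get('errorMessage', 'unknown error')}")
```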

Example output snippet:

{
  "url": "https://example.com",
  "scrapedAt": "2023-10-26T10:30:00.000Z",
  "meta": {
    "title": "Example Page",
    "titleLength": 12,
    "description": "This is an example.",
    "descriptionLength": 19
  },
  "headings": {
    "h1": { "text": "Main Title", "count": 1 },
    "h2": { "text": "Subtitle One", "count": 1 }
  }
}
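
To show how an item like this might feed a quick health check, here is a hedged sketch that applies a few rule-of-thumb limits to it. The thresholds (`TITLE_MAX`, `DESCRIPTION_MIN`, `DESCRIPTION_MAX`) are commonly quoted guidelines chosen for illustration, not values defined by the actor, and `audit_item` is a hypothetical helper that reads only the fields shown in the snippet above.

```python
import json
from typing import Any

# The example output snippet from above, parsed into a dict.
EXAMPLE_ITEM: dict[str, Any] = json.loads("""
{
  "url": "https://example.com",
  "scrapedAt": "2023-10-26T10:30:00.000Z",
  "meta": {
    "title": "Example Page",
    "titleLength": 12,
    "description": "This is an example.",
    "descriptionLength": 19
  },
  "headings": {
    "h1": { "text": "Main Title", "count": 1 },
    "h2": { "text": "Subtitle One", "count": 1 }
  }
}
""")

# Assumed rule-of-thumb limits; adjust them to your own guidelines.
TITLE_MAX = 60
DESCRIPTION_MIN = 50
DESCRIPTION_MAX = 160


def audit_item(item: dict[str, Any]) -> list[str]:
    """Return a list of human-readable issues for one scraped page."""
    issues: list[str] = []
    meta = item.get("meta", {})
    headings = item.get("headings", {})

    title_len = meta.get("titleLength", 0)
    if title_len == 0:
        issues.append("missing title")
    elif title_len > TITLE_MAX:
        issues.append(f"title longer than {TITLE_MAX} characters")

    desc_len = meta.get("descriptionLength", 0)
    if desc_len < DESCRIPTION_MIN:
        issues.append(f"description shorter than {DESCRIPTION_MIN} characters")
    elif desc_len > DESCRIPTION_MAX:
        issues.append(f"description longer than {DESCRIPTION_MAX} characters")

    h1_count = headings.get("h1", {}).get("count", 0)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")
    return issues


print(audit_item(EXAMPLE_ITEM))  # ['description shorter than 50 characters']
```

The same function can be mapped over every item in a dataset to build a per-URL issue report.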

Actor Information

Developer: nocodeventure
Pricing: Paid
Total Runs: 33
Active Users: 3
