SEO Data Extractor
by nocodeventure
Automatically extract SEO metadata, headings, links, and technical data from any website. Perfect for automating audits, competitor research, and content analysis.
About SEO Data Extractor
Need to see what's under the hood of any website? I built this actor to pull back the curtain on any page's SEO setup. It fetches all the essentials: page titles, meta descriptions, every single heading (H1 through H6), and all the internal and external links. It also grabs image data, Open Graph tags for social sharing, and Twitter Cards. On the technical side, it collects the core details you'd check in an audit. I run it on Apify all the time, and it spits out everything in clean, structured JSON, which is perfect for piping into your own analysis scripts or spreadsheets. I mainly use it for three things: running quick SEO health checks, seeing what my competitors are doing with their tags and content structure, and gathering data to optimize my own pages. It saves me hours of manual digging. If you're looking to automate the data-collection part of your SEO workflow, this is the actor I'd point you to first.
What does this actor do?
SEO Data Extractor is a web scraping actor on the Apify platform. It extracts SEO metadata, headings, links, and technical data from the URLs you give it, and it runs in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Optional proxy support (Apify Proxy or custom proxies) to reduce blocking
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
SEO Data Extractor
An Apify actor that extracts structured SEO and technical metadata from webpages. It outputs comprehensive JSON data for audits, analysis, and monitoring.
Overview
This tool crawls specified URLs and returns a detailed breakdown of on-page SEO elements, Open Graph tags, Twitter Cards, technical headers, and content structure. It's designed for batch processing, handling errors gracefully, and optionally extracting sitemap data.
Key Features
- Meta & Content Data: Extracts title, description, keywords, robots directives, canonical URLs, and heading structure (H1-H6) with counts.
- Technical Analysis: Captures HTTP status, response time, charset, language, viewport, and structured data (JSON-LD).
- Social Metadata: Parses complete Open Graph and Twitter Card tags.
- Asset Audit: Identifies images (with alt text audit), favicons, and brand colors.
- Link Analysis: Counts total, internal, and external links.
- Sitemap Integration: Can optionally fetch and include all URLs from a domain's sitemap.xml.
- Scalable Execution: Built on the Apify platform for reliable, high-volume crawling.
How to Use
Configure the actor via input settings, typically through the Apify console, API, or SDK. The main parameters are:
- startUrls: (Array) The list of target URLs. Default is ["https://nocodeventure.com"].
- extractSitemapUrls: (Boolean) Set to true to fetch sitemap data for each domain.
- maxRequestsPerCrawl: (Integer) Limit total pages scraped (0 for unlimited). Default is 100.
- maxConcurrency: (Integer) Control parallel requests (1-50). Default is 10.
- proxyConfiguration: (Object) Use Apify Proxy or custom proxies to avoid blocks.
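As a rough illustration of the API route, the sketch below prepares (but does not send) a request to Apify's synchronous run endpoint using only the Python standard library. The actor ID `nocodeventure~seo-data-extractor` and the token placeholder are assumptions for illustration; copy the real ID from the actor's page on Apify. The input shape follows the parameter list above.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumption: placeholder actor ID in "username~actor-name" form.
# Check the actor's page on Apify for the real value.
ACTOR_ID = "nocodeventure~seo-data-extractor"


def build_run_request(actor_input: dict, token: str) -> urllib.request.Request:
    """Prepare a POST that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Minimal input: one start URL and a small page limit.
actor_input = {
    "startUrls": ["https://nocodeventure.com"],
    "maxRequestsPerCrawl": 10,
}
request = build_run_request(actor_input, token="YOUR_APIFY_TOKEN")

# Sending the request needs a valid token and network access:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The synchronous endpoint suits short runs; for larger crawls, starting the run and fetching the dataset afterwards is likely the safer pattern.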
Input & Output
Input Configuration
The actor accepts a JSON input object. Key configuration fields are:
| Field | Type | Description | Default |
|---|---|---|---|
| startUrls | Array | URLs to extract data from. | ["https://nocodeventure.com"] |
| extractSitemapUrls | Boolean | Fetch sitemap data for each domain. | false |
| sitemapUrl | String | Custom sitemap path. | /sitemap.xml |
| maxRequestsPerCrawl | Integer | Max pages to scrape (0 = unlimited). | 100 |
| maxConcurrency | Integer | Parallel requests (1-50). | 10 |
| proxyConfiguration | Object | Proxy settings for anti-blocking. | Apify Proxy disabled |
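The fields and defaults in the table can be assembled and sanity-checked before a run. The helper below is a hypothetical sketch, not part of the actor: `build_input` is an illustrative name, and the `{"useApifyProxy": ...}` shape for `proxyConfiguration` is the common Apify convention, assumed here rather than confirmed for this actor.

```python
def build_input(
    start_urls,
    extract_sitemap_urls=False,
    sitemap_url="/sitemap.xml",
    max_requests_per_crawl=100,
    max_concurrency=10,
    use_apify_proxy=False,
):
    """Build an input object using the documented field names and defaults."""
    if not start_urls:
        raise ValueError("startUrls must contain at least one URL")
    if max_requests_per_crawl < 0:
        raise ValueError("maxRequestsPerCrawl must be 0 (unlimited) or positive")
    if not 1 <= max_concurrency <= 50:
        raise ValueError("maxConcurrency must be between 1 and 50")

    return {
        "startUrls": list(start_urls),
        "extractSitemapUrls": extract_sitemap_urls,
        "sitemapUrl": sitemap_url,
        "maxRequestsPerCrawl": max_requests_per_crawl,
        "maxConcurrency": max_concurrency,
        # Assumed shape; the table only says Apify Proxy is disabled by default.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


config = build_input(["https://example.com"], extract_sitemap_urls=True)
```

Validating the documented ranges locally catches a bad `maxConcurrency` or an empty URL list before a run is started.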
Output Schema
The result is a dataset of items, each a JSON object for a scraped page. The structure includes:
- Page Context: url, scrapedAt (ISO timestamp), and optional error/errorMessage fields.
- Meta Information: meta object with title, description, keywords, canonical, robots, and their character lengths.
- Headings: headings object with combined text and count for H1 through H6 tags.
- Content Data: wordCount, linkCount (broken into internal/external), and imageCount (including those missing alt text).
- Social & Technical Data: openGraph, twitterCard, structuredData, and technical (status code, response time, viewport, etc.) objects.
- Sitemap URLs: sitemapUrls array (present if extractSitemapUrls is enabled).
Example output snippet:
{
  "url": "https://example.com",
  "scrapedAt": "2023-10-26T10:30:00.000Z",
  "meta": {
    "title": "Example Page",
    "titleLength": 12,
    "description": "This is an example.",
    "descriptionLength": 19
  },
  "headings": {
    "h1": { "text": "Main Title", "count": 1 },
    "h2": { "text": "Subtitle One", "count": 1 }
  }
}
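Because each dataset item is plain JSON, a quick health check is easy to script. The sketch below flags a few common issues using the fields from the example output. The function name `audit_item` is illustrative, and the 60 and 160 character thresholds are rule-of-thumb assumptions, not values defined by the actor.

```python
# Assumed rule-of-thumb limits; adjust to your own audit guidelines.
TITLE_MAX = 60
DESCRIPTION_MAX = 160


def audit_item(item: dict) -> list[str]:
    """Return a list of human-readable issues for one dataset item."""
    # The schema lists optional error/errorMessage fields; treat any
    # truthy "error" value as a failed scrape.
    if item.get("error"):
        return [f"scrape failed: {item.get('errorMessage', 'unknown error')}"]

    issues = []
    meta = item.get("meta", {})
    title = meta.get("title") or ""
    description = meta.get("description") or ""

    if not title:
        issues.append("missing title")
    elif meta.get("titleLength", len(title)) > TITLE_MAX:
        issues.append(f"title longer than {TITLE_MAX} characters")

    if not description:
        issues.append("missing meta description")
    elif meta.get("descriptionLength", len(description)) > DESCRIPTION_MAX:
        issues.append(f"description longer than {DESCRIPTION_MAX} characters")

    h1_count = item.get("headings", {}).get("h1", {}).get("count", 0)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")

    return issues


sample = {
    "url": "https://example.com",
    "meta": {
        "title": "Example Page",
        "titleLength": 12,
        "description": "This is an example.",
        "descriptionLength": 19,
    },
    "headings": {"h1": {"text": "Main Title", "count": 1}},
}
print(audit_item(sample))
```

For the sample item above, no issues are reported; an item without an `h1` entry or a `meta.description` would be flagged.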
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try SEO Data Extractor now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: nocodeventure
- Pricing: Paid
- Total Runs: 33
- Active Users: 3