Ai SEO Content Curator

by quaking_pail

418 runs

87 users

About Ai SEO Content Curator

The SEO Actor performs a full SEO audit of each URL, extracting key SEO metrics such as titles, meta descriptions, and keywords. It also retrieves network information and combines it with the SEO audit data into a single analysis, stored in a structured Apify dataset for further use.

What does this actor do?

Ai SEO Content Curator is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# AI SEO Content Scraper

The Selenium SEO Scraper is an Apify actor that uses Selenium and a headless Chrome browser to scrape websites, extract SEO-related data, and store it in a structured format. Users provide starting URLs and optional parameters via an input schema, and the actor outputs detailed metadata, network information, SEO audits, and page content to the default Apify dataset. This documentation explains the input you need to provide and the output you’ll receive.

## Input

To run the actor, provide input in JSON format through the Apify console’s “Input” tab or via the API. The input defines the URLs to scrape and controls the scraping scope.

### Input Schema

```json
{
  "title": "Selenium SEO Scraper",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "start_urls": {
      "title": "Start URLs",
      "type": "array",
      "description": "The URLs where scraping begins. Can be a list of strings or objects with a 'url' field.",
      "prefill": [{"url": "https://example.com"}],
      "editor": "requestListSources"
    },
    "max_depth": {
      "title": "Maximum Depth",
      "type": "integer",
      "description": "How deep to follow links (0 = only start URLs, 1 = one level of links, etc.).",
      "default": 1,
      "minimum": 0
    },
    "max_urls": {
      "title": "Max URLs",
      "type": "integer",
      "description": "The maximum number of URLs to scrape.",
      "default": 10,
      "minimum": 1
    },
    "search_engine": {
      "title": "Search Engine",
      "type": "string",
      "description": "Optional identifier for future features (e.g., search engine-specific scraping).",
      "enum": ["Google", "Bing", "DuckDuckGo"],
      "default": "Google"
    }
  },
  "required": ["start_urls"]
}
```

### Input Fields Explained

- start_urls (required): A list of URLs to start scraping from.
  - Format: Either ["https://example.com"] or [{"url": "https://example.com"}].
  - Example: [{"url": "https://www.girlsinparis.com/fr/"}].
- max_depth (optional, default: 1): Controls how many levels of links to follow.
  - 0: Scrape only the start URLs.
  - 1: Scrape start URLs and their direct links.
  - 2: Include links from those links, and so on.
  - Example: 2.
- max_urls (optional, default: 10): Limits the total number of URLs scraped.
  - Example: 100.
- search_engine (optional, default: "Google"): Currently informational; reserved for future enhancements (e.g., search engine-specific behavior).
  - Options: "Google", "Bing", "DuckDuckGo".

### Example Inputs

#### Basic Example

Scrape one URL and its direct links:

```json
{
  "start_urls": ["https://www.girlsinparis.com/fr/"],
  "max_depth": 1,
  "max_urls": 10
}
```

#### Advanced Example

Deeper crawl with multiple URLs:

```json
{
  "start_urls": [
    {"url": "https://www.girlsinparis.com/fr/"},
    {"url": "https://example.com"}
  ],
  "max_depth": 2,
  "max_urls": 100,
  "search_engine": "Google"
}
```

### How to Provide Input

Apify Console:

1. Go to your actor in the Apify console.
2. Open the “Input” tab.
3. Paste your JSON input or use the form (it matches the schema).
4. Save and run the actor.

API: Use the Apify API with a POST request to /v2/acts//runs, including your JSON input in the body. Refer to the Apify API Docs for details.

## Output

The actor stores results in the default Apify dataset, which you can access via the console’s “Dataset” tab or via the API. Each scraped URL generates a JSON object containing metadata, network stats, SEO audit data, and page content.
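As a rough illustration of the API route for both starting a run and reading its results, the sketch below builds the run request and the dataset-items URL with Python's standard library only. The actor ID `user~my-actor`, the token, and the helper names are placeholders introduced here, not values from this documentation. The `/v2/acts/{actorId}/runs` and `/v2/datasets/{datasetId}/items` paths are assumed from the public Apify API, so check them against the Apify API Docs before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    # actor_id is a placeholder such as "user~my-actor"; "~" must stay unescaped.
    url = f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),  # JSON input goes in the body
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def dataset_items_url(dataset_id: str) -> str:
    """Return the URL that serves a dataset's records as a JSON array."""
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id, safe='')}/items?format=json"


def start_run(actor_id: str, token: str, actor_input: dict) -> str:
    """Send the request and return the run's default dataset ID (needs network access)."""
    with urllib.request.urlopen(build_run_request(actor_id, token, actor_input)) as resp:
        return json.load(resp)["data"]["defaultDatasetId"]
```

`start_run` returns as soon as the run is created, so the dataset may still be filling; fetch `dataset_items_url(...)` once the run has finished.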
### Output Structure

```json
{
  "url": "https://www.girlsinparis.com/fr/",
  "info": {
    "status": "complete",
    "title": "Girls in Paris - Lingerie & Swimwear",
    "description": "Explore our collection of lingerie and swimwear designed for comfort and style.",
    "firstH1": "Welcome to Girls in Paris",
    "pageSize": 12345,
    "metaCanonical": "https://www.girlsinparis.com/fr/",
    "metaLang": "",
    "metaLanguage": "",
    "htmlLang": "fr",
    "wordCount": 150,
    "linksCount": 20,
    "linksExternalCount": 5,
    "linksInternalCount": 15
  },
  "network": {
    "Ip": "unavailable",
    "IpReverse": "unavailable",
    "pageSizeCompressed": 12345,
    "fileSize": 12345,
    "connectTime": 0.5,
    "loadTime": 1.2,
    "HttpResponseCode": 200,
    "HttpContentType": "text/html; charset=UTF-8",
    "HttpResponse": "Content-Type: text/html; charset=UTF-8, ...",
    "HttpRequest": "User-Agent: Mozilla/5.0, ..."
  },
  "seoAudit": {
    "structuredDataPresent": "ok",
    "titleLength": 30,
    "titlePresent": "ok",
    "descriptionLength": 50,
    "descriptionPresent": "ok",
    "keywordsPresent": "absent",
    "h1Count": 1,
    "h2Count": 3,
    "headingStructureOk": "ok",
    "inlineCssCount": 2,
    "jsFilesCount": 5,
    "styleFilesCount": 3,
    "iframeCount": 0,
    "canonicalPresent": "ok",
    "htmlLangPresent": "ok",
    "metaViewportPresent": "ok",
    "robotsMetaPresent": "ok",
    "ogTagsPresent": "ok",
    "twitterTagsPresent": "absent"
  },
  "content": "# Welcome to Girls in Paris\nExplore our collection...",
  "timestamp": "2025-03-19T06:04:49Z",
  "search_engine": "Google"
}
```

### Output Fields Explained

- url (string): The URL that was scraped.
- info (object): Metadata and statistics about the page:
  - status: Page load status (e.g., "complete").
  - title: The page’s title.
  - description: Meta description, if present.
  - firstH1: Text of the first `<h1>` tag.
  - pageSize: Size of the HTML source in bytes.
  - metaCanonical: Canonical URL from `<link rel="canonical">`.
  - metaLang, metaLanguage, htmlLang: Language attributes from meta tags or the `<html>` tag.
  - wordCount: Total words in the page text.
  - linksCount: Total number of `<a>` tags.
  - linksExternalCount: Number of external links.
  - linksInternalCount: Number of internal links.
- network (object): HTTP request and response details:
  - Ip, IpReverse: IP address and reverse DNS (currently "unavailable" due to Apify environment limitations).
  - pageSizeCompressed, fileSize: Size of the response content in bytes.
  - connectTime: Time to first byte in seconds.
  - loadTime: Total request time in seconds.
  - HttpResponseCode: HTTP status code (e.g., 200 for success).
  - HttpContentType: MIME type (e.g., "text/html; charset=UTF-8").
  - HttpResponse: Full response headers as a string.
  - HttpRequest: Full request headers as a string.
- seoAudit (object): SEO analysis metrics:
  - structuredDataPresent: "ok" if structured data (e.g., schema.org) is found, else "missing".
  - titleLength: Character length of the title.
  - titlePresent: "ok" if a title exists, else "absent".
  - descriptionLength: Character length of the meta description.
  - descriptionPresent: "ok" if a description exists, else "absent".
  - keywordsPresent: "ok" if meta keywords exist, else "absent".
  - h1Count, h2Count: Number of `<h1>` and `<h2>` tags.
  - headingStructureOk: "ok" if exactly one `<h1>` is present, else "problematic".
  - inlineCssCount: Number of elements with inline CSS.
  - jsFilesCount: Number of external
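
Because the seoAudit checks report their result as a status string ("ok", or a value such as "absent", "missing", or "problematic"), a dataset record can be filtered for failing checks with a few lines of Python. This is a minimal sketch rather than part of the actor: the helper name `audit_issues` is hypothetical, and the sample record is an abbreviated copy of the output example above.

```python
def audit_issues(record: dict) -> dict:
    """Return the seoAudit checks whose status string is anything other than "ok".

    Numeric metrics (titleLength, h1Count, ...) are skipped because they
    carry counts or lengths, not a pass/fail status.
    """
    audit = record.get("seoAudit", {})
    return {
        name: status
        for name, status in audit.items()
        if isinstance(status, str) and status != "ok"
    }


# Abbreviated record based on the output example in this documentation.
sample_record = {
    "url": "https://www.girlsinparis.com/fr/",
    "seoAudit": {
        "structuredDataPresent": "ok",
        "titleLength": 30,
        "titlePresent": "ok",
        "keywordsPresent": "absent",
        "h1Count": 1,
        "headingStructureOk": "ok",
        "ogTagsPresent": "ok",
        "twitterTagsPresent": "absent",
    },
}

print(audit_issues(sample_record))
# prints {'keywordsPresent': 'absent', 'twitterTagsPresent': 'absent'}
```

Applying the same function to every item fetched from the dataset gives a quick per-URL list of checks that need attention.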