Sitemap Generator - Creates sitemap.xml for any domain

by wisteria_banjo

11 runs
2 users

About Sitemap Generator - Creates sitemap.xml for any domain

Generate a clean, standards-compliant sitemap.xml for a website. This actor crawls a single website, discovers all indexable pages, and produces two outputs: a ready-to-submit, Google-compliant sitemap.xml, and a structured JSON dataset of discovered URLs (for auditing, reporting, and billing).

What does this actor do?

Sitemap Generator - Creates sitemap.xml for any domain is an actor on the Apify platform. It crawls one website per run in the cloud and writes both a sitemap.xml file and a dataset of the URLs it discovered.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

# 🗺️ Sitemap Generator (Apify Actor)

Generate a clean, standards-compliant sitemap.xml for a website, automatically, reliably, and without manual cleanup.

This actor crawls a single website, discovers all indexable pages, and produces:

- ✅ A ready-to-submit sitemap.xml (Google-compliant)
- ✅ A structured JSON dataset of discovered URLs (for auditing, reporting, and billing)

Built for SEO professionals, agencies, and site owners who want accuracy, transparency, and results they can trust.

## ✅ What This Actor Does

- Crawls one website per run (no mixed domains, no confusion)
- Discovers internal pages by following links
- Excludes junk/system URLs automatically (e.g. Cloudflare, admin endpoints)
- Respects robots.txt (optional)
- Removes duplicate URLs and URL fragments
- Optionally strips query strings to prevent sitemap bloat
- Extracts real <lastmod> dates when available:
  - From HTTP Last-Modified headers
  - From blog/article meta tags when headers are missing
- Outputs a fully valid sitemap.xml

## 📦 Outputs (Where to Find Your Files)

### 🟢 sitemap.xml (Primary Output)

Your sitemap is written as a real XML file. Location in Apify UI:

Run → Storage → Key-value store → sitemap.xml

This file is:

- Ready to upload to Google Search Console
- Ready to host at /sitemap.xml
- Standards-compliant (no reconstruction required)

### 🟢 JSON Results (Dataset)

Every discovered page is also saved to the Dataset. Each row includes:

- url – discovered page URL
- depth – crawl depth from the homepage
- lastmod – modification date (when available)
- lastmodSource – "header", "meta", or null

This dataset is useful for:

- Auditing and QA
- URL counts and reporting
- Monetization and billing logic
- Previewing results before download

## 🔒 Important Design Decisions (On Purpose)

### One Website per Run

This actor enforces a single start URL. Why?

- A sitemap must not mix domains
- One site = one sitemap = one clean result
- Prevents invalid or rejected sitemaps
- Enables clear pricing per site

### Honest <lastmod> Values

The actor does not fake modification dates.

- Uses real server headers when available
- Falls back to article metadata for blog posts
- Omits <lastmod> when no trustworthy source exists

This avoids misleading search engines and protects SEO integrity.

## ⚙️ Inputs

### Required

- Start URL: the root URL of the website (example: https://example.com)

### Optional

- Max crawl depth
- Max number of pages
- Concurrency
- Headless browser (for JavaScript-heavy sites)
- Strip query strings
- Respect robots.txt
- Advanced include/exclude URL patterns (regex)

Most users can run the actor with just a Start URL.

## 🧠 Who This Is For

- SEO professionals
- Agencies managing multiple client sites
- Developers who need clean sitemaps programmatically
- Site owners preparing for Google Search Console
- AI-first websites optimizing crawlability

## 💡 Why Use This Actor Instead of Online Sitemap Tools?

- No URL limits
- No fake results
- No mixed domains
- No guessing which pages were included
- Full transparency (XML + JSON)
- Automation-ready and API-friendly

## 🔐 PPE (Paid / Private / Enterprise)

This actor is designed for PPE use:

- Consistent, auditable outputs
- Dataset always populated (even if XML is downloaded)
- Clear value per run
- Suitable for client-facing and internal workflows

Run it. Download sitemap.xml. Submit. Done.
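
To make the URL cleanup and the "honest lastmod" rule described above more concrete, here is a minimal Python sketch. It is not the actor's source code. The function names (`normalize_url`, `pick_lastmod`, `build_sitemap`), the sample rows, and the assumption that the meta-tag date is an ISO 8601 string are all hypothetical. Only the row fields (url, depth, lastmod, lastmodSource) come from the documentation.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit, urlunsplit
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def normalize_url(url, strip_query=False):
    """Drop the fragment, and optionally the query string, from a URL."""
    parts = urlsplit(url)
    query = "" if strip_query else parts.query
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", query, ""))


def pick_lastmod(last_modified_header, meta_date):
    """Prefer the HTTP header, fall back to a meta date, else report nothing."""
    if last_modified_header:
        parsed = parsedate_to_datetime(last_modified_header)
        return parsed.date().isoformat(), "header"
    if meta_date:
        # Assumption: the meta value is ISO 8601, so the first 10 chars are the date.
        return meta_date[:10], "meta"
    return None, None


def build_sitemap(rows, strip_query=False):
    """Render dataset-style rows as a sitemap.xml string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for row in rows:
        loc = normalize_url(row["url"], strip_query)
        if loc in seen:  # duplicates can appear once fragments/queries are removed
            continue
        seen.add(loc)
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
        if row.get("lastmod"):  # omit <lastmod> when no trustworthy value exists
            ET.SubElement(url_el, "lastmod").text = row["lastmod"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical rows shaped like the dataset fields listed in the documentation.
header_date, header_source = pick_lastmod("Wed, 01 May 2024 10:00:00 GMT", None)
rows = [
    {"url": "https://example.com/", "depth": 0,
     "lastmod": header_date, "lastmodSource": header_source},
    {"url": "https://example.com/#top", "depth": 1,
     "lastmod": None, "lastmodSource": None},
    {"url": "https://example.com/blog?page=2", "depth": 1,
     "lastmod": None, "lastmodSource": None},
]
sitemap_xml = build_sitemap(rows, strip_query=True)
print(sitemap_xml)
```

In this sketch the second row collapses into the first once its fragment is removed, and the blog row is emitted without a <lastmod> element because no date source was available.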

Ready to Get Started?

Try Sitemap Generator - Creates sitemap.xml for any domain now on Apify. Free tier available with no credit card required.

Actor Information

Developer: wisteria_banjo
Pricing: Paid
Total Runs: 11
Active Users: 2

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
