Sitemap Generator

by igview-owner

46 runs
10 users

About Sitemap Generator

Automatically crawl any website and generate XML, HTML, and text sitemaps for SEO optimization. Perfect for submitting to Google Search Console, Bing Webmaster Tools, and improving search engine indexing. No manual work required. Free sitemap generator tool for WordPress, Blogger, and all other websites.

What does this actor do?

Sitemap Generator is an actor on the Apify platform. It crawls a website in the cloud, follows its internal links, and produces XML, HTML, and text sitemaps from the pages it discovers.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
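As a rough illustration of step 3, the input object can be assembled and sanity-checked before it is pasted into the actor's input form. The field names (`startUrls`, `maxCrawlDepth`, `maxPagesPerCrawl`, `sitemapFormats`) are taken from the examples in the Documentation section; the helper `build_run_input` and its checks are hypothetical and not part of the actor.

```python
import json

# Formats listed in the actor's documentation.
ALLOWED_FORMATS = {"xml", "html", "text"}


def build_run_input(start_urls, max_crawl_depth=3, max_pages=1000, formats=("xml",)):
    """Hypothetical helper: return an input object shaped like the documented examples."""
    if not start_urls:
        raise ValueError("at least one start URL is required")
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported sitemap formats: {sorted(unknown)}")
    if max_crawl_depth < 0:
        raise ValueError("max_crawl_depth must be 0 or greater")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxCrawlDepth": max_crawl_depth,
        "maxPagesPerCrawl": max_pages,
        "sitemapFormats": list(formats),
    }


if __name__ == "__main__":
    run_input = build_run_input(["https://example.com"], max_crawl_depth=2)
    # Print the JSON so it can be copied into the actor's input.
    print(json.dumps(run_input, indent=2))
```

Starting with a depth of 2 follows the documentation's own advice to test with a small crawl first.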

Documentation

# XML Sitemap Generator - Auto Generate SEO Sitemaps for Google Search Console

Automatically crawl any website and generate XML, HTML, and text sitemaps for SEO optimization. Perfect for submitting to Google Search Console, Bing Webmaster Tools, and improving search engine indexing. This automated sitemap generator discovers all pages on your website and creates compliant sitemaps in minutes.

## 🚀 Features

- Automatic Page Discovery: Intelligently crawls websites following internal links and navigation patterns
- Customizable Crawling: Set crawling depth and apply filters to include/exclude specific pages
- Multiple Sitemap Formats:
  - XML (Standard sitemap format for search engines)
  - HTML (Human-readable sitemap for visitors)
  - Text (Simple list of URLs)
- Built-in Validation: Ensures sitemaps comply with Google and Bing specifications
- Image Sitemap Support: Optional inclusion of images following Google's image sitemap extension
- Priority & Change Frequency: Automatic priority calculation based on URL depth
- Same-Domain Filtering: Only includes pages from the target domain
- Pattern-Based Filtering: Include/exclude URLs using regular expressions
- Proxy Support: Built-in support for Apify Proxy for reliable crawling

## 📋 Input Configuration

### Required Fields

- Start URLs (required): List of URLs where crawling begins. Typically your website's homepage.

### Optional Fields

- Max Crawl Depth (default: 3): How many link levels to follow from start URLs
  - 0 = Only start URLs
  - 1 = Start URLs + pages they link to
  - 2 = Two levels deep, etc.
- Max Pages Per Crawl (default: 1000): Maximum number of pages to crawl
- Include URL Patterns: Regular expressions for URLs to include
  - Example: `^https://example\.com/blog/.*` (only blog pages)
  - Leave empty to include all same-domain URLs
- Exclude URL Patterns: Regular expressions for URLs to exclude
  - Example patterns:
    - `.*/admin/.*` (exclude admin pages)
    - `.*/login.*` (exclude login pages)
    - `.*\?.*` (exclude URLs with query parameters)
- Sitemap Formats (default: xml): Select which formats to generate
  - xml: Standard XML sitemap
  - html: Human-readable HTML sitemap
  - text: Plain text list of URLs
- Respect robots.txt (default: true): Follow website's robots.txt rules
- Change Frequency (default: weekly): Expected update frequency
  - Options: always, hourly, daily, weekly, monthly, yearly, never
- Default Priority (default: 0.5): Default page priority (0.0 to 1.0)
  - Homepage automatically gets 1.0
  - Priority decreases with page depth
- Include Images (default: false): Add images to XML sitemap
- Proxy Configuration: Use Apify Proxy for better reliability

## 📤 Output

The Actor generates the following outputs:

### Key-Value Store Files

1. `sitemap.xml` (if XML format selected)
   - Standard XML sitemap format
   - Includes loc, lastmod, changefreq, priority
   - Optional image:image elements
2. `sitemap.html` (if HTML format selected)
   - Human-readable sitemap
   - Styled with CSS for better presentation
   - Shows priority and change frequency
3. `sitemap.txt` (if text format selected)
   - Simple text file with one URL per line
   - Easy to parse and import

### Dataset

Statistics about the crawl:

```json
{
  "totalUrls": 150,
  "baseDomain": "example.com",
  "crawlDepth": 3,
  "generatedAt": "2025-11-05T10:30:00.000Z",
  "formats": ["xml", "html"]
}
```

## 🎯 Use Cases

1. SEO Optimization: Submit sitemaps to Google Search Console and Bing Webmaster Tools
2. Website Audits: Discover all pages on a website
3. Migration Planning: Document site structure before migration
4. Content Inventory: Get a complete list of all website pages
5. Quality Assurance: Ensure all important pages are discoverable

## 💡 Examples

### Basic Usage

```json
{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxCrawlDepth": 3,
  "sitemapFormats": ["xml"]
}
```

### Advanced Usage with Filters

```json
{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxCrawlDepth": 4,
  "maxPagesPerCrawl": 5000,
  "includePatterns": [
    "^https://example\\.com/blog/.*",
    "^https://example\\.com/products/.*"
  ],
  "excludePatterns": [
    ".*/admin/.*",
    ".*/login.*",
    ".*\\?.*"
  ],
  "sitemapFormats": ["xml", "html", "text"],
  "includeImages": true,
  "changefreq": "daily",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

### E-commerce Site

```json
{
  "startUrls": [
    { "url": "https://shop.example.com" }
  ],
  "maxCrawlDepth": 5,
  "includePatterns": [
    "^https://shop\\.example\\.com/products/.*",
    "^https://shop\\.example\\.com/categories/.*"
  ],
  "excludePatterns": [
    ".*/cart.*",
    ".*/checkout.*",
    ".*/account.*"
  ],
  "sitemapFormats": ["xml"],
  "includeImages": true
}
```

## 🔧 Technical Details

### XML Sitemap Format

The generated XML sitemap follows the sitemaps.org protocol:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-11-05</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <!-- More URLs -->
</urlset>
```

### Priority Calculation

- Homepage: 1.0
- Depth 1 pages: 0.8
- Depth 2 pages: 0.6
- Depth 3+ pages: max(0.3, defaultPriority)

## 📊 Performance

- Speed: Crawls 10-50 pages per minute (depending on website speed)
- Concurrency: Up to 10 concurrent requests
- Memory: Efficient memory usage with streaming
- Limits: Can handle websites with 100,000+ pages

## ⚠️ Best Practices

1. Start with Small Depth: Test with `maxCrawlDepth=2` first
2. Use Exclude Patterns: Filter out unnecessary pages (login, admin, etc.)
3. Enable Proxy: Use Apify Proxy for better reliability
4. Set Realistic Limits: Don't crawl more pages than needed
5. Respect robots.txt: Keep it enabled unless you have permission
6. Test Patterns: Verify your regex patterns work correctly

## 🐛 Troubleshooting

### Actor Finds Too Few URLs

- Increase `maxCrawlDepth`
- Check `excludePatterns` aren't too restrictive
- Verify website has internal links

### Actor Finds Too Many URLs

- Decrease `maxCrawlDepth`
- Add more `excludePatterns`
- Reduce `maxPagesPerCrawl`

### Crawling is Slow

- Enable proxy configuration
- Check website's response time
- Reduce `maxConcurrency` if website blocks requests

## 📝 License

Apache-2.0

## 🤝 Support

For issues, feature requests, or questions, please contact support or create an issue.

## Find ME better

XML Sitemaps Generator, Sitemap generator for Blogger, Google Sitemap Generator, Sitemap generator tool, Sitemap Generator WordPress, Visual sitemap generator, Free sitemap generator.
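The Priority Calculation table and the XML Sitemap Format in the documentation above can be combined in a short sketch. This is an illustration written against the documented rules, not the actor's own code: the function names are hypothetical, and the input is assumed to be a list of `(url, depth)` pairs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def priority_for_depth(depth, default_priority=0.5):
    """Map crawl depth to a priority using the documented table."""
    if depth <= 0:
        return 1.0  # homepage / start URL
    if depth == 1:
        return 0.8
    if depth == 2:
        return 0.6
    return max(0.3, default_priority)  # depth 3 and deeper


def build_sitemap_xml(pages, changefreq="weekly", lastmod=None):
    """Render (url, depth) pairs as a sitemaps.org urlset string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, depth in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped on output
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
        ET.SubElement(entry, "changefreq").text = changefreq
        ET.SubElement(entry, "priority").text = f"{priority_for_depth(depth):.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = [("https://example.com/", 0), ("https://example.com/blog/", 1)]
    print(build_sitemap_xml(pages, lastmod="2025-11-05"))
```

With the default priority of 0.5, the documented rule `max(0.3, defaultPriority)` gives pages at depth 3 and deeper a priority of 0.5. The 0.3 floor only applies when the default is set lower.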
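To make the "Test Patterns" advice above concrete, include and exclude patterns can be tried against sample URLs before a crawl. The sketch below rests on assumptions the documentation does not confirm: that an exclude match always overrides an include match, and that patterns are applied as a regular-expression search over the full URL. The function `should_include` is hypothetical.

```python
import re
from urllib.parse import urlparse


def should_include(url, base_domain, include_patterns=(), exclude_patterns=()):
    """Decide whether a URL would pass the filters (assumed semantics)."""
    # Same-domain filtering: compare hostnames only.
    if urlparse(url).hostname != base_domain:
        return False
    # Assumption: any exclude match rejects the URL outright.
    if any(re.search(pattern, url) for pattern in exclude_patterns):
        return False
    # An empty include list means all same-domain URLs are accepted.
    if not include_patterns:
        return True
    return any(re.search(pattern, url) for pattern in include_patterns)


if __name__ == "__main__":
    include = [r"^https://example\.com/blog/.*"]
    exclude = [r".*/admin/.*", r".*/login.*", r".*\?.*"]
    samples = [
        "https://example.com/blog/post-1",
        "https://example.com/blog/post-1?page=2",
        "https://example.com/admin/users",
        "https://other.example.org/blog/post-1",
    ]
    for sample in samples:
        print(sample, should_include(sample, "example.com", include, exclude))
```

Note that in the JSON input the backslashes must be doubled (`\\.`), while Python raw strings such as `r"\."` use a single backslash.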

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Sitemap Generator now on Apify. Free tier available with no credit card required.

Actor Information

Developer: igview-owner
Pricing: Paid
Total Runs: 46
Active Users: 10

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
