Sitemap Generator
by igview-owner
About Sitemap Generator
Automatically crawl any website and generate XML, HTML, and text sitemaps for SEO optimization. Perfect for submitting to Google Search Console and Bing Webmaster Tools, and for improving search engine indexing. No manual work required. A free sitemap generator tool for WordPress, Blogger, and any other website.
What does this actor do?
Sitemap Generator is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
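For the "configure the input parameters" step, the input can also be prepared as JSON ahead of time. The sketch below is a minimal, hypothetical helper (`build_input` is not part of the actor) that assembles an input object using the field names shown in the Examples section of the Documentation and checks the documented option lists. The regex check uses Python's `re` module, which is only an approximation, since the regex dialect the actor uses is not stated.

```python
import json
import re

# Option lists copied from the Documentation section.
ALLOWED_FORMATS = {"xml", "html", "text"}
ALLOWED_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_input(start_urls, max_crawl_depth=3, max_pages=1000,
                include_patterns=(), exclude_patterns=(),
                formats=("xml",), changefreq="weekly", include_images=False):
    """Assemble an actor input dict (hypothetical helper, not the actor's own code)."""
    if not start_urls:
        raise ValueError("at least one start URL is required")
    if max_crawl_depth < 0 or max_pages < 1:
        raise ValueError("depth must be >= 0 and max_pages >= 1")
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported sitemap formats: {sorted(unknown)}")
    if changefreq not in ALLOWED_CHANGEFREQ:
        raise ValueError(f"unsupported changefreq: {changefreq}")
    for pattern in [*include_patterns, *exclude_patterns]:
        try:
            re.compile(pattern)  # approximate check: Python regex dialect assumed
        except re.error as exc:
            raise ValueError(f"invalid pattern {pattern!r}: {exc}") from exc
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxCrawlDepth": max_crawl_depth,
        "maxPagesPerCrawl": max_pages,
        "includePatterns": list(include_patterns),
        "excludePatterns": list(exclude_patterns),
        "sitemapFormats": list(formats),
        "includeImages": include_images,
        "changefreq": changefreq,
    }


if __name__ == "__main__":
    config = build_input(
        ["https://example.com"],
        exclude_patterns=[r".*/admin/.*", r".*/login.*"],
        formats=["xml", "html"],
    )
    # Paste the printed JSON into the actor's input editor.
    print(json.dumps(config, indent=2))
```

Validating locally before a run catches typos in format names or patterns without spending a crawl on them.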
Documentation
# XML Sitemap Generator - Auto Generate SEO Sitemaps for Google Search Console

Automatically crawl any website and generate XML, HTML, and text sitemaps for SEO optimization. Perfect for submitting to Google Search Console, Bing Webmaster Tools, and improving search engine indexing. This automated sitemap generator discovers all pages on your website and creates compliant sitemaps in minutes.

## 🚀 Features

- Automatic Page Discovery: Intelligently crawls websites following internal links and navigation patterns
- Customizable Crawling: Set crawling depth and apply filters to include/exclude specific pages
- Multiple Sitemap Formats:
  - XML (Standard sitemap format for search engines)
  - HTML (Human-readable sitemap for visitors)
  - Text (Simple list of URLs)
- Built-in Validation: Ensures sitemaps comply with Google and Bing specifications
- Image Sitemap Support: Optional inclusion of images following Google's image sitemap extension
- Priority & Change Frequency: Automatic priority calculation based on URL depth
- Same-Domain Filtering: Only includes pages from the target domain
- Pattern-Based Filtering: Include/exclude URLs using regular expressions
- Proxy Support: Built-in support for Apify Proxy for reliable crawling

## 📋 Input Configuration

### Required Fields

- Start URLs (required): List of URLs where crawling begins. Typically your website's homepage.

### Optional Fields

- Max Crawl Depth (default: 3): How many link levels to follow from start URLs
  - 0 = Only start URLs
  - 1 = Start URLs + pages they link to
  - 2 = Two levels deep, etc.
- Max Pages Per Crawl (default: 1000): Maximum number of pages to crawl
- Include URL Patterns: Regular expressions for URLs to include
  - Example: `^https://example\.com/blog/.*` (only blog pages)
  - Leave empty to include all same-domain URLs
- Exclude URL Patterns: Regular expressions for URLs to exclude
  - Example patterns:
    - `.*/admin/.*` (exclude admin pages)
    - `.*/login.*` (exclude login pages)
    - `.*\?.*` (exclude URLs with query parameters)
- Sitemap Formats (default: xml): Select which formats to generate
  - xml: Standard XML sitemap
  - html: Human-readable HTML sitemap
  - text: Plain text list of URLs
- Respect robots.txt (default: true): Follow website's robots.txt rules
- Change Frequency (default: weekly): Expected update frequency
  - Options: always, hourly, daily, weekly, monthly, yearly, never
- Default Priority (default: 0.5): Default page priority (0.0 to 1.0)
  - Homepage automatically gets 1.0
  - Priority decreases with page depth
- Include Images (default: false): Add images to XML sitemap
- Proxy Configuration: Use Apify Proxy for better reliability

## 📤 Output

The Actor generates the following outputs:

### Key-Value Store Files

1. sitemap.xml (if XML format selected)
   - Standard XML sitemap format
   - Includes loc, lastmod, changefreq, priority
   - Optional image:image elements
2. sitemap.html (if HTML format selected)
   - Human-readable sitemap
   - Styled with CSS for better presentation
   - Shows priority and change frequency
3. sitemap.txt (if text format selected)
   - Simple text file with one URL per line
   - Easy to parse and import

### Dataset

Statistics about the crawl:

```json
{
  "totalUrls": 150,
  "baseDomain": "example.com",
  "crawlDepth": 3,
  "generatedAt": "2025-11-05T10:30:00.000Z",
  "formats": ["xml", "html"]
}
```

## 🎯 Use Cases

1. SEO Optimization: Submit sitemaps to Google Search Console and Bing Webmaster Tools
2. Website Audits: Discover all pages on a website
3. Migration Planning: Document site structure before migration
4. Content Inventory: Get a complete list of all website pages
5. Quality Assurance: Ensure all important pages are discoverable

## 💡 Examples

### Basic Usage

```json
{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxCrawlDepth": 3,
  "sitemapFormats": ["xml"]
}
```

### Advanced Usage with Filters

```json
{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxCrawlDepth": 4,
  "maxPagesPerCrawl": 5000,
  "includePatterns": [
    "^https://example\\.com/blog/.*",
    "^https://example\\.com/products/.*"
  ],
  "excludePatterns": [
    ".*/admin/.*",
    ".*/login.*",
    ".*\\?.*"
  ],
  "sitemapFormats": ["xml", "html", "text"],
  "includeImages": true,
  "changefreq": "daily",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

### E-commerce Site

```json
{
  "startUrls": [
    { "url": "https://shop.example.com" }
  ],
  "maxCrawlDepth": 5,
  "includePatterns": [
    "^https://shop\\.example\\.com/products/.*",
    "^https://shop\\.example\\.com/categories/.*"
  ],
  "excludePatterns": [
    ".*/cart.*",
    ".*/checkout.*",
    ".*/account.*"
  ],
  "sitemapFormats": ["xml"],
  "includeImages": true
}
```

## 🔧 Technical Details

### XML Sitemap Format

The generated XML sitemap follows the sitemaps.org protocol:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-11-05</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <!-- More URLs -->
</urlset>
```

### Priority Calculation

- Homepage: 1.0
- Depth 1 pages: 0.8
- Depth 2 pages: 0.6
- Depth 3+ pages: max(0.3, defaultPriority)

## 📊 Performance

- Speed: Crawls 10-50 pages per minute (depending on website speed)
- Concurrency: Up to 10 concurrent requests
- Memory: Efficient memory usage with streaming
- Limits: Can handle websites with 100,000+ pages

## ⚠️ Best Practices

1. Start with Small Depth: Test with maxCrawlDepth=2 first
2. Use Exclude Patterns: Filter out unnecessary pages (login, admin, etc.)
3. Enable Proxy: Use Apify Proxy for better reliability
4. Set Realistic Limits: Don't crawl more pages than needed
5. Respect robots.txt: Keep it enabled unless you have permission
6. Test Patterns: Verify your regex patterns work correctly

## 🐛 Troubleshooting

### Actor Finds Too Few URLs

- Increase maxCrawlDepth
- Check excludePatterns aren't too restrictive
- Verify website has internal links

### Actor Finds Too Many URLs

- Decrease maxCrawlDepth
- Add more excludePatterns
- Reduce maxPagesPerCrawl

### Crawling is Slow

- Enable proxy configuration
- Check website's response time
- Reduce maxConcurrency if website blocks requests

## 📝 License

Apache-2.0

## 🤝 Support

For issues, feature requests, or questions, please contact support or create an issue.

## Find ME

Better XML Sitemaps Generator, Sitemap generator for Blogger, Google Sitemap Generator, Sitemap generator tool, Sitemap Generator WordPress, Visual sitemap generator, Free sitemap generator.
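The pattern filtering, the depth-based priority table, and the XML layout described in the documentation above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the actor's implementation: the function names are hypothetical, patterns are applied with Python's `re.search`, and depth 0 is assumed to mean the homepage.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def is_allowed(url, include_patterns=(), exclude_patterns=()):
    """Apply include/exclude regexes (assumption: re.search semantics)."""
    # An empty include list means "include everything", as documented.
    if include_patterns and not any(re.search(p, url) for p in include_patterns):
        return False
    return not any(re.search(p, url) for p in exclude_patterns)


def priority_for_depth(depth, default_priority=0.5):
    """Priority table from the documentation; depth 0 is assumed to be the homepage."""
    fixed = {0: 1.0, 1: 0.8, 2: 0.6}
    return fixed.get(depth, max(0.3, default_priority))


def build_sitemap_xml(pages, changefreq="weekly", lastmod=None, default_priority=0.5):
    """Build a sitemaps.org-style document from (url, depth) pairs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, depth in pages:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url  # special characters are escaped on output
        if lastmod:
            ET.SubElement(node, "lastmod").text = lastmod
        ET.SubElement(node, "changefreq").text = changefreq
        priority = priority_for_depth(depth, default_priority)
        ET.SubElement(node, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    crawled = [
        ("https://example.com/", 0),
        ("https://example.com/blog/", 1),
        ("https://example.com/admin/settings", 1),
        ("https://example.com/blog/post?ref=home", 2),
    ]
    exclude = [r".*/admin/.*", r".*\?.*"]
    kept = [(url, depth) for url, depth in crawled if is_allowed(url, (), exclude)]
    print(build_sitemap_xml(kept, lastmod="2025-11-05"))
```

Running a handful of known URLs through `is_allowed` is also a quick way to follow the "Test Patterns" advice before starting a large crawl.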
Actor Information
- Developer: igview-owner
- Pricing: Paid
- Total Runs: 46
- Active Users: 10