URLs List - Extract ALL website urls

by lofomachines

81 runs
37 users

About URLs List - Extract ALL website urls

Automatically discovers and extracts ALL URLs from any website. Perfect for SEO analysis, content inventory, and bulk URL extraction from multiple websites. Get complete URL lists with metadata including last modified dates and priority levels.

What does this actor do?

URLs List - Extract ALL website urls is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
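The same steps can also be driven from code through the API access listed under Key Features. The following is a minimal sketch, assuming the official `apify-client` Python package is installed; `build_run_input` and `run_actor` are hypothetical helper names, the actor ID is a placeholder you would copy from the actor's page, and the input fields follow the example in the Documentation section.

```python
from typing import Any, Optional


def build_run_input(urls: list[str], max_results: Optional[int] = None) -> dict[str, Any]:
    """Build the actor's JSON input from plain URL strings (hypothetical helper)."""
    run_input: dict[str, Any] = {
        "startUrls": [{"url": u} for u in urls],
        "proxyConfiguration": {"useApifyProxy": True},
    }
    if max_results is None:
        # No limit requested: ask for every URL the actor can find.
        run_input["returnAll"] = True
    else:
        run_input["returnAll"] = False
        run_input["maxResults"] = max_results
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not complete")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example (not executed here; needs a real token and actor ID):
# items = run_actor("<APIFY_TOKEN>", "<ACTOR_ID>", build_run_input(["https://apify.com"]))
```

Results are read from the run's default dataset, so the shape of each item depends on the actor's output schema, which this page does not show.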

Documentation

URLs List - Extract ALL Website URLs

Comprehensive URL Extractor for SEO audits, content inventory, and bulk analysis.


This actor automatically discovers and extracts ALL URLs from any target website. It is designed to be the entry point for SEO audits, site migrations, and content analysis pipelines. It crawls recursively to build a complete map of a domain.

✨ Key Features

  • 🔍 Automatic Discovery: Intelligently finds all available URLs from any website structure.
  • 💨 Fast & Efficient: Optimized for speed to handle large sites (50k+ URLs).
  • 📦 Bulk Processing: Accepts multiple domain roots to process simultaneously.
  • 🏷️ Rich Metadata: Extracts last modified dates, priority levels, and update frequency (where available).
  • 🗜️ Smart Handling: Works with standard sitemaps, recursive crawling, and standard web formats.
  • 🛡️ Resilient: Automatic retries on temporary errors and infinite loop prevention.
  • 🎯 Result Limiting: Control the maximum number of URLs extracted with maxResults or enable returnAll for complete extraction.
  • 🔎 Keyword Filtering: Filter URLs by keywords - only URLs containing all specified keywords will be returned.
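The metadata listed above (last modified date, priority, update frequency) corresponds to the optional `lastmod`, `priority`, and `changefreq` elements of the standard sitemap protocol. The sketch below shows how such fields can be read from a sitemap using only the Python standard library. It illustrates the format and is not the actor's own implementation; the function name, the sample XML, and the record keys are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sitemap; the second entry has no optional metadata.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/post</loc>
  </url>
</urlset>
"""


def parse_sitemap(xml_text: str) -> list[dict]:
    """Return one record per <url> entry; missing metadata becomes None."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        priority = entry.findtext("sm:priority", default=None, namespaces=SITEMAP_NS)
        records.append({
            # Key names are hypothetical, not the actor's output schema.
            "url": entry.findtext("sm:loc", default=None, namespaces=SITEMAP_NS),
            "lastmod": entry.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS),
            "changefreq": entry.findtext("sm:changefreq", default=None, namespaces=SITEMAP_NS),
            "priority": float(priority) if priority is not None else None,
        })
    return records


records = parse_sitemap(SAMPLE_SITEMAP)
```

Because all three metadata elements are optional in the protocol, a consumer should expect them to be absent for many URLs, which matches the "where available" caveat above.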

🎯 Use Cases

| Use Case | Description |
| :--- | :--- |
| SEO Audit | Extract all URLs to analyze site architecture and identify orphan pages. |
| Content Inventory | Create a comprehensive list of all existing pages for migration planning. |
| Monitoring | Track lastmod dates to identify which content has been updated recently. |
| Data Pipelines | Feed the output URLs into other scrapers (e.g., Scrape HTML, Google Sheets export). |
| Targeted Extraction | Use keyword filtering to extract only specific sections (e.g., all blog posts, product pages). |
| Sampling | Use maxResults to extract a sample of URLs for quick analysis without processing entire sites. |
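As an illustration of the Monitoring use case, the sketch below selects URLs whose last modified date falls on or after a cutoff. The output schema is not shown on this page, so the `url` and `lastmod` keys, the ISO 8601 date format, and both function names are assumptions to adapt to the actual dataset items.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_lastmod(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 date or datetime; return None if missing or invalid."""
    if not value:
        return None
    try:
        # Older Python versions do not accept a trailing "Z", so normalize it.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: dates without a time zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def updated_since(items: list[dict], cutoff: datetime) -> list[str]:
    """Return URLs whose lastmod is on or after the cutoff."""
    recent = []
    for item in items:
        lastmod = parse_lastmod(item.get("lastmod"))
        if lastmod is not None and lastmod >= cutoff:
            recent.append(item["url"])
    return recent


# Hypothetical dataset items for demonstration.
sample_items = [
    {"url": "https://example.com/new", "lastmod": "2024-05-01"},
    {"url": "https://example.com/old", "lastmod": "2023-01-01T00:00:00Z"},
    {"url": "https://example.com/unknown"},
]
recent_urls = updated_since(sample_items, datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Items without a usable date are skipped rather than treated as recent, which keeps the result conservative when metadata is missing.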

💰 Cost of Usage

This scraper is designed to be lightweight. It parses URL structures without rendering full page JavaScript (unless necessary), keeping costs low.

  • Small Sites (< 1,000 URLs): Cents per run.
  • Medium Sites (10,000 URLs): Typically < $1.00.
  • Large Sites: Efficiency scales well, but usage depends on the complexity of the target site's architecture.

> Tip: Always use Apify Proxy (enabled by default) to ensure consistent access and avoid blocking.

📥 Input Configuration

The Actor expects a JSON input defining the websites to scan.

Example Input

    {
      "startUrls": [
        { "url": "https://apify.com" },
        { "url": "https://crawlee.dev" }
      ],
      "proxyConfiguration": { "useApifyProxy": true },
      "returnAll": true,
      "maxResults": 1000,
      "keywords": ["blog", "article"]
    }

Input Parameters

| Parameter | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| startUrls | Array | ✅ Yes | [{ url: "https://apify.com" }] | List of website URLs to extract pages from. |
| proxyConfiguration | Object | ❌ No | { useApifyProxy: false } | Proxy settings for reliable access. |
| returnAll | Boolean | ❌ No | true | If true, extracts all available URLs regardless of maxResults. If false, applies the maxResults limit. |
| maxResults | Integer | ❌ No | 1000 | Maximum number of URLs to extract. Ignored if returnAll is true or if maxResults is 0. |
| keywords | Array | ❌ No | [] | Filter URLs to only include those containing ALL specified keywords. Case-insensitive matching. Example: ["blog"] returns only URLs containing "blog" (e.g., https://example.com/blog/article). |
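To make the interaction between keywords, returnAll, and maxResults concrete, here is a small local emulation of the documented semantics. It is a sketch rather than the actor's code: the function names are hypothetical, and applying the limit after keyword filtering is an assumption, since the page does not state the order.

```python
from typing import Sequence


def matches_keywords(url: str, keywords: Sequence[str]) -> bool:
    """True if the URL contains ALL keywords, ignoring case."""
    lowered = url.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def select_urls(
    urls: Sequence[str],
    keywords: Sequence[str] = (),
    return_all: bool = True,
    max_results: int = 1000,
) -> list[str]:
    """Emulate the documented filtering and limiting rules on a list of URLs."""
    matched = [url for url in urls if matches_keywords(url, keywords)]
    # maxResults is ignored when returnAll is true or when the limit is 0.
    if return_all or max_results == 0:
        return matched
    # Assumption: the limit is applied after keyword filtering.
    return matched[:max_results]


sample_urls = [
    "https://example.com/Blog/article-1",
    "https://example.com/blog/news",
    "https://example.com/about",
]
blog_articles = select_urls(sample_urls, keywords=["blog", "article"])
first_two = select_urls(sample_urls, return_all=False, max_results=2)
unlimited = select_urls(sample_urls, return_all=False, max_results=0)
```

With the example input above, returnAll is true, so the maxResults value of 1000 would have no effect; set returnAll to false to make the limit apply.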

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try URLs List - Extract ALL website urls now on Apify. Free tier available with no credit card required.

Actor Information

Developer
lofomachines
Pricing
Paid
Total Runs
81
Active Users
37
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
