URLs List - Extract ALL website urls
by lofomachines
About URLs List - Extract ALL website urls
Automatically discovers and extracts ALL URLs from any website. Perfect for SEO analysis, content inventory, and bulk URL extraction from multiple websites. Get complete URL lists with metadata including last modified dates and priority levels.
What does this actor do?
URLs List - Extract ALL website urls is a web scraping and automation tool available on the Apify platform. It crawls the websites you give it and returns a list of the URLs it finds, running in the cloud rather than on your own machine.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
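The same run can also be started programmatically. Below is a minimal Python sketch using only the standard library. It assumes Apify's generic `run-sync-get-dataset-items` API endpoint. The actor ID is a hypothetical placeholder: copy the real `username~actor-name` ID from the actor's page. The token is your own Apify API token.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the real "username~actor-name" ID
# shown on the actor's Apify page.
ACTOR_ID = "lofomachines~YOUR-ACTOR-SLUG"
API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),  # actor input as the JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300) -> list:
    """Send the request and decode the JSON array of result items."""
    with urllib.request.urlopen(build_run_request(token, actor_input), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A synchronous call like `run_actor("MY_TOKEN", {"startUrls": [{"url": "https://apify.com"}]})` suits small sites. For large sites, starting the run asynchronously and fetching its dataset afterwards may be more robust.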
Documentation
URLs List - Extract ALL Website URLs
Comprehensive URL Extractor for SEO audits, content inventory, and bulk analysis.
This actor automatically discovers and extracts ALL URLs from any target website. It is designed to be the entry point for SEO audits, site migrations, and content analysis pipelines. It crawls recursively to build a complete map of a domain.
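Recursive discovery of this kind is typically a breadth-first traversal that keeps a set of already-seen URLs. That set is also what stops link cycles from looping forever. The following is a minimal sketch of the general technique, not the actor's actual code. It assumes a hypothetical `fetch_links(url)` callable that downloads a page and returns its raw `href` values.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 50_000) -> list[str]:
    """Breadth-first crawl restricted to the start URL's host.

    fetch_links is a hypothetical stand-in for "download the page and return its hrefs".
    """
    host = urlparse(start_url).netloc
    start, _ = urldefrag(start_url)
    seen = {start}            # every URL ever queued; guards against link cycles
    queue = deque([start])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        found.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments so /a and /a#top count once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found
```

The `max_pages` cap is an illustrative safety limit, not a documented actor parameter.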
✨ Key Features
- 🔍 Automatic Discovery: Intelligently finds all available URLs from any website structure.
- 💨 Fast & Efficient: Optimized for speed to handle large sites (50k+ URLs).
- 📦 Bulk Processing: Accepts multiple domain roots to process simultaneously.
- 🏷️ Rich Metadata: Extracts last modified dates, priority levels, and update frequency (where available).
- 🗜️ Smart Handling: Works with standard sitemaps, recursive crawling, and standard web formats.
- 🛡️ Resilient: Automatic retries on temporary errors and infinite loop prevention.
- 🎯 Result Limiting: Control the maximum number of URLs extracted with maxResults, or enable returnAll for complete extraction.
- 🔎 Keyword Filtering: Filter URLs by keywords; only URLs containing all specified keywords will be returned.
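When a site publishes an XML sitemap, metadata like this comes from the optional `<lastmod>`, `<changefreq>`, and `<priority>` tags defined by the sitemaps.org protocol. Below is a minimal parsing sketch using Python's standard `xml.etree.ElementTree`. It illustrates the format, not the actor's implementation. It handles a single `<urlset>` document and does not follow `<sitemapindex>` files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[dict]:
    """Return one dict per <url> entry; optional tags come back as None when absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("sm:url", NS):
        priority = node.findtext("sm:priority", namespaces=NS)
        entries.append({
            "url": node.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastmod": node.findtext("sm:lastmod", namespaces=NS),
            "changefreq": node.findtext("sm:changefreq", namespaces=NS),
            # The protocol defines priority as a value between 0.0 and 1.0.
            "priority": float(priority) if priority else None,
        })
    return entries
```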
🎯 Use Cases
| Use Case | Description |
| :--- | :--- |
| SEO Audit | Extract all URLs to analyze site architecture and identify orphan pages. |
| Content Inventory | Create a comprehensive list of all existing pages for migration planning. |
| Monitoring | Track lastmod dates to identify which content has been updated recently. |
| Data Pipelines | Feed the output URLs into other scrapers (e.g., Scrape HTML, Google Sheets export). |
| Targeted Extraction | Use keyword filtering to extract only specific sections (e.g., all blog posts, product pages). |
| Sampling | Use maxResults to extract a sample of URLs for quick analysis without processing entire sites. |
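For the monitoring use case, here is a minimal sketch that keeps only URLs modified on or after a cutoff date. It assumes each result item is a dict with hypothetical `url` and `lastmod` keys, where `lastmod` holds a W3C date string such as `2024-01-15`. Check the actor's actual output field names before relying on it.

```python
from datetime import date


def recently_updated(records: list[dict], since: date) -> list[str]:
    """Return URLs whose lastmod date is on or after `since`; undated records are skipped."""
    hits = []
    for rec in records:
        lastmod = rec.get("lastmod")  # hypothetical field name
        if not lastmod:
            continue
        # W3C datetime values start with YYYY-MM-DD; compare on the date part only.
        if date.fromisoformat(lastmod[:10]) >= since:
            hits.append(rec["url"])
    return hits
```

For example, `recently_updated(items, date.today() - timedelta(days=30))` (with `timedelta` imported from `datetime`) lists pages touched in the last 30 days.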
💰 Cost of Usage
This scraper is designed to be lightweight. It parses URL structures without rendering full-page JavaScript (unless necessary), keeping costs low.
- Small Sites (< 1,000 URLs): Cents per run.
- Medium Sites (10,000 URLs): Typically < $1.00.
- Large Sites: Efficiency scales well, but usage depends on the complexity of the target site's architecture.

> Tip: Always use Apify Proxy (enabled by default) to ensure consistent access and avoid blocking.
📥 Input Configuration
The Actor expects a JSON input defining the websites to scan.

### Example Input

```json
{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://crawlee.dev" }
  ],
  "proxyConfiguration": { "useApifyProxy": true },
  "returnAll": true,
  "maxResults": 1000,
  "keywords": ["blog", "article"]
}
```

### Input Parameters

| Parameter | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| startUrls | Array | ✅ Yes | [{ url: "https://apify.com" }] | List of website URLs to extract pages from. |
| proxyConfiguration | Object | ❌ No | { useApifyProxy: false } | Proxy settings for reliable access. |
| returnAll | Boolean | ❌ No | true | If true, extracts all available URLs regardless of maxResults. If false, applies the maxResults limit. |
| maxResults | Integer | ❌ No | 1000 | Maximum number of URLs to extract. Ignored if returnAll is true or if maxResults is set to 0. |
| keywords | Array | ❌ No | [] | Filter URLs to only include those containing ALL specified keywords. Case-insensitive matching. Example: ["blog"] returns only URLs containing "blog" (e.g., https://example.com/blog/article). |
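The parameter table implies a small set of post-processing rules. The sketch below mirrors them in Python as an illustration only. The function name `apply_filters` is hypothetical. It also assumes that keyword filtering runs before the result limit, which the table does not state. With the example input (`"returnAll": true`), `maxResults` has no effect.

```python
from typing import Optional


def apply_filters(urls: list[str],
                  keywords: Optional[list[str]] = None,
                  return_all: bool = True,
                  max_results: int = 1000) -> list[str]:
    """Hypothetical helper mirroring the documented keywords/returnAll/maxResults rules."""
    wanted = [k.lower() for k in (keywords or [])]
    # keywords: keep a URL only if it contains EVERY keyword, case-insensitively.
    kept = [u for u in urls if all(k in u.lower() for k in wanted)]
    # returnAll true, or maxResults 0, means no limit (assumed reading of the table).
    if return_all or max_results == 0:
        return kept
    return kept[:max_results]
```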
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Ready to Get Started?
Try URLs List - Extract ALL website urls now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: lofomachines
- Pricing: Paid
- Total Runs: 81
- Active Users: 37
Related Actors
- Web Scraper by apify
- Cheerio Scraper by apify
- Website Content Crawler by apify
- Legacy PhantomJS Crawler by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.