Get URLs from link
by boring_code
About Get URLs from link
Extracts URLs from a sitemap or webpage with intuitive path matching. Use comma-separated patterns to include or exclude URL paths with smart matching: '/tags/' for exact paths, '/product' for paths that start with /product, or plain text such as 'product' for substring matches.
What does this actor do?
Get URLs from link is an actor on the Apify platform. It collects the URLs listed in an XML sitemap or linked from a webpage, then filters them by path pattern, file extension, and a maximum count. It runs in the cloud and writes the matching URLs to a dataset.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# Get URLs from link

This actor extracts URLs from a sitemap or any webpage containing links. It provides intuitive URL path matching and flexible filtering options to get exactly the URLs you need.

## Features

- Extract URLs from XML sitemaps or webpages
- Smart URL path matching:
  - Use '/tags/' to match exact path
  - Use '/product' to match paths starting with /product
  - Use 'product' to match URLs containing this text anywhere
- Exclude specific file extensions (e.g., images)
- Exclude URLs using the same smart path matching
- Limit the number of processed URLs
- Simple comma-separated syntax for filters

## Input Configuration

| Field | Type | Description |
|-------|------|-------------|
| link | String | URL to process (required) |
| urlPattern | String | List of URL parts to include (comma separated). Use '*' to include all URLs. When using slashes: '/tags/' matches exact path, '/tags' matches path starting with /tags, 'tags/' matches path ending with tags/. Without slashes (e.g., 'product') matches anywhere in URL |
| maxUrls | Integer | Maximum number of URLs to process (0 for no limit). Good for testing purposes |
| excludeExtensions | String | List of file extensions to exclude (comma separated). Example: jpg,jpeg,png,gif |
| customExcludePattern | String | List of URL parts to exclude (comma separated). Uses same pattern matching as urlPattern. Examples: '/tags/,category' or '/blog/,author' |

## Output

The actor outputs a dataset containing URLs that match your specified criteria. Each record has the following field:

```json
{ "url": "https://example.com/page" }
```

## Usage Examples

### Basic Usage

Extract all URLs from a sitemap:

```json
{ "link": "https://example.com/sitemap.xml" }
```

### Smart Path Matching

Get only product URLs with different matching options:

```json
{ "link": "https://example.com/sitemap.xml", "urlPattern": "/products/,productId,deals/" }
```

This will match:

- URLs containing exact '/products/' path
- URLs containing 'productId' anywhere
- URLs ending with 'deals/'

### Exclude File Types and Sections

Get URLs excluding images and specific sections:

```json
{ "link": "https://example.com/sitemap.xml", "excludeExtensions": "jpg,jpeg,png,gif", "customExcludePattern": "/tags/,/category/,author" }
```

### Limit Results

Get first 100 URLs for testing:

```json
{ "link": "https://example.com/sitemap.xml", "maxUrls": 100 }
```
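### Illustrative Sketch of the Matching Rules

The filter rules described above can be approximated locally, which may help when choosing patterns before a paid run. The following is a minimal Python sketch, not the actor's actual implementation. The function names (`split_patterns`, `matches`, `filter_urls`) are hypothetical. It rests on three assumptions: a pattern with slashes on both sides, such as '/tags/', is treated as "the URL path contains this whole segment"; extensions are compared against the URL path only; and `max_urls` caps the number of URLs returned.

```python
from urllib.parse import urlsplit


def split_patterns(raw: str) -> list[str]:
    """Split a comma-separated filter string into trimmed, non-empty patterns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def matches(url: str, pattern: str) -> bool:
    """Apply one pattern to one URL using the slash rules from the input table."""
    if pattern == "*":
        return True
    path = urlsplit(url).path
    leading, trailing = pattern.startswith("/"), pattern.endswith("/")
    if leading and trailing:
        # ASSUMPTION: '/tags/' means the path contains this whole segment.
        return pattern in path
    if leading:
        return path.startswith(pattern)  # '/tags' -> path starts with /tags
    if trailing:
        return path.endswith(pattern)    # 'tags/' -> path ends with tags/
    return pattern in url                # 'product' -> anywhere in the URL


def filter_urls(urls, url_pattern="*", exclude_extensions="",
                custom_exclude_pattern="", max_urls=0):
    """Return the URLs that pass the include, exclude, and extension filters."""
    include = split_patterns(url_pattern) or ["*"]
    exclude = split_patterns(custom_exclude_pattern)
    extensions = tuple(
        "." + ext.lower().lstrip(".") for ext in split_patterns(exclude_extensions)
    )
    kept = []
    for url in urls:
        path = urlsplit(url).path.lower()
        if extensions and path.endswith(extensions):
            continue
        if any(matches(url, p) for p in exclude):
            continue
        if not any(matches(url, p) for p in include):
            continue
        kept.append(url)
        # ASSUMPTION: max_urls limits returned URLs; 0 means no limit.
        if max_urls and len(kept) >= max_urls:
            break
    return kept


sample = [
    "https://example.com/products/shoes",
    "https://example.com/item?productId=7",
    "https://example.com/summer/deals/",
    "https://example.com/tags/red",
    "https://example.com/products/shoes.jpg",
    "https://example.com/about",
]
result = filter_urls(
    sample,
    url_pattern="/products/,productId,deals/",
    exclude_extensions="jpg,jpeg,png,gif",
)
print(result)  # keeps the first three sample URLs
```

Running a handful of known URLs through a sketch like this shows quickly whether a pattern such as '/product' is too broad (it also matches /product-list) before committing to it in `urlPattern` or `customExcludePattern`. The actor's real behavior on edge cases may differ, so confirm with a small `maxUrls` run.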
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try Get URLs from link now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: boring_code
- Pricing: Paid
- Total Runs: 4,407
- Active Users: 205
Related Actors
Web Scraper
by apify
Cheerio Scraper
by apify
Website Content Crawler
by apify
Legacy PhantomJS Crawler
by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.