Website Recovery Actor
by fiery_dream
About Website Recovery Actor
Recover and reverse-engineer website files from a live or deployed site. This actor downloads the complete website including HTML, CSS, JavaScript, images, fonts, and all other assets, then rewrites URLs so everything works locally.
What does this actor do?
Website Recovery Actor runs on the Apify platform. It crawls a website in the cloud, renders each page in a headless browser, and saves the pages and their assets so the site can be opened locally.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# Website Recovery Actor

Recover and reverse-engineer website files from a live or deployed site. This actor downloads the complete website including HTML, CSS, JavaScript, images, fonts, and all other assets, then rewrites URLs so everything works locally.

Perfect for:

- Recovering lost source files from Netlify, Vercel, GitHub Pages, or any deployed site
- Creating offline copies of websites
- Migrating sites to new hosting
- Archiving web content
- Analyzing site structure and assets

## Features

- Full JS Rendering: Uses Puppeteer to capture JavaScript-rendered content (React, Vue, Next.js, etc.)
- Complete Asset Download: Downloads all CSS, JS, images, fonts, videos, and other media
- CDN Support: Optionally downloads assets from external CDNs (Google Fonts, Cloudflare, jsDelivr)
- URL Rewriting: Converts absolute URLs to relative paths so the site works locally
- Structure Preservation: Maintains original URL path structure for easy navigation
- CSS Asset Extraction: Parses CSS files to find and download additional assets (fonts, background images)
- Inline Style Extraction: Optionally extracts inline `<style>` tags to separate files
- Concurrent Downloads: Processes multiple pages and assets in parallel for speed
- Proxy Support: Access geo-restricted or blocked sites using Apify Proxy

## How It Works

1. Crawling: The actor visits your start URL and follows links to discover all pages on the same domain
2. Rendering: Each page is fully rendered using a headless Chrome browser to capture dynamic content
3. Asset Extraction: All linked assets (CSS, JS, images, fonts) are identified and queued for download
4. Download: Assets are downloaded in parallel batches
5. URL Rewriting: All URLs in HTML and CSS files are rewritten to use relative local paths
6. Output: Files are saved to Apify's Key-Value Store with a manifest file listing everything

## Input Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| startUrl | string | (required) | The URL of the website to recover |
| maxDepth | integer | 5 | How deep to follow links (0 = only start page) |
| maxPages | integer | 100 | Maximum number of pages to crawl |
| downloadAssets | boolean | true | Download CSS, JS, images, fonts, etc. |
| rewriteUrls | boolean | true | Convert URLs to relative paths |
| includeExternalAssets | boolean | true | Download assets from CDNs |
| maxConcurrency | integer | 5 | Parallel page processing |
| downloadTimeout | integer | 30000 | Asset download timeout (ms) |
| userAgent | string | Chrome UA | Browser user agent |
| waitForSelector | string | - | CSS selector to wait for (JS sites) |
| extractInlineStyles | boolean | false | Extract inline CSS to files |
| preserveStructure | boolean | true | Maintain URL path structure |
| proxyConfiguration | object | - | Proxy settings |

## Output

The actor outputs all files to the Key-Value Store:

- HTML pages with rewritten URLs
- CSS files with rewritten asset paths
- JavaScript files
- Images (PNG, JPG, GIF, WebP, SVG, ICO)
- Fonts (WOFF, WOFF2, TTF, OTF)
- Videos and audio files
- `__MANIFEST__` - JSON file listing all downloaded files with their original URLs

### Downloading Your Files

After the run completes:

1. Go to the Storage tab
2. Click on Key-Value Store
3. Click Export to download all files as a ZIP
4. Extract the ZIP and you have your recovered website!
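
The URL Rewriting step listed under How It Works can be illustrated with a short sketch. This is not the actor's own code (the actor is built on Node.js tooling); it is a minimal Python illustration of turning absolute URLs into relative local paths. The helper names `local_path` and `rewrite_url`, the `index.html` convention for directory-style URLs, and the choice to leave references to other hosts untouched are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def local_path(url: str) -> str:
    """Map a URL to a file path inside the recovered site folder.

    Assumption: directory-style URLs are stored as index.html.
    Query strings and fragments are ignored in this sketch.
    """
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.html"
    return path.lstrip("/")


def rewrite_url(page_url: str, ref: str) -> str:
    """Rewrite a reference found on page_url to a relative local path."""
    absolute = urljoin(page_url, ref)
    # Assumption: references to other hosts are left as they are.
    if urlsplit(absolute).netloc != urlsplit(page_url).netloc:
        return ref
    page_dir = posixpath.dirname(local_path(page_url)) or "."
    return posixpath.relpath(local_path(absolute), start=page_dir)


if __name__ == "__main__":
    page = "https://example.com/blog/post/"
    print(rewrite_url(page, "https://example.com/css/site.css"))
    print(rewrite_url(page, "/img/logo.png"))
    print(rewrite_url(page, "https://cdn.example.net/lib.js"))
```

Computing each path relative to the directory of the page that contains it is what lets a recovered site open from any folder on disk, rather than only from a web server root.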
## Example Usage

### Basic Recovery

```json
{
  "startUrl": "https://your-site.netlify.app"
}
```

### Full Site with All Assets

```json
{
  "startUrl": "https://example.com",
  "maxDepth": 10,
  "maxPages": 500,
  "downloadAssets": true,
  "includeExternalAssets": true,
  "rewriteUrls": true
}
```

### JavaScript-Heavy Site (React/Vue/Next.js)

```json
{
  "startUrl": "https://my-react-app.vercel.app",
  "waitForSelector": "#root",
  "maxDepth": 5,
  "maxPages": 100
}
```

### Single Page Only

```json
{
  "startUrl": "https://example.com/specific-page",
  "maxDepth": 0,
  "maxPages": 1
}
```

## Tips for Best Results

1. For JS-rendered sites: Use waitForSelector to ensure content loads before capture
2. Large sites: Increase maxPages and maxConcurrency
3. Slow sites: Increase downloadTimeout
4. Blocked sites: Enable proxy configuration
5. CDN-heavy sites: Keep includeExternalAssets enabled

## Limitations

- Cannot recover server-side logic (APIs, databases, authentication)
- Some minified/bundled JS may be difficult to understand
- Dynamic content that requires user interaction won't be captured
- Login-protected content requires additional setup

## Technical Details

- Built with Apify SDK 3.x and Crawlee
- Uses Puppeteer for browser automation
- Cheerio for HTML parsing
- Respects robots.txt (can be disabled)

## Cost Estimate

Typical usage costs approximately:

- $1-3 per 100 pages with full asset download
- Larger sites with many assets may cost more

## License

ISC License

## Support

For issues or feature requests, please open an issue on the repository.
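
The inputs shown under Example Usage set only the parameters that differ from the documented defaults. A small helper along the following lines could merge overrides onto those defaults and catch misspelled or mistyped parameters before a run is started. The function `build_input` is hypothetical and not part of the actor or the Apify SDK; the defaults are copied from the Input Configuration table, and the `http(s)` check on `startUrl` is an added assumption.

```python
import json

# Defaults copied from the Input Configuration table. Parameters whose
# default is listed as "-" or "Chrome UA" are treated as optional.
DEFAULTS = {
    "maxDepth": 5,
    "maxPages": 100,
    "downloadAssets": True,
    "rewriteUrls": True,
    "includeExternalAssets": True,
    "maxConcurrency": 5,
    "downloadTimeout": 30000,
    "extractInlineStyles": False,
    "preserveStructure": True,
}
OPTIONAL = {"userAgent": str, "waitForSelector": str, "proxyConfiguration": dict}


def build_input(start_url: str, **overrides) -> dict:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    if not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl must be an http(s) URL")
    run_input = {"startUrl": start_url, **DEFAULTS}
    for name, value in overrides.items():
        if name in DEFAULTS:
            expected = type(DEFAULTS[name])
        elif name in OPTIONAL:
            expected = OPTIONAL[name]
        else:
            raise KeyError(f"unknown parameter: {name}")
        # bool is a subclass of int, so compare exact types.
        if type(value) is not expected:
            raise TypeError(f"{name} expects {expected.__name__}")
        run_input[name] = value
    return run_input


if __name__ == "__main__":
    # Mirrors the "Single Page Only" example.
    single_page = build_input(
        "https://example.com/specific-page", maxDepth=0, maxPages=1
    )
    print(json.dumps(single_page, indent=2))
```

The resulting dictionary can be serialized to JSON and pasted into the actor's input editor, or passed to whichever API client is used to start the run.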
Actor Information
- Developer: fiery_dream
- Pricing: Paid
- Total Runs: 19
- Active Users: 3