Website Recovery Actor

Name: Website Recovery Actor
Author: fiery_dream

19 runs

3 users

Try This Actor

Opens on Apify.com

About Website Recovery Actor

Recover and reverse-engineer website files from a live or deployed site. This actor downloads the complete website including HTML, CSS, JavaScript, images, fonts, and all other assets, then rewrites URLs so everything works locally.

What does this actor do?

Website Recovery Actor runs on the Apify platform. It crawls a site from a start URL, renders each page in a headless Chrome browser, downloads the linked assets, and rewrites URLs so the recovered copy works locally.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# Website Recovery Actor

Recover and reverse-engineer website files from a live or deployed site. This actor downloads the complete website including HTML, CSS, JavaScript, images, fonts, and all other assets, then rewrites URLs so everything works locally.

Perfect for:

- Recovering lost source files from Netlify, Vercel, GitHub Pages, or any deployed site
- Creating offline copies of websites
- Migrating sites to new hosting
- Archiving web content
- Analyzing site structure and assets

## Features

- Full JS Rendering: Uses Puppeteer to capture JavaScript-rendered content (React, Vue, Next.js, etc.)
- Complete Asset Download: Downloads all CSS, JS, images, fonts, videos, and other media
- CDN Support: Optionally downloads assets from external CDNs (Google Fonts, Cloudflare, jsDelivr)
- URL Rewriting: Converts absolute URLs to relative paths so the site works locally
- Structure Preservation: Maintains original URL path structure for easy navigation
- CSS Asset Extraction: Parses CSS files to find and download additional assets (fonts, background images)
- Inline Style Extraction: Optionally extracts inline `<style>` tags to separate files
- Concurrent Downloads: Processes multiple pages and assets in parallel for speed
- Proxy Support: Access geo-restricted or blocked sites using Apify Proxy

## How It Works

1. Crawling: The actor visits your start URL and follows links to discover all pages on the same domain
2. Rendering: Each page is fully rendered using a headless Chrome browser to capture dynamic content
3. Asset Extraction: All linked assets (CSS, JS, images, fonts) are identified and queued for download
4. Download: Assets are downloaded in parallel batches
5. URL Rewriting: All URLs in HTML and CSS files are rewritten to use relative local paths
6. Output: Files are saved to Apify's Key-Value Store with a manifest file listing everything

## Input Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `startUrl` | string | (required) | The URL of the website to recover |
| `maxDepth` | integer | 5 | How deep to follow links (0 = only start page) |
| `maxPages` | integer | 100 | Maximum number of pages to crawl |
| `downloadAssets` | boolean | true | Download CSS, JS, images, fonts, etc. |
| `rewriteUrls` | boolean | true | Convert URLs to relative paths |
| `includeExternalAssets` | boolean | true | Download assets from CDNs |
| `maxConcurrency` | integer | 5 | Parallel page processing |
| `downloadTimeout` | integer | 30000 | Asset download timeout (ms) |
| `userAgent` | string | Chrome UA | Browser user agent |
| `waitForSelector` | string | - | CSS selector to wait for (JS sites) |
| `extractInlineStyles` | boolean | false | Extract inline CSS to files |
| `preserveStructure` | boolean | true | Maintain URL path structure |
| `proxyConfiguration` | object | - | Proxy settings |

## Output

The actor outputs all files to the Key-Value Store:

- HTML pages with rewritten URLs
- CSS files with rewritten asset paths
- JavaScript files
- Images (PNG, JPG, GIF, WebP, SVG, ICO)
- Fonts (WOFF, WOFF2, TTF, OTF)
- Videos and audio files
- `MANIFEST` - JSON file listing all downloaded files with their original URLs

### Downloading Your Files

After the run completes:

1. Go to the Storage tab
2. Click on Key-Value Store
3. Click Export to download all files as a ZIP
4. Extract the ZIP and you have your recovered website!
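The URL rewriting step listed under How It Works can be pictured with a small sketch. This is not the actor's code (the actor is built on Apify SDK and Crawlee); it is a standard-library Python illustration of one plausible mapping, under these assumptions: directory-style URLs are stored as `index.html`, the local layout mirrors the URL path, and links to other hosts are left untouched. The helper names `local_path` and `rewrite_url` are hypothetical.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def local_path(url: str) -> str:
    """Map a URL to a local file path that mirrors its URL path (assumed layout)."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.html"  # assumption: directory-style URLs become index.html
    return path.lstrip("/")


def rewrite_url(page_url: str, link: str) -> str:
    """Return a relative local path for same-host links; leave other links as-is."""
    absolute = urljoin(page_url, link)
    if urlsplit(absolute).netloc != urlsplit(page_url).netloc:
        return link  # different host (or mailto:, data:, etc.): untouched here
    page_dir = posixpath.dirname(local_path(page_url))
    # Query strings and fragments are dropped in this simplified sketch.
    return posixpath.relpath(local_path(absolute), start=page_dir or ".")


print(rewrite_url("https://example.com/blog/post/",
                  "https://example.com/assets/css/main.css"))
# → ../../assets/css/main.css
print(rewrite_url("https://example.com/", "/img/logo.png"))
# → img/logo.png
```

Computing the path relative to the page's own directory, rather than to the site root, is what lets the extracted ZIP open directly from disk without a local web server.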
## Example Usage

### Basic Recovery

```json
{
  "startUrl": "https://your-site.netlify.app"
}
```

### Full Site with All Assets

```json
{
  "startUrl": "https://example.com",
  "maxDepth": 10,
  "maxPages": 500,
  "downloadAssets": true,
  "includeExternalAssets": true,
  "rewriteUrls": true
}
```

### JavaScript-Heavy Site (React/Vue/Next.js)

```json
{
  "startUrl": "https://my-react-app.vercel.app",
  "waitForSelector": "#root",
  "maxDepth": 5,
  "maxPages": 100
}
```

### Single Page Only

```json
{
  "startUrl": "https://example.com/specific-page",
  "maxDepth": 0,
  "maxPages": 1
}
```

## Tips for Best Results

1. For JS-rendered sites: Use `waitForSelector` to ensure content loads before capture
2. Large sites: Increase `maxPages` and `maxConcurrency`
3. Slow sites: Increase `downloadTimeout`
4. Blocked sites: Enable proxy configuration
5. CDN-heavy sites: Keep `includeExternalAssets` enabled

## Limitations

- Cannot recover server-side logic (APIs, databases, authentication)
- Some minified/bundled JS may be difficult to understand
- Dynamic content that requires user interaction won't be captured
- Login-protected content requires additional setup

## Technical Details

- Built with Apify SDK 3.x and Crawlee
- Uses Puppeteer for browser automation
- Cheerio for HTML parsing
- Respects `robots.txt` (can be disabled)

## Cost Estimate

Typical usage costs approximately:

- $1-3 per 100 pages with full asset download
- Larger sites with many assets may cost more

## License

ISC License

## Support

For issues or feature requests, please open an issue on the repository.
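Each input under Example Usage overrides only a few fields of the documented defaults. As a rough sketch, the Python helper below merges overrides into the defaults from the Input Configuration table and prints the resulting JSON, which can help catch a mistyped parameter name before a run. The helper `build_input` is hypothetical, and the checks it performs (absolute http(s) `startUrl`, matching value types) are assumptions for illustration, not rules confirmed by the actor.

```python
import json

# Defaults copied from the Input Configuration table.
DEFAULTS = {
    "maxDepth": 5,
    "maxPages": 100,
    "downloadAssets": True,
    "rewriteUrls": True,
    "includeExternalAssets": True,
    "maxConcurrency": 5,
    "downloadTimeout": 30000,
    "extractInlineStyles": False,
    "preserveStructure": True,
}
# Parameters for which the table gives no concrete default value.
OPTIONAL = {"userAgent", "waitForSelector", "proxyConfiguration"}


def build_input(start_url: str, **overrides) -> dict:
    """Hypothetical helper: merge overrides into the documented defaults."""
    if not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl must be an absolute http(s) URL")  # assumed rule
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for key, value in overrides.items():
        # Compare against the default's type (bool vs int are kept distinct).
        if key in DEFAULTS and type(value) is not type(DEFAULTS[key]):
            raise TypeError(f"{key} expects {type(DEFAULTS[key]).__name__}")
    return {"startUrl": start_url, **DEFAULTS, **overrides}


# Equivalent of the "Single Page Only" example, with every default spelled out.
single_page = build_input("https://example.com/specific-page", maxDepth=0, maxPages=1)
print(json.dumps(single_page, indent=2))
```

The printed JSON can be pasted into the actor's input editor. Spelling out every default makes it easier to see exactly what a run will do.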

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Website Recovery Actor now on Apify. Free tier available with no credit card required.

Actor Information

Developer: fiery_dream
Pricing: Paid
Total Runs: 19
Active Users: 3

Related Actors

Web Scraper

by apify

Cheerio Scraper

by apify

Website Content Crawler

by apify

Legacy PhantomJS Crawler

by apify

Browse All Actors

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

Learn more about Apify
