Super Stealth Scraper
by apricot_blackberry
About Super Stealth Scraper
Sleeper Cell Swarm. Loud scrapers get banned. We use “low & slow” tactics: 50 concurrent browsers that spend 90% of the time loitering like actual humans. Gaussian delays, mouse emulation, WAF evasion & lots more. Don’t hammer the server. Become the traffic.
What does this actor do?
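Super Stealth Scraper is a Playwright-based web scraping tool available on the Apify platform. It extracts data in the cloud while imitating human browsing. Session-bound browser fingerprints, Gaussian request delays, and residential proxies keep its traffic close to that of real users.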
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
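For API or scripted runs, the input can also be assembled in code. The sketch below uses a hypothetical helper, `buildActorInput`, that fills in the defaults from the Input Schema in the documentation (`maxRequests` 100, `maxConcurrency` 3). It also adds the residential proxy group that the documentation marks as mandatory. The helper name and its validation rules are assumptions for illustration and are not part of the actor. It follows the documentation's JSON example and passes `startUrls` as plain URL strings.

```javascript
// Hypothetical helper: builds an input object for this actor.
// Defaults mirror the Input Schema table (maxRequests: 100, maxConcurrency: 3)
// and the residential proxy setup the documentation calls mandatory.
function buildActorInput(startUrls, overrides = {}) {
  if (!Array.isArray(startUrls) || startUrls.length === 0) {
    throw new Error("startUrls must be a non-empty array of URLs");
  }
  for (const url of startUrls) {
    // new URL() throws a TypeError on malformed URLs
    const parsed = new URL(url);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      throw new Error(`Unsupported URL protocol: ${url}`);
    }
  }
  return {
    startUrls,
    proxyConfiguration: {
      useApifyProxy: true,
      apifyProxyGroups: ["RESIDENTIAL"],
    },
    maxRequests: 100,
    maxConcurrency: 3,
    // Caller-supplied fields win over the defaults above
    ...overrides,
  };
}

const input = buildActorInput(["https://your-target.com"], { maxRequests: 500 });
console.log(JSON.stringify(input, null, 2));
```

Spreading `overrides` last lets callers raise `maxRequests` or `maxConcurrency` without restating the proxy block.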
Documentation
# 🕵️ Stealth Scraper Template - GOD TIER OPSEC

> The most advanced anti-detection web scraper on Apify. Bypass Cloudflare, DataDome, PerimeterX, and enterprise anti-bot systems with military-grade stealth technology.
---

## 🎯 Why This Scraper?

Most scrapers fail because they look like bots. This template is built from the ground up on operational security (OPSEC) principles that make your requests statistically indistinguishable from real human traffic.

### 🔥 The Competition is Amateur Hour

| Feature | Basic Scrapers | This Template |
|---------|----------------|---------------|
| Fingerprint Consistency | ❌ Random per request | ✅ Session-bound |
| Timezone/Locale | ❌ Hardcoded or missing | ✅ Dynamic CDP sync |
| WebRTC Leak | ❌ Exposes real IP | ✅ Fully patched |
| Request Timing | ❌ Uniform random | ✅ Gaussian distribution |
| Session Management | ❌ Cookie-only | ✅ Full identity rotation |
| Proxy Geo-Sync | ❌ Mismatched | ✅ Real-time alignment |

---

## 🛡️ Stealth Features

### 1. Per-Session Fingerprint Binding

Each browser session keeps one consistent hardware fingerprint for its entire lifetime. Anti-bot systems flag inconsistencies between cookies and browser fingerprints. Binding the fingerprint to the session eliminates that vector entirely.

```javascript
import { Session } from 'crawlee';
import { FingerprintGenerator } from 'fingerprint-generator';

const fingerprintGenerator = new FingerprintGenerator();

const sessionPoolOptions = {
    // Called by the session pool whenever it needs a new session
    createSessionFunction: (sessionPool) => {
        const session = new Session({ sessionPool });
        // Generate the fingerprint once and store it on the session,
        // so every request made with this session reuses the same identity
        session.userData = { fingerprint: fingerprintGenerator.getFingerprint() };
        return session;
    },
};
```

### 2. Chrome DevTools Protocol (CDP) Geo-Sync

We look up the location of the proxy's actual exit IP. We then override the browser's timezone, locale, and geolocation at the engine level. The override goes through CDP rather than through patched JavaScript properties, so JavaScript tampering detection cannot catch it.

```javascript
// `page` is the Playwright page; `geo` is the proxy IP's location from the ip-api lookup.
// CDP sessions are available in Chromium only.
const client = await page.context().newCDPSession(page);
await client.send('Emulation.setTimezoneOverride', { timezoneId: geo.timezone });
await client.send('Emulation.setGeolocationOverride', { latitude: geo.lat, longitude: geo.lon });
```

### 3. WebRTC Leak Prevention

WebRTC can bypass your proxy and leak your real IP address. We mock the `RTCPeerConnection` API to close this leak.

### 4. Gaussian Delay Distribution (Box-Muller Transform)

Uniform random delays are a bot signature. We draw delays from a bell-curve (normal) distribution that mimics human cognitive processing time.
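The documentation calls a `getGaussianDelay` helper but does not show it. Below is a minimal sketch of one way it could work. It assumes the four arguments are mean, standard deviation, minimum, and maximum in milliseconds, which is inferred from the call `getGaussianDelay(4500, 1500, 2000, 10000)` and is not confirmed. The sketch uses the Box-Muller transform to turn two uniform random numbers into one normally distributed value. It redraws samples that fall outside the bounds rather than clamping them. The optional `rng` parameter is a hypothetical addition that makes the function testable.

```javascript
// Sketch of getGaussianDelay. The argument order (mean, stdDev, min, max) is an
// assumption inferred from the call getGaussianDelay(4500, 1500, 2000, 10000).
// `rng` is a hypothetical injection point for tests; it defaults to Math.random.
function getGaussianDelay(mean, stdDev, min, max, rng = Math.random) {
  const MAX_ATTEMPTS = 100;
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    // Box-Muller transform: u1 must lie in (0, 1] so Math.log(u1) stays finite
    const u1 = 1 - rng();
    const u2 = rng();
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    const delay = mean + stdDev * z;
    // Redraw out-of-range samples so the result follows a truncated normal
    if (delay >= min && delay <= max) {
      return Math.round(delay);
    }
  }
  // Fallback for pathological bounds: clamp the mean into [min, max]
  return Math.round(Math.min(max, Math.max(min, mean)));
}
```

Resampling keeps the bell shape inside the bounds. Clamping would instead turn every out-of-range draw into exactly 2000 or 10000 ms, creating spikes at the limits. The returned value would typically be awaited with a timer, e.g. `await new Promise((r) => setTimeout(r, delay))`.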
```javascript
// Most delays cluster around 4.5s, with rare outliers at 2s or 8s, just like a real human
const delay = getGaussianDelay(4500, 1500, 2000, 10000);
```

### 5. Aggressive Session Retirement

Zero tolerance for burnt sessions. If a CAPTCHA or a 403 response is detected, the session is retired immediately and a fresh identity is rotated in.

### 6. Resource Blocking

We abort requests for images, stylesheets, and fonts. This cuts bandwidth substantially and prevents fingerprinting via render timing.

---

## 📊 Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                       STEALTH SCRAPER                       │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Session   │  │ Fingerprint │  │    CDP Geo-Sync     │  │
│  │    Pool     │──│  Generator  │──│   (ip-api lookup)   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│         │                │                    │             │
│         ▼                ▼                    ▼             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              Playwright + Stealth Plugin              │  │
│  │    • WebRTC Mocking  • Resource Blocking  • Jitter    │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Apify Dataset                     │  │
│  │             (Ready for Vector Embedding)              │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                  VECTOR LOADER (Decoupled)                  │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   OpenAI    │──│  Pinecone   │──│   RAG-Ready Data    │  │
│  │ Embeddings  │  │   Upsert    │  │                     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

---

## 🚀 Quick Start

### 1. Clone and Configure

```bash
apify create my-stealth-scraper --template stealth-scraper-template
cd my-stealth-scraper
```

### 2. Customize Your Target

Edit `src/main.js`:

- Set your target URL
- Configure your extraction selectors
- Adjust delays for your target's sensitivity

### 3. Deploy

```bash
apify push
```

### 4. Run with Residential Proxies (MANDATORY)

```json
{
  "startUrls": ["https://your-target.com"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```

> ⚠️ Datacenter IPs are dead on arrival. Tier-1 targets (LinkedIn, Glassdoor, Amazon) have AWS/DigitalOcean IP ranges blacklisted.

---

## 📖 Input Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `startUrls` | Array | ✅ | URLs to scrape |
| `proxyConfiguration` | Object | ✅ | Residential proxies required |
| `maxRequests` | Number | ❌ | Maximum pages to scrape (default: 100) |
| `maxConcurrency` | Number | ❌ | Parallel browsers (default: 3) |

---

## 🔧 Customization Guide

### Adding Your Own Extraction Logic

```javascript
async requestHandler({ page, request, log, pushData, session }) {
    // 1. Block detection is already handled by the template
    // 2. Add your selectors
    const data = await page.evaluate(() => {
        return {
            title: document.querySelector('h1')?.innerText,
            content: document.querySelector('.content')?.innerText,
            // ... your selectors
        };
    });

    // 3. Push to the dataset together with provenance fields
    await pushData({
        ...data,
        url: request.url,
        scrapedAt: new Date().toISOString(),
    });
}
```

### Adjusting Stealth Parameters

```javascript
// Crawlee session options, passed as sessionPoolOptions.sessionOptions

// For paranoid targets (banks, ticketing)
const paranoidSessionOptions = {
    maxErrorScore: 0.3, // Even stricter
    maxUsageCount: 3,   // Kill sessions faster
};

// For relaxed targets
const relaxedSessionOptions = {
    maxErrorScore: 1,
    maxUsageCount: 20,
};
```

---

## 💡 Pro Tips

### The Scaling Philosophy

> Don't make 1 browser go fast. Make 50 browsers go slow.

If you need 10,000 pages:

- ❌ 1 browser @ 100 req/min = BLOCKED
- ✅ 50 browsers @ 1 req/10s = looks like 50 users browsing

The swarm does not cost you throughput. At one request every 10 seconds, 50 browsers make 300 requests per minute in aggregate. That finishes 10,000 pages in about 33 minutes, faster than the 100 minutes a single browser at 100 req/min would need, while each browser stays at a human pace.

### Sticky Sessions for Multi-Step Flows

Don't rotate IPs mid-login. Use session persistence:

```javascript
sessionPoolOptions: {
    maxPoolSize: 50,
    persistStateKeyValueStoreId: 'my-sessions',
},
```

### Use the Right Proxy Group

- `RESIDENTIAL` - general-purpose stealth
- `GOOGLE_SERP` - Google specifically

Don't mix them.

---

## 📈 Performance

| Metric | Value |
|--------|-------|
| Detection Rate | < 1% |
| Average Response Time | 4-8 s (by design) |
| Memory Usage | ~500 MB per browser |
| Success Rate on Tier-1 | 95%+ |

---

## 🧪 Tested Against

- ✅ Cloudflare
- ✅ DataDome
- ✅ PerimeterX
- ✅ Akamai Bot Manager
- ✅ Imperva/Incapsula
- ✅ LinkedIn
- ✅ Glassdoor
- ✅ Indeed
- ✅ Amazon

---

## 📦 Output

Data is pushed to the Apify Dataset in JSON format, ready for:

- Vector embedding (use our Vector Loader actor)
- Direct API consumption
- Export to CSV/Excel

```json
{
  "title": "Software Engineer",
  "company": "TechCorp",
  "location": "San Francisco, CA",
  "url": "https://target.com/job/123",
  "source": "target",
  "scrapedAt": "2024-12-13T20:00:00.000Z"
}
```

---

## 🔗 Related Actors

- Vector Loader - Embeds scraped data into Pinecone for RAG
- LinkedIn Stealth Scraper - Pre-configured for LinkedIn jobs
- Glassdoor Stealth Scraper - Pre-configured for Glassdoor

---

## 📄 License

ISC License. Use responsibly: respect robots.txt and terms of service.

---

## 🤝 Support

Found a target that beats our stealth? Open an issue and we'll patch it.

---
Built by The Agency
When you absolutely, positively need the data.
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try Super Stealth Scraper now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: apricot_blackberry
- Pricing: Paid
- Total Runs: 2
- Active Users: 2
Related Actors
- YouTube Video Transcript by starvibe
- Reddit Scraper by macrocosmos
- Perplexity 2.0 by winbayai
- Idealista.com by lukass
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.