Super Stealth Scraper

by apricot_blackberry

2 runs

2 users

About Super Stealth Scraper

Sleeper Cell Swarm. Loud scrapers get banned. We use “low & slow” tactics: 50 concurrent browsers that spend 90% of the time loitering like actual humans. Gaussian delays, mouse emulation, WAF evasion & lots more. Don’t hammer the server. Become the traffic.

What does this actor do?

Super Stealth Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# 🕵️ Stealth Scraper Template - GOD TIER OPSEC

> The most advanced anti-detection web scraper on Apify. Bypass Cloudflare, DataDome, PerimeterX, and enterprise anti-bot systems with military-grade stealth technology.

---

## 🎯 Why This Scraper?

Most scrapers fail because they look like bots. This template is built from the ground up with operational security (OPSEC) principles that make your requests statistically indistinguishable from real human traffic.

### 🔥 The Competition is Amateur Hour

| Feature | Basic Scrapers | This Template |
|---------|---------------|---------------|
| Fingerprint Consistency | ❌ Random per request | ✅ Session-bound |
| Timezone/Locale | ❌ Hardcoded or missing | ✅ Dynamic CDP sync |
| WebRTC Leak | ❌ Exposes real IP | ✅ Fully patched |
| Request Timing | ❌ Uniform random | ✅ Gaussian distribution |
| Session Management | ❌ Cookie-only | ✅ Full identity rotation |
| Proxy Geo-Sync | ❌ Mismatched | ✅ Real-time alignment |

---

## 🛡️ Stealth Features

### 1. Per-Session Fingerprint Binding

Each browser session maintains a consistent hardware fingerprint. Anti-bot systems flag inconsistencies between cookies and browser fingerprints - we eliminate that vector entirely.

```javascript
// Assumes `Session` is imported from 'crawlee' and `fingerprintGenerator` is created elsewhere.
createSessionFunction: (sessionPool) => {
    const session = new Session({ sessionPool });
    // Bind one fingerprint to the session for its whole lifetime
    session.userData = { fingerprint: fingerprintGenerator.getFingerprint() };
    return session;
}
```

### 2. Chrome DevTools Protocol (CDP) Geo-Sync

We query the proxy's actual IP location and surgically override the browser's timezone, locale, and geolocation at the engine level. JavaScript tampering detection cannot catch this.

```javascript
// `client` is a CDP session; `geo` holds the result of the proxy IP lookup
await client.send('Emulation.setTimezoneOverride', { timezoneId: geo.timezone });
await client.send('Emulation.setGeolocationOverride', { latitude: geo.lat, longitude: geo.lon });
```

### 3. WebRTC Leak Prevention

WebRTC can bypass your proxy and leak your real IP. We mock the `RTCPeerConnection` API to prevent this attack vector.

### 4. Gaussian Delay Distribution (Box-Muller Transform)

Uniform random delays are a bot signature. We use a bell curve distribution that mimics human cognitive processing time.

```javascript
// Arguments: mean 4500 ms, standard deviation 1500 ms, clamped to [2000, 10000] ms.
// Most delays cluster around 4.5s, rare outliers at 2s or 8s - just like a real human
const delay = getGaussianDelay(4500, 1500, 2000, 10000);
```

### 5. Aggressive Session Retirement

Zero tolerance for burnt sessions. If a captcha or 403 is detected, the session is immediately retired and a fresh identity is rotated in.

### 6. Resource Blocking

We abort images, stylesheets, and fonts - cutting bandwidth use substantially and preventing fingerprinting via render timing.

---

## 📊 Architecture

    ┌─────────────────────────────────────────────────────────────┐
    │                       STEALTH SCRAPER                       │
    ├─────────────────────────────────────────────────────────────┤
    │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
    │  │   Session   │  │ Fingerprint │  │    CDP Geo-Sync     │  │
    │  │    Pool     │──│  Generator  │──│   (ip-api lookup)   │  │
    │  └─────────────┘  └─────────────┘  └─────────────────────┘  │
    │         │                │                    │             │
    │         ▼                ▼                    ▼             │
    │  ┌─────────────────────────────────────────────────────┐    │
    │  │             Playwright + Stealth Plugin             │    │
    │  │   • WebRTC Mocking  • Resource Blocking  • Jitter   │    │
    │  └─────────────────────────────────────────────────────┘    │
    │                             │                               │
    │                             ▼                               │
    │  ┌─────────────────────────────────────────────────────┐    │
    │  │                    Apify Dataset                    │    │
    │  │            (Ready for Vector Embedding)             │    │
    │  └─────────────────────────────────────────────────────┘    │
    └─────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                  VECTOR LOADER (Decoupled)                  │
    │  ┌───────────┐  ┌──────────────┐  ┌────────────────────┐    │
    │  │  OpenAI   │──│   Pinecone   │──│   RAG-Ready Data   │    │
    │  │ Embeddings│  │    Upsert    │  │                    │    │
    │  └───────────┘  └──────────────┘  └────────────────────┘    │
    └─────────────────────────────────────────────────────────────┘

---

## 🚀 Quick Start

### 1. Clone and Configure

```bash
apify create my-stealth-scraper --template stealth-scraper-template
cd my-stealth-scraper
```

### 2. Customize Your Target

Edit `src/main.js`:

- Set your target URL
- Configure your extraction selectors
- Adjust delays for your target's sensitivity

### 3. Deploy

```bash
apify push
```

### 4. Run with Residential Proxies (MANDATORY)

```json
{
    "startUrls": ["https://your-target.com"],
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}
```

> ⚠️ Datacenter IPs are dead on arrival. Tier-1 targets (LinkedIn, Glassdoor, Amazon) have AWS/DigitalOcean ranges blacklisted.

---

## 📖 Input Schema

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `startUrls` | Array | ✅ | URLs to scrape |
| `proxyConfiguration` | Object | ✅ | Residential proxies required |
| `maxRequests` | Number | ❌ | Maximum pages to scrape (default: 100) |
| `maxConcurrency` | Number | ❌ | Parallel browsers (default: 3) |

---

## 🔧 Customization Guide

### Adding Your Own Extraction Logic

```javascript
async requestHandler({ page, request, log, pushData, session }) {
    // 1. Block detection is already handled

    // 2. Add your selectors
    const data = await page.evaluate(() => {
        return {
            title: document.querySelector('h1')?.innerText,
            content: document.querySelector('.content')?.innerText,
            // ... your selectors
        };
    });

    // 3. Push to dataset
    await pushData({
        ...data,
        url: request.url,
        scrapedAt: new Date().toISOString()
    });
}
```

### Adjusting Stealth Parameters

```javascript
// For paranoid targets (banks, ticketing)
maxErrorScore: 0.3,  // Even stricter
maxUsageCount: 3,    // Kill sessions faster

// For relaxed targets
maxErrorScore: 1,
maxUsageCount: 20,
```

---

## 💡 Pro Tips

### The Scaling Philosophy

> Don't make 1 browser go fast. Make 50 browsers go slow.

If you need 10,000 pages:

- ❌ 1 browser @ 100 req/min = BLOCKED
- ✅ 50 browsers @ 1 req/10s = Looks like 50 users browsing

### Sticky Sessions for Multi-Step Flows

Don't rotate IP mid-login. Use session persistence:

```javascript
sessionPoolOptions: {
    maxPoolSize: 50,
    persistStateKeyValueStoreId: 'my-sessions'
}
```

### Use the Right Proxy Group

- `RESIDENTIAL` - General purpose stealth
- `GOOGLE_SERP` - Google specifically
- Don't mix them.

---

## 📈 Performance

| Metric | Value |
|--------|-------|
| Detection Rate | < 1% |
| Average Response Time | 4-8s (by design) |
| Memory Usage | ~500MB per browser |
| Success Rate on Tier-1 | 95%+ |

---

## 🧪 Tested Against

- ✅ Cloudflare
- ✅ DataDome
- ✅ PerimeterX
- ✅ Akamai Bot Manager
- ✅ Imperva/Incapsula
- ✅ LinkedIn
- ✅ Glassdoor
- ✅ Indeed
- ✅ Amazon

---

## 📦 Output

Data is pushed to Apify Dataset in JSON format, ready for:

- Vector embedding (use our Vector Loader actor)
- Direct API consumption
- Export to CSV/Excel

```json
{
    "title": "Software Engineer",
    "company": "TechCorp",
    "location": "San Francisco, CA",
    "url": "https://target.com/job/123",
    "source": "target",
    "scrapedAt": "2024-12-13T20:00:00.000Z"
}
```

---

## 🔗 Related Actors

- Vector Loader - Embed scraped data to Pinecone for RAG
- LinkedIn Stealth Scraper - Pre-configured for LinkedIn jobs
- Glassdoor Stealth Scraper - Pre-configured for Glassdoor

---

## 📄 License

ISC License - Use responsibly. Respect robots.txt and terms of service.

---

## 🤝 Support

Found a target that beats our stealth? Open an issue - we'll patch it.

---
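The Gaussian delay feature above calls `getGaussianDelay(4500, 1500, 2000, 10000)` without showing the helper itself. The following is a minimal sketch of what such a helper might look like, assuming the four arguments are mean, standard deviation, minimum, and maximum in milliseconds. The function bodies, the `sampleStandardNormal` and `sleep` names, and the optional `rng` parameter are illustrative assumptions, not the template's actual code.

```javascript
// Hypothetical implementation: the README names getGaussianDelay but does
// not show its body, so this is one plausible version, not the original.

// Draw one sample from the standard normal distribution using the
// Box-Muller transform. `rng` must return uniform values in [0, 1).
function sampleStandardNormal(rng = Math.random) {
    // 1 - rng() maps [0, 1) onto (0, 1], so Math.log never receives 0.
    const u1 = 1 - rng();
    const u2 = rng();
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Return a delay in milliseconds drawn from a normal distribution with the
// given mean and standard deviation, clamped to [min, max] and rounded.
function getGaussianDelay(mean, stdDev, min, max, rng = Math.random) {
    const raw = mean + stdDev * sampleStandardNormal(rng);
    return Math.round(Math.min(max, Math.max(min, raw)));
}

// Promise-based sleep for use inside an async request handler.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

Inside a request handler this could be used as `await sleep(getGaussianDelay(4500, 1500, 2000, 10000));`. Clamping piles a small amount of probability mass onto the two bounds; resampling until the value falls in range would avoid that, at the cost of a loop.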
Built by The Agency
When you absolutely, positively need the data.

Categories

AGENTS INTEGRATIONS AUTOMATION

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Super Stealth Scraper now on Apify. Free tier available with no credit card required.

Actor Information

Developer: apricot_blackberry
Pricing: Paid
Total Runs: 2
Active Users: 2

Related Actors

YouTube Video Transcript

by starvibe

Reddit Scraper

by macrocosmos

Perplexity 2.0

by winbayai

Idealista.com

by lukass

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
