Website Contact Data Scraper from Bing and Google

Author: tuguidragos

123 runs

18 users

About Website Contact Data Scraper from Bing and Google

Scrape verified business contact details from Google and Bing search engine results pages (SERPs). Extract emails, phone numbers, websites, and addresses from official company pages. No coding required. Perfect for sales prospecting, market research, and B2B outreach. Export results as CSV or JSON, or retrieve them via the API.
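The listing's documentation describes where emails come from:
- visible text, HTML source, and mailto: links
- obfuscated forms such as "email [at] domain.com"
- image filenames are filtered out

It also says phone numbers are kept only if they have 10-15 digits. Below is a minimal JavaScript sketch of those rules. `extractEmails`, `extractPhones`, and every regex in it are illustrative assumptions, not the actor's actual code.

```javascript
// Hypothetical sketch, not the actor's implementation.
// Simple address pattern; real-world email grammar is far looser.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Assumed extensions for "logo@2x.png"-style filenames that look like emails
const IMAGE_EXT_RE = /\.(png|jpe?g|gif|svg|webp|ico)$/i;
// Assumed phone shape: optional "+", then digits mixed with spaces, dots, dashes, parentheses
const PHONE_RE = /\+?\d[\d\s().-]{8,}\d/g;

function safeDecode(s) {
  try {
    return decodeURIComponent(s);
  } catch {
    return s; // leave malformed %-sequences untouched
  }
}

// Turn "info [at] example [dot] com" into "info@example.com"
function deobfuscate(text) {
  return text
    .replace(/\s*[\[(]\s*at\s*[\])]\s*/gi, "@")
    .replace(/\s*[\[(]\s*dot\s*[\])]\s*/gi, ".");
}

function extractEmails(html) {
  // Scan the de-obfuscated source plus URL-decoded mailto: targets
  const sources = [deobfuscate(html)];
  for (const m of html.matchAll(/mailto:([^"'?\s>]+)/gi)) {
    sources.push(safeDecode(m[1]));
  }
  const found = new Set();
  for (const text of sources) {
    for (const m of text.matchAll(EMAIL_RE)) found.add(m[0].toLowerCase());
  }
  // Drop matches that are really image filenames
  return [...found].filter((email) => !IMAGE_EXT_RE.test(email));
}

function extractPhones(text) {
  const found = new Set();
  for (const m of text.matchAll(PHONE_RE)) {
    const digits = m[0].replace(/\D/g, "");
    // Keep only candidates with 10-15 digits, as the documentation specifies
    if (digits.length >= 10 && digits.length <= 15) found.add(m[0]);
  }
  return [...found];
}
```

Example: `extractEmails('<a href="mailto:Sales@Example.com?subject=Hi">Mail</a> info [at] example [dot] org <img src="logo@2x.png">')` returns `["sales@example.com", "info@example.org"]`. The `logo@2x.png` filename is discarded. `extractPhones("Call +1-416-555-0123 or 647-894-7354, zip 12345")` returns `["+1-416-555-0123", "647-894-7354"]`.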

What does this actor do?

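Website Contact Data Scraper from Bing and Google is a web scraping and automation tool on the Apify platform. It runs a Google or Bing search for your query, visits the websites found in the results (including their contact pages), and extracts emails, phone numbers, social media links, physical addresses, and business hours, all in the cloud.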
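The actor's documentation explains how it finds contact pages after the search. It matches link URLs against multilingual patterns (contact, kontakt, 联系, お問い合わせ, and so on). Depending on extractionDepth, it then visits none ("basic"), up to 3 ("moderate"), or up to 5 ("deep") of them.

The sketch below shows one way to pick those pages from a homepage's links. The function name `pickContactPages`, the same-host rule, and the pattern subset are assumptions for illustration.

```javascript
// Hypothetical sketch; a subset of the multilingual patterns from the documentation
const CONTACT_PATTERNS = [
  /contact/i, /kontakt/i, /impressum/i, /联系/i, /お問い合わせ/i, /문의/i,
  /связаться/i, /contato/i, /liên\s*hệ/i, /(hubungi|kontak)/i,
  /संपर्क/i, /যোগাযোগ/i, /ติดต่อ/i,
];

// Contact-page budget per extractionDepth, as described in the parameter reference
const MAX_CONTACT_PAGES = { basic: 0, moderate: 3, deep: 5 };

function safeDecode(s) {
  try {
    return decodeURIComponent(s);
  } catch {
    return s;
  }
}

function pickContactPages(homepageUrl, hrefs, depth = "moderate") {
  const home = new URL(homepageUrl);
  const limit = MAX_CONTACT_PAGES[depth] ?? 0;
  const picked = new Set();
  for (const href of hrefs) {
    if (picked.size >= limit) break;
    let url;
    try {
      url = new URL(href, home); // resolve relative links against the homepage
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.hostname !== home.hostname) continue; // stay on the same site (also drops mailto:)
    url.hash = ""; // "/contact#form" and "/contact" are the same page
    // URL percent-encodes non-ASCII paths, so decode before matching
    const path = safeDecode(url.pathname + url.search);
    if (CONTACT_PATTERNS.some((re) => re.test(path))) picked.add(url.href);
  }
  return [...picked];
}
```

Decoding the pathname matters for non-Latin sites. Without it, a link such as `/联系我们` would be seen as `/%E8%81%94...` and never match `/联系/`.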

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation
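Among the anti-blocking measures, the documentation lists 10 automatic retries with exponential backoff (described as 1.5^n), plus a fresh session and proxy after a failure. A minimal sketch of such a retry loop follows. `withRetries`, `backoffDelay`, the 1-second base delay, the 60-second cap, and the ±20% jitter are assumptions, not documented values of the actor. The `onRetry` hook is where a crawler would retire its session.

```javascript
// Hypothetical retry helper; base delay, cap and jitter are assumed values.
function backoffDelay(attempt, baseMs = 1000, maxMs = 60000, jitter = Math.random) {
  // Exponential growth by a factor of 1.5 per attempt, capped at maxMs
  const capped = Math.min(baseMs * Math.pow(1.5, attempt), maxMs);
  // Scale by a random factor in [0.8, 1.2) so retries do not fire in lockstep
  return Math.round(capped * (0.8 + 0.4 * jitter()));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetries(task, { maxRetries = 10, baseMs = 1000, onRetry = () => {} } = {}) {
  let lastError;
  // One initial attempt plus up to maxRetries retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      onRetry(err, attempt); // e.g. retire the session and switch proxy here
      await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}
```

With the assumed defaults, the delays grow roughly as 1 s, 1.5 s, 2.25 s, 3.4 s, and so on, before jitter is applied.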

How to Use

Open the actor's page on Apify.com
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
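For the configuration step, the documentation's parameter reference lists a few hard constraints:
- searchEngine is "google" or "bing"
- searchQuery is 2-200 characters
- maxResults is 1-200
- each SERP provider is "auto", "serpapi", or "browser"
- serpApiKey is required whenever a provider is "serpapi"

Below is a sketch of a pre-flight check, assuming a hypothetical `validateInput` helper. The actor performs its own validation, which may differ.

```javascript
// Hypothetical pre-flight check mirroring the documented input constraints.
const ENGINES = ["google", "bing"];
const PROVIDERS = ["auto", "serpapi", "browser"];

function validateInput(input) {
  const {
    searchEngine = "bing",
    searchQuery,
    maxResults = 10,
    googleSerpProvider = "auto",
    bingSerpProvider = "auto",
    serpApiKey,
  } = input ?? {};
  const errors = [];

  if (!ENGINES.includes(searchEngine)) {
    errors.push('searchEngine must be "google" or "bing"');
  }
  // Length is counted in UTF-16 code units, so "弁護士東京" counts as 5
  const query = typeof searchQuery === "string" ? searchQuery.trim() : "";
  if (query.length < 2 || query.length > 200) {
    errors.push("searchQuery must be 2-200 characters long");
  }
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 200) {
    errors.push("maxResults must be an integer from 1 to 200");
  }
  for (const [name, value] of Object.entries({ googleSerpProvider, bingSerpProvider })) {
    if (!PROVIDERS.includes(value)) errors.push(`${name} must be one of ${PROVIDERS.join(", ")}`);
  }
  const needsKey = googleSerpProvider === "serpapi" || bingSerpProvider === "serpapi";
  if (needsKey && !serpApiKey) {
    errors.push('serpApiKey is required when a provider is set to "serpapi"');
  }
  return { valid: errors.length === 0, errors };
}
```

For example, the simple configuration `{ "searchEngine": "bing", "searchQuery": "plumber toronto canada", "maxResults": 50 }` passes. `{ searchEngine: "yahoo", searchQuery: "x", maxResults: 500, googleSerpProvider: "serpapi" }` collects four errors.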

Documentation

Google & Bing Contact Scraper. Intelligent. Stealthy. Unlimited.

---

## Key Features

### Universal Multi-Language Support

- Works in ANY language: Chinese, Japanese, Korean, Arabic, Russian, etc.
- Automatic contact page detection across all languages
- Universal URL pattern matching (contact, kontakt, 联系, お問い合わせ, etc.)

### Unlimited Email Extraction

- ALL emails found on websites (no limits)
- Extracts from visible text, HTML source, and mailto: links
- Visits up to 5 contact/impressum pages automatically
- Detects obfuscated emails ("email [at] domain.com")
- Smart filtering of invalid addresses and image filenames

### 9 Social Media Platforms

- Facebook (all pages)
- LinkedIn (all profiles)
- Instagram (all accounts)
- Twitter/X (all profiles)
- YouTube (all channels)
- TikTok (all profiles)
- Pinterest (all boards)
- WhatsApp (all contact links)
- Telegram (all channels)
- Returns ALL links found (no limits)

### Complete Data Extraction

- Unlimited emails: every single email found
- Unlimited phones: all phone numbers (10-15 digits)
- All social media links: complete presence
- Physical addresses (US, Canada, Europe, worldwide)
- Business hours (any format)
- Blog posts/articles (title + URL)
- Company name (from page title)

### Anti-CAPTCHA System (Production-Grade)

- Playwright-extra with stealth plugin integration
- Session pooling (pool of 100 sessions)
- Automatic CAPTCHA detection on Google and Bing
- Smart proxy rotation (RESIDENTIAL, global coverage)
- Complete `navigator.webdriver` obfuscation
- WebGL, canvas, and permissions API patching
- Multi-step mouse movements with smooth transitions
- 2-4 random scrolls per page with human-like behavior
- Thinking delays (4-10 seconds before actions)
- 10 automatic retries with exponential backoff
- Session retirement on errors with instant replacement

### Performance & Scale (Optimized for 99% Success)

- Scales to 200 results per run
- Adaptive concurrency (1-8 parallel requests tuned for stealth)
- Extended timeouts (3 min navigation, 6 min handler)
- Smart memory management and resource allocation
- Early stop when sufficient data is found
- Graceful error recovery with intelligent retry logic
- Success rates: 95-99% on Google, 97-99% on Bing

## Input Configuration

### Simple Configuration

```json
{
  "searchEngine": "bing",
  "searchQuery": "plumber toronto canada",
  "maxResults": 50
}
```

### Advanced Configuration (v9.1)

```json
{
  "searchEngine": "bing",
  "searchQuery": "plumber toronto canada",
  "maxResults": 50,
  "googleSerpProvider": "serpapi",
  "bingSerpProvider": "browser",
  "serpFallbackToBrowser": true,
  "serpApiKey": "YOUR_SERPAPI_KEY",
  "extractionDepth": "moderate",
  "enableSocialMediaExtraction": true,
  "enableBusinessHoursExtraction": true,
  "emailValidation": "basic",
  "outputFormat": "csv",
  "webhookUrl": "https://your-server.com/webhook",
  "webhookMethod": "POST",
  "debugMode": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
```

### Parameters

#### Basic Parameters (Required)

- searchEngine (string, required, default: "bing"): "google" or "bing"
  - Google: Best coverage, 95-99% success rate with residential proxies
  - Bing: More stable, 97-99% success rate (default)
- searchQuery (string, required): Any search query in any language
  - Examples: "dentist new york", "restaurant paris", "弁護士東京" (Japanese for "lawyer Tokyo")
  - Length: 2-200 characters
  - XSS-protected with input sanitization
- maxResults (integer, required, default: 10): 1-200 results per run
  - Recommended: 50 max for Google, 100 max for Bing
  - Split larger datasets into multiple runs

#### SERP Provider Configuration

- googleSerpProvider (string, default: "auto"): How to retrieve Google search results
  - "auto": Uses SerpAPI whenever a key is set, otherwise scrapes in-browser
  - "serpapi": Forces SerpAPI for Google (requires serpApiKey)
  - "browser": Skips APIs entirely and scrapes SERPs directly with Playwright
- bingSerpProvider (string, default: "auto"): How to retrieve Bing search results
  - "auto": Uses SerpAPI whenever a key is set, otherwise scrapes in-browser
  - "serpapi": Forces SerpAPI for Bing (requires serpApiKey)
  - "browser": Skips APIs entirely and scrapes SERPs directly with Playwright
- serpFallbackToBrowser (boolean, default: true): Automatically switch to browser scraping if SerpAPI fails
  - Disable this to make the Actor fail fast instead of scraping in-browser
- serpApiKey (string, optional): Your API key from serpapi.com
  - Required when any provider is set to "serpapi"
  - Stored as a secret and masked in logs

#### Extraction Settings

- extractionDepth (string, default: "moderate"): How deep to search for contacts
  - "basic": Homepage only (fastest)
  - "moderate": Homepage + 3 contact pages (default)
  - "deep": Homepage + 5 contact pages (most thorough)
- enableSocialMediaExtraction (boolean, default: true): Extract social media profiles
  - Finds Facebook, LinkedIn, Instagram, Twitter, YouTube, TikTok, etc.
  - Disable for faster processing if social media is not needed
- enableBusinessHoursExtraction (boolean, default: true): Extract business hours
  - Finds operating hours if available
  - Disable for faster processing
- emailValidation (string, default: "basic"): Email validation strictness
  - "none": Keep all emails found
  - "basic": Remove obviously invalid emails (default)
  - "strict": Only high-quality business emails

#### Output Settings

- outputFormat (string, default: "json"): Export format
  - "json": Standard JSON (default)
  - "csv": CSV with proper escaping
  - "excel": Excel format (.xlsx workbook)
- webhookUrl (string, optional): Webhook endpoint that receives the results as JSON (via POST or PUT) when scraping completes
- webhookMethod (string, default: "POST"): HTTP method for the webhook
  - "POST" (default) or "PUT"

#### Advanced Settings

- proxyConfiguration (object): Proxy settings
  - Default: RESIDENTIAL proxies in the US (useApifyProxy: true, apifyProxyGroups: ["RESIDENTIAL"], apifyProxyCountry: "US")
  - Configurable through the Apify proxy editor
- debugMode (boolean, default: false): Enable detailed logging
  - Shows detailed extraction strategies and timing
  - Note: Slower performance when enabled

## Output Format

```json
{
  "01_companyName": "Example Company Inc.",
  "02_emails": [
    "info@example.com",
    "sales@example.com",
    "support@example.com"
  ],
  "03_phoneNumbers": [
    "+1-416-555-0123",
    "647-894-7354"
  ],
  "04_socialMedia": {
    "facebook": ["https://facebook.com/example"],
    "linkedin": ["https://linkedin.com/company/example"],
    "instagram": ["https://instagram.com/example"],
    "twitter": ["https://twitter.com/example"],
    "youtube": ["https://youtube.com/@example"],
    "tiktok": ["https://tiktok.com/@example"],
    "pinterest": ["https://pinterest.com/example"],
    "whatsapp": ["https://wa.me/1234567890"],
    "telegram": ["https://t.me/example"]
  },
  "05_physicalAddress": "123 Main St, Toronto, ON M5V 2T6",
  "06_sourceUrl": "https://example.com",
  "07_businessHours": [
    "Mon-Fri: 9am-5pm",
    "Sat: 10am-4pm"
  ],
  "08_additionalInfo": {
    "position": 1,
    "searchQuery": "plumber toronto canada",
    "scrapedAt": "2025-10-05T20:00:00Z",
    "blogPosts": [
      {
        "title": "How to Fix a Leaky Faucet",
        "url": "https://example.com/blog/fix-leaky-faucet"
      }
    ],
    "error": null
  }
}
```

## How It Works

### 1. Search Phase

- Queries Google or Bing with your search terms
- Prefers SerpAPI (when configured) before falling back to direct Playwright scraping
- Automatically falls back to the Playwright workflow if APIs are disabled or missing (or when the provider is set to "browser")
- Extracts all valid website URLs from the results
- Handles pagination automatically (up to 10 pages for optimal success rate)
- Detects CAPTCHAs with multi-layer detection (body text, title, iframes)
- Auto-retries with a new session and proxy when a CAPTCHA is detected
- 8-15 second delays between page navigations for natural behavior

### 2. Extraction Phase

For each website found:

#### Email Extraction (Advanced)

1. Scans visible page text
2. Searches HTML source code
3. Extracts mailto: links
4. Automatically visits up to 5 contact pages (universal language detection)
5. Detects obfuscated emails (email [at] domain.com)
6. Filters image filenames and invalid patterns
7. Returns ALL valid emails (no 5-email limit)

#### Social Media Extraction

- Searches all `<a href>` tags on the page
- Identifies 9 different platforms
- Filters out share buttons and widgets
- Returns ALL unique links per platform

#### Phone & Address Extraction

- Regex patterns for international formats
- Validates phone length (10-15 digits)
- Multi-format address detection (US, Europe, etc.)
- Returns ALL phones found

#### Additional Data

- Business hours detection (any language/format)
- Blog post discovery (up to 5 articles)
- Metadata extraction

### 3. Anti-Detection (Advanced Stealth)

- Stealth Plugin: Playwright-extra with complete webdriver obfuscation
- Session Pooling: 100 rotating sessions with 1-hour timeout
- CAPTCHA Detection: Multi-layer detection with instant session retirement
- Browser Masking: WebGL, canvas, permissions API fully patched
- Human Behavior:
  - 3x mouse movements per page with 10-30 smooth steps
  - 2-4 random scrolls with smooth behavior
  - 4-10 second thinking delays before actions
  - 8-15 second delays between pages
- Smart Retry: 10 attempts with exponential backoff
- Global Proxies: Residential proxies across all Apify regions
- Adaptive Concurrency: Number of concurrent requests adjusted to the extraction depth

## Language Support

### Universal Contact Detection

Works automatically in ANY language:

- English: contact, about, reach us
- German: kontakt, impressum, ansprechpartner
- French: contactez, nous contacter
- Spanish: contacto, escríbenos
- Italian: contatto, contattaci
- Romanian: contactare, scrie-ne
- Portuguese: fale conosco
- Chinese: 联系 (liánxì)
- Japanese: お問い合わせ (otoiawase)
- Korean: 문의 (mun-ui)
- Russian: связаться (svyazat'sya)
- Turkish: iletişim
- Greek: επικοινωνία
- Arabic: اتصل (ittasil)
- Hindi: संपर्क (sampark)
- Bengali: যোগাযোগ (jogajog)
- Vietnamese: liên hệ
- Thai: ติดต่อ (tìt-tɔ̀)
- Indonesian/Malay: hubungi, kontak
- And many more...

### How It Works

Uses universal URL patterns and multilingual text keywords:

```javascript
// URL patterns matched against link URLs
const urlPatterns = [
  /contact/i, /kontakt/i, /联系/i, /お問い合わせ/i, /문의/i, /связаться/i,
  /contato/i, /liên\s*hệ/i, /(hubungi|kontak)/i, /संपर्क/i, /যোগাযোগ/i, /ติดต่อ/i,
];

// Text keywords matched against page content
const textKeywords = [
  'mail', '@', 'phone', 'tel:', 'address', 'info', 'support',
  'liên hệ', 'hubungi', 'kontak', 'संपर्क', 'যোগাযোগ', 'ติดต่อ',
];
```

## Testing

### Local Testing

```bash
# Install dependencies
npm install
# Run the actor locally
npx apify run
# Provide an Apify API token and run again
export APIFY_TOKEN=your_token_here
npx apify run
```

## Performance Benchmarks

### Success Rates (Version 9.1 with Residential Proxies)

| Search Engine | Success Rate | CAPTCHA Evasion | Data Extraction |
|---------------|--------------|-----------------|-----------------|
| Google | 95-99% | 90-95% | 100% |
| Bing | 97-99% | 95-98% | 100% |

### Speed Benchmarks (Version 9.1)

| Entity Count | Estimated Time | Memory Usage | Network Traffic |
|--------------|----------------|--------------|-----------------|
| 5 entities | 1-2 minutes | 300-400 MB | 40-100 MB |
| 50 entities | 8-15 minutes | 350-450 MB | 400-1000 MB |
| 100 entities | 15-30 minutes | 400-500 MB | 800-2000 MB |
| 200 entities | 30-60 minutes | 450-500 MB | 1.6-4 GB |

### Cost Efficiency (Apify Platform)

| Metric | Value |
|--------|-------|
| Cost per entity | $0.003-0.006 |
| Proxy cost | $0.001-0.003 per request |
| Compute cost | $0.25 per hour |

## Use Cases

### Lead Generation

- B2B contact discovery
- Email list building
- Sales prospecting
- Market research

### Competitive Analysis

- Competitor social media presence
- Industry contact patterns
- Market positioning analysis

### SEO & Marketing

- Backlink opportunities
- Influencer discovery
- Partnership prospecting

## Configuration

### Proxy Settings (Global Coverage)

```javascript
// Actor input: proxy settings (same shape as the JSON input above)
const input = {
  proxyConfiguration: {
    useApifyProxy: true,
    apifyProxyGroups: ['RESIDENTIAL'],
    apifyProxyCountry: 'US',
  },
};
```

### Performance Tuning (v9.1 - Optimized)

```javascript
// Crawler options; maxResults is the actor's maxResults input
const crawlerOptions = {
  maxRequestRetries: 10, // Increased from 5
  maxConcurrency: 1, // Ultra-conservative for stealth
  minConcurrency: 1,
  navigationTimeoutSecs: 180, // 3 minutes
  requestHandlerTimeoutSecs: 360, // 6 minutes
  maxRequestsPerCrawl: maxResults * 6,
};
```

### Session Pooling (Extended)

```javascript
// Session pool options passed to the crawler
const sessionPoolSettings = {
  useSessionPool: true,
  persistCookiesPerSession: true,
  sessionPoolOptions: {
    maxPoolSize: 100, // Increased from 20
    sessionOptions: {
      maxUsageCount: 20, // Increased from 10
      maxErrorScore: 1, // Instant retirement on error
      maxAgeSecs: 3600, // 1-hour timeout
    },
    persistStateKeyValueStoreId: 'session-store',
  },
};
```

### Browser Configuration (Stealth)

```javascript
import { chromium } from 'playwright-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

// Register the stealth plugin on playwright-extra's Chromium launcher
chromium.use(StealthPlugin());

// Launch context passed to the crawler
const browserOptions = {
  launchContext: {
    launcher: chromium, // Stealth-enabled browser
    launchOptions: {
      headless: true,
      args: [
        '--disable-blink-features=AutomationControlled',
        '--disable-gpu',
        '--no-sandbox',
        // ... 20+ additional stealth arguments
      ],
    },
  },
};
```

## Troubleshooting

### High CAPTCHA Rate

Issue: Getting CAPTCHAs on more than 10% of requests

Solution:

- Version 9.1 includes the stealth plugin, which should resolve most cases
- RESIDENTIAL proxies are auto-configured (US by default)
- 100-session pool with automatic rotation
- 10 retries with exponential backoff (8-11 s delay after a CAPTCHA)
- If it still occurs: reduce maxResults to 50 or split the job into smaller runs

### No Results Found

Issue: 0 websites extracted

Solution:

- Try Bing instead of Google
- Use a more specific search query
- Increase the maxResults parameter

### Missing Emails

Issue: Some websites return no emails

Solution:

- The actor visits up to 5 contact pages automatically
- Some sites embed emails in images, which cannot be extracted
- Check `08_additionalInfo.error` for extraction errors

## Important Notes

### Data Limits

- NO LIMIT on emails: returns all found
- NO LIMIT on phone numbers: returns all found
- NO LIMIT on social media links: returns all per platform
- At most 5 contact pages visited per website
- At most 5 blog posts per website

### Rate Limiting & Anti-Detection

- Thinking delays: 4-10 seconds before each page load
- 3x smooth mouse movements per page (10-30 steps each)
- 2-4 random scrolls with 1-3 s delays between scrolls
- Page navigation delays: 8-15 seconds between pages
- Concurrency: 1 request at a time (ultra-conservative)
- Session rotation: every 20 requests or on error
- Automatic retry with a new session and proxy on failure

### Best Practices (Version 9.1)

1. Optimal batch sizes: 50 entities for Google, 100 for Bing
2. Split large jobs: run 4x50 instead of 1x200 for better success rates
3. Use specific queries: "dentist manhattan" works better than "dentist"
4. Monitor the success rate: check the logs for CAPTCHA frequency
5. Off-peak hours: run during 10PM-6AM UTC for lower detection
6. Validate early: test with 5 results before scaling to 100+

## What Makes This Actor Special

### Version 9.1 - Production Grade

1. 99% Success Rate - Stealth plugin + 100-session pool + smart retry
2. Truly Universal - Works in ANY language automatically (14+ languages)
3. No Limits - Returns ALL emails, phones, and social links found
4. 9 Social Platforms - Most comprehensive social media extraction available
5. Global Coverage - Residential proxies across all Apify regions
6. Advanced Stealth - Playwright-extra with complete anti-detection
7. Human Behavior - Multi-step mouse moves, random scrolls, realistic delays
8. Battle-Tested - Scales to 200 results with 95-99% success on Google/Bing
9. Cost Efficient - $0.003-0.006 per entity extracted
10. Production Ready - Extended timeouts, smart error handling, graceful recovery

### Technical Highlights (v9.1)

- Stealth Plugin: `navigator.webdriver` fully masked
- Session Pool: 100 pooled sessions with 1-hour timeout
- Retry Logic: 10 attempts with exponential backoff
- Dynamic Concurrency: Adaptive concurrent requests based on extraction depth
- Parallel Processing: Contact pages scanned simultaneously with `Promise.all`
- Security: Input validation, API key masking, XSS prevention
- SerpAPI Support: Switch per engine between SerpAPI and direct browser-based scraping
- Export Options: JSON, CSV, Excel, webhook notifications
- Email Validation: Configurable validation levels (none/basic/strict)
- Detection: Multi-layer CAPTCHA detection with instant session retirement

## What's New in Version 9.1

### Performance Improvements

- 3-8x faster extraction with dynamic concurrency
- 60-80% reduced delays across all operations
- Parallel contact page scanning with `Promise.all`
- Optimized retry logic with 1.5^n backoff

### User Experience

- Redesigned input interface with clear sections and descriptions
- Boolean toggles replacing confusing string parameters
- Native proxy configuration editor
- Extraction depth control for performance tuning
- Debug mode for troubleshooting

### Security Enhancements

- Comprehensive input validation with length and pattern checks
- API key masking in all logs and errors
- XSS prevention with input sanitization
- Secure webhook implementation

### Data Quality

- Email validation levels (none/basic/strict)
- Improved phone validation with better patterns
- Configurable extraction depth
- Social media toggle for focused extraction

### Developer Features

- CSV/Excel export with proper formatting
- Webhook notifications on completion
- Debug mode with detailed logging
- Pluggable SERP sources (SerpAPI or direct browser crawling per engine)
- Backward compatible with all v9.0 inputs

---
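The SERP provider options in the parameter reference amount to a small decision rule:
- "auto" uses SerpAPI when a key is set and the browser otherwise
- "serpapi" forces SerpAPI
- "browser" skips APIs
- serpFallbackToBrowser switches to browser scraping when SerpAPI fails

The following sketch expresses this rule in JavaScript. `resolveSerpProvider` and `fetchSerp` are hypothetical names. The two fetchers are injected stubs, not real SerpAPI or Playwright calls.

```javascript
// Hypothetical sketch of the documented SERP provider rules; not the actor's code.
function resolveSerpProvider(input) {
  const engine = input.searchEngine ?? "bing";
  const setting =
    (engine === "google" ? input.googleSerpProvider : input.bingSerpProvider) ?? "auto";
  if (setting === "browser") return "browser"; // skip APIs entirely
  if (setting === "serpapi") return "serpapi"; // forced (input validation requires serpApiKey)
  return input.serpApiKey ? "serpapi" : "browser"; // "auto": SerpAPI only when a key is set
}

// fetchViaSerpApi and fetchViaBrowser are caller-supplied stand-ins for the real fetchers.
async function fetchSerp(input, { fetchViaSerpApi, fetchViaBrowser }) {
  const provider = resolveSerpProvider(input);
  if (provider === "browser") {
    return { provider, results: await fetchViaBrowser(input) };
  }
  try {
    return { provider, results: await fetchViaSerpApi(input) };
  } catch (err) {
    // serpFallbackToBrowser defaults to true; setting it to false makes the run fail fast
    if (input.serpFallbackToBrowser === false) throw err;
    return { provider: "browser", results: await fetchViaBrowser(input) };
  }
}
```

Injecting the fetchers keeps the selection logic testable without network access. The actor itself may structure this differently.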

Built with 🩶 for the Apify community 🫡

Actor Information

Pricing: Paid

Related Actors

🏯 Tweet Scraper V2 - X / Twitter Scraper

by apidojo

Google Search Results Scraper

by apify

Instagram Profile Scraper

by apify

Tweet Scraper|$0.25/1K Tweets | Pay-Per Result | No Rate Limits

by kaitoeasyapi

Browse All Actors
