Business Contact Extractor

by washed_fun

54 runs

16 users

About Business Contact Extractor

Extract business emails and phone numbers from company websites. High-accuracy AI-enabled business contact extractor using smart crawling, LLM and AI extraction, heuristics, and PDF extraction. Finds emails/phones even on complex sites. Supports CSV bulk input and outputs clean, CRM-ready data.

What does this actor do?

Business Contact Extractor is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# Business Contact Extractor

The most thorough and accurate contact extractor available, powered by AI and built with enterprise-grade verification.

Extract verified business emails and phone numbers from company websites, even when contact info is buried in PDFs, hidden behind navigation menus, or scattered across multiple pages. This Actor combines smart multi-page crawling, strict multi-layer validation, and optional AI-powered extraction to deliver results that other scrapers miss.

### Why AI Makes the Difference

When you provide a Gemini API key, this Actor uses LLM-powered extraction to:

- Understand page context: AI reads the page like a human, identifying contact sections even on modern, JavaScript-heavy sites with minimal visible text
- Extract from complex layouts: finds emails and phones embedded in stylized designs, image-based text, or unconventional formatting
- Validate intelligently: cross-references extracted contacts against the page content to reject false positives
- Improve email accuracy: in the 30-domain benchmark reported below, LLM mode eliminated all email errors (100% accuracy versus 92% without it)

Even without AI, this Actor runs 50+ validation rules on every email and phone number, rejecting hex IDs, placeholder text, vendor emails, invalid formats, and other junk that pollutes typical scraper output.

Ideal for lead generation, B2B prospecting, data enrichment, and CRM automation. Works as a business email scraper, website contact finder, and phone number extraction API.

---

## Pricing

This Actor uses Apify's pay-per-event model:

| Fee Type | Cost |
|----------|------|
| Actor Start | $0.05 per run |
| Result | $0.006 per domain |

Example: Processing 100 domains in one run costs approximately $0.65 ($0.05 start fee + $0.60 for 100 results).

### LLM Cost (Optional)

The optional LLM feature uses Gemini 2.0 Flash, which has a generous free tier. For most users, the LLM cost is free or negligible: typically just a few cents even for thousands of domains.
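The pricing arithmetic above can be checked with a short calculation. This is only a sketch: `estimate_run_cost` is a hypothetical helper name, the two fees are the ones listed in the pricing table, and any Gemini API cost is deliberately left out.

```python
from decimal import Decimal

# Fees taken from the pricing table above (USD).
ACTOR_START_FEE = Decimal("0.05")   # charged once per run
RESULT_FEE = Decimal("0.006")       # charged per domain result


def estimate_run_cost(num_domains: int, runs: int = 1) -> Decimal:
    """Estimate the Apify charge for num_domains results spread over `runs` runs.

    Hypothetical helper for illustration; excludes any Gemini API cost.
    """
    if num_domains < 0 or runs < 1:
        raise ValueError("num_domains must be >= 0 and runs must be >= 1")
    return ACTOR_START_FEE * runs + RESULT_FEE * num_domains


print(f"${estimate_run_cost(100):.2f}")  # $0.65
```

Using `Decimal` rather than `float` keeps the per-result fee exact, which matters once the totals are summed across many runs.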
Get a free Gemini API key at https://aistudio.google.com/

---

## Why This Scraper Is Different

Most contact scrapers only scan a single page or rely on simple pattern matching. That misses a huge amount of real business contact information.

This Actor is designed to be far more thorough and reliable, using a hybrid system that improves both coverage and accuracy:

### 🔍 Smart Multi-Page Crawling

Automatically looks for:

* Contact pages
* About/team/support pages
* Footer links
* Auto-discovered subpages

This avoids the “homepage only” limitation of basic scrapers.

### 🧠 AI-Powered Extraction + Strict Verification

Every contact goes through a multi-stage validation pipeline:

* Format validation: rejects malformed emails, hex IDs, UUID fragments, and datetime strings
* Domain matching: prioritizes emails matching the company's own domain
* Vendor filtering: excludes generic vendor emails (e.g., orders@toasttab.com)
* Phone normalization: converts all numbers to E.164 international format
* Duplicate detection: removes redundant entries across all pages

With the optional Gemini AI integration, the Actor can also:

* Parse JavaScript-rendered content that basic scrapers miss
* Understand semantic context to find contacts on unconventional page layouts
* Cross-validate LLM findings against strict rules to eliminate hallucinations

### 📄 PDF Contact Extraction

Many companies hide contact details inside:

* brochures
* catalogs
* downloadable spec sheets

This Actor automatically fetches and scans PDFs for emails and phone numbers, a major upgrade over typical HTML-only scrapers.

### 🧹 Enterprise-Grade Data Quality

This isn't just extraction; it's verification at scale.
Every result passes through 50+ validation rules:

* Rejects placeholder emails (test@, noreply@, example@)
* Filters out vendor/third-party emails (toasttab, squarespace, wix)
* Removes invalid phone patterns (hex IDs, tracking codes, dates)
* Normalizes all phones to E.164 international format
* Deduplicates across all crawled pages
* Prioritizes brand-matching emails as the primary contact

The result: CRM-ready data you can trust, not a list of garbage to clean up manually.

### 📦 Bulk CSV Upload

Upload a CSV of domains and process hundreds of websites in one run.

---

## Performance (Based on Real-World Benchmarking)

Testing with 30 trade show exhibitor domains:

| Metric         | Without LLM | With LLM |
| -------------- | ----------- | -------- |
| Email accuracy | 92%         | 100%     |
| Email coverage | 80%         | 83%      |
| Phone coverage | 87%         | 90%      |

These results are significantly higher than those of traditional scrapers.

---

## Performance Guarantees & Crawling Limits

To ensure fast and reliable performance on the Apify platform, this Actor enforces strict limits:

| Limit | Value | Reason |
|-------|-------|--------|
| Max pages per domain | 3 | Ensures fast completion and high health score |
| Per-domain timeout | 6 seconds | Prevents slow sites from blocking the queue |
| Global run timeout | 5 minutes | Ensures runs always complete within Apify limits |

Important notes:

- User requests for deeper crawling (>3 pages) are automatically capped for stability
- PDF extraction is disabled by default because it significantly increases run times. Enable it via `enablePdfExtraction: true` if needed.
- Each domain is time-limited to ~6 seconds, ensuring the Actor can process large batches efficiently

Recommended batch sizes: 50–250 domains per run for optimal performance.

This is not a full-site spider. The Actor is optimized for fast, targeted contact extraction from the homepage and key contact pages, not for deep crawling of entire websites.
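The limits above suggest splitting a large domain list into several runs. The following is a minimal sketch of that batching, assuming the upper end of the recommended 50–250 range as the batch size and the documented 3-page cap. `chunk_domains` and `clamp_max_pages` are hypothetical helpers and not part of the Actor; the lower bound of 1 page is also an assumption.

```python
from typing import Iterator, List, Sequence

RECOMMENDED_BATCH_SIZE = 250   # upper end of the recommended 50-250 range
MAX_PAGES_CAP = 3              # deeper crawl requests are capped at 3 pages


def clamp_max_pages(requested: int) -> int:
    """Mirror the documented cap locally; the floor of 1 is an assumption."""
    return max(1, min(requested, MAX_PAGES_CAP))


def chunk_domains(
    domains: Sequence[str], batch_size: int = RECOMMENDED_BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield de-duplicated, non-empty domains in order, batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # dict.fromkeys removes exact duplicates while preserving input order.
    unique = list(dict.fromkeys(d.strip() for d in domains if d.strip()))
    for start in range(0, len(unique), batch_size):
        yield unique[start:start + batch_size]


batches = list(chunk_domains([f"company{i}.com" for i in range(600)]))
print([len(b) for b in batches])   # [250, 250, 100]
print(clamp_max_pages(10))         # 3
```

Each batch can then be submitted as its own run, at the cost of one additional start fee per run.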
---

## Input Options

### Option 1: Domain List

```json
{
  "domains": ["example.com", "another-company.com"],
  "maxPagesPerDomain": 3,
  "llmApiKey": "your-gemini-api-key"
}
```

### Option 2: CSV Upload

Upload a CSV with a `domain`, `website`, or `url` column:

```
domain
example.com
another-company.com
acme-corp.net
```

---

## Parameters

| Parameter           | Type    | Description                                                      |
| ------------------- | ------- | ---------------------------------------------------------------- |
| domains             | array   | List of domains or URLs to scrape                                |
| csvFile             | file    | CSV file with domain/website/url column                          |
| maxPagesPerDomain   | integer | Max pages to crawl per domain (default: 3, max: 3)               |
| enablePdfExtraction | boolean | Enable PDF extraction (default: false, slower but more thorough) |
| llmApiKey           | string  | Gemini API key for enhanced accuracy                             |

Get a free Gemini API key at https://aistudio.google.com/

---

## Output Format

Each domain produces one result:

```json
{
  "domain": "example.com",
  "primary_email": "contact@example.com",
  "primary_phone": "+14155551234",
  "supplemental_emails": ["sales@example.com", "support@example.com"],
  "supplemental_phones": ["+14155555678"]
}
```

### Output Fields

| Field               | Description                             |
| ------------------- | --------------------------------------- |
| domain              | Domain that was scraped                 |
| primary_email       | Best email found (prefers brand domain) |
| primary_phone       | Best phone found (E.164 format)         |
| supplemental_emails | All other valid emails found            |
| supplemental_phones | All other valid phones found            |

---

## Usage Tips

* Add a Gemini API key for the best coverage and email accuracy (100% in the 30-domain benchmark)
* Use CSV upload for large batches (50–250 domains recommended)
* Enable `enablePdfExtraction` if contact info is often in PDF brochures/catalogs
* Find results in the Dataset tab after the run completes

---

## Limitations

* Cannot extract contacts behind login walls
* Cannot retrieve contacts locked behind form submissions

---

If you need high-quality business contact data at scale, this Actor provides the most robust and accurate extraction method available on Apify.
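Because each domain produces one record in the documented output format, flattening results into CRM-ready rows takes only a few lines. The sketch below relies solely on the five documented output fields. `results_to_csv` is a hypothetical helper, joining supplemental values with `;` is an arbitrary choice for illustration, and treating missing or null fields as empty strings is an assumption about how the Actor reports domains with no contacts. The sample record is the one from the Output Format section.

```python
import csv
import io
from typing import Any, Dict, Iterable

# The five output fields documented for each result.
FIELDS = ["domain", "primary_email", "primary_phone",
          "supplemental_emails", "supplemental_phones"]


def results_to_csv(results: Iterable[Dict[str, Any]]) -> str:
    """Flatten Actor result records into CSV text, one row per domain."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in results:
        row = {}
        for field in FIELDS:
            value = record.get(field)
            if isinstance(value, list):
                value = ";".join(value)   # arbitrary separator for list fields
            row[field] = value or ""      # assumption: missing/null -> empty cell
        writer.writerow(row)
    return buffer.getvalue()


sample = [{
    "domain": "example.com",
    "primary_email": "contact@example.com",
    "primary_phone": "+14155551234",
    "supplemental_emails": ["sales@example.com", "support@example.com"],
    "supplemental_phones": ["+14155555678"],
}]
print(results_to_csv(sample))
```

The records themselves would come from the run's dataset (the Dataset tab mentioned in the usage tips), exported as JSON.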

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Business Contact Extractor now on Apify. Free tier available with no credit card required.


Actor Information

Developer: washed_fun
Pricing: Paid
Total Runs: 54
Active Users: 16

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
