Contact Details Scraper
by practicaltools
About Contact Details Scraper
An Apify actor that crawls any website to extract contact details and social media profiles. This tool can extract emails, phone numbers, and profiles from LinkedIn, Twitter, Instagram, Facebook, YouTube, TikTok, Pinterest, Discord, Snapchat, Threads, and Telegram.
What does this actor do?
Contact Details Scraper is a web scraping tool that runs in the cloud on the Apify platform. Starting from the URLs you supply, it follows links up to a configurable depth and page limit and collects the contact details it finds on each page.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
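As a rough illustration of the "configure the input" and "run the actor" steps, the sketch below builds an input object from the parameter names and defaults listed in the Documentation section and, optionally, starts a run through the `apify-client` Python package. The helper names `build_input` and `run_actor` are hypothetical, and the actor ID is deliberately left as an argument because this page does not state it.

```python
import json


def build_input(start_urls, max_crawl_depth=1, max_crawl_pages=40,
                stay_within_domain=True, extract_from_text=True,
                use_apify_proxy=True):
    """Build an actor input using the documented parameter names and defaults."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxCrawlDepth": max_crawl_depth,
        "stayWithinDomain": stay_within_domain,
        "maxCrawlPages": max_crawl_pages,
        "extractFromText": extract_from_text,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


def run_actor(token, actor_id, run_input):
    """Start a run and return its dataset items.

    Assumptions: `pip install apify-client` has been done, and `actor_id`
    is the actor's real ID, which is not given on this page.
    """
    # Imported lazily so build_input() works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    # Print the input as JSON so it can be pasted into the Apify console.
    print(json.dumps(build_input(["https://example.com"]), indent=2))
```

Keeping the input builder separate from the network call makes it easy to inspect or version the configuration before spending any credits on a run.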
Documentation
# Contact Details Scraper

An Apify actor that crawls any website to extract contact details and social media profiles. This tool can extract emails, phone numbers, and profiles from LinkedIn, Twitter, Instagram, Facebook, YouTube, TikTok, Pinterest, Discord, Snapchat, Threads, and Telegram.

## Fair and affordable, pay only for found results

### Pay-Per-Result Model

- $0.0045 per successful result (only charge when contact details are found)
- $4.50 per 1,000 successful results

## Features

- Comprehensive Contact Extraction: Extracts emails and phone numbers from both HTML attributes and page text
- Social Media Profile Detection: Finds profiles across 11+ social media platforms
- Smart Crawling: Respects domain boundaries and crawl depth limits
- Deduplication: Automatically removes duplicate contacts across all crawled pages
- Aggregated Results: Provides both individual page results and a consolidated summary
- Configurable: Flexible input options for different crawling scenarios

## Extracted Data Types

### Contact Information

- Email addresses: From mailto links and text content
- Phone numbers: From tel: links (reliable) and text extraction (may include false positives)

### Social Media Profiles

- LinkedIn profiles
- Twitter/X handles
- Instagram profiles
- Facebook profiles and pages
- YouTube channels
- TikTok profiles
- Pinterest profiles
- Discord servers/invites
- Snapchat profiles
- Threads profiles
- Telegram channels/groups

## Input Configuration

### Required Parameters

- startUrls (array): List of URLs to start crawling from

### Optional Parameters

- maxCrawlDepth (integer, default: 1): How many levels deep to crawl links (conservative)
- stayWithinDomain (boolean, default: true): Only follow links within the same domain
- maxCrawlPages (integer, default: 40): Maximum number of pages to crawl (conservative limit)
- extractFromText (boolean, default: true): Extract phone numbers from text (may have false positives)
- headless (boolean, default: true): Run browser in headless mode
- waitForSelector (string): CSS selector to wait for before extracting
- waitForLoadState (string, default: 'domcontentloaded'): Load state to wait for
- proxyConfiguration (object): Proxy settings

### Example Input

```json
{
  "startUrls": [
    { "url": "https://example.com" },
    { "url": "https://company.com/about" }
  ],
  "maxCrawlDepth": 1,
  "stayWithinDomain": true,
  "maxCrawlPages": 40,
  "extractFromText": true,
  "proxyConfiguration": { "useApifyProxy": true }
}
```

## Output Format

The actor provides two types of output:

### 1. Individual Page Results

Each crawled page generates a result with:

```json
{
  "url": "https://example.com/contact",
  "domain": "example.com",
  "depth": 1,
  "originalStartUrl": "https://example.com",
  "referrerUrl": "https://example.com",
  "emails": ["contact@example.com"],
  "phones": ["+1234567890"],
  "phonesUncertain": ["123.456.7890"],
  "linkedIns": ["https://linkedin.com/company/example"],
  "twitters": ["https://twitter.com/example"],
  "instagrams": ["https://instagram.com/example"],
  "facebooks": ["https://facebook.com/example"],
  "youtubes": [],
  "tiktoks": [],
  "pinterests": [],
  "discords": [],
  "snapchats": [],
  "threads": [],
  "telegrams": []
}
```

### 2. Aggregated Summary

A deduplicated summary of all found contacts stored in AGGREGATED_RESULTS:

```json
{
  "crawlSummary": {
    "totalPages": 25,
    "totalResults": 25,
    "maxDepth": 2,
    "stayWithinDomain": true,
    "startUrls": ["https://example.com"]
  },
  "aggregatedResults": {
    "emails": ["contact@example.com", "info@example.com"],
    "phones": ["+1234567890"],
    "phonesUncertain": ["123.456.7890"],
    "linkedIns": ["https://linkedin.com/company/example"],
    // ... other social profiles
  },
  "totalContacts": {
    "emails": 2,
    "phones": 1,
    "phonesUncertain": 1,
    "linkedIns": 1,
    // ... counts for each type
  }
}
```

## Usage Tips

1. Start with key pages: Include contact pages, about pages, and team pages in your start URLs for best results
2. Adjust crawl depth: Use depth 0 for specific pages only, or 2-3 for broader discovery
3. Domain restrictions: Keep stayWithinDomain true to avoid crawling external sites
4. Phone extraction: Set extractFromText to false if you're getting too many false positive phone numbers
5. Performance: Use appropriate maxCrawlPages limits to control execution time and costs

## Technical Notes

- Uses Playwright for robust page rendering and JavaScript execution
- Employs Cheerio for enhanced HTML parsing and extraction
- Implements intelligent URL normalization and deduplication
- Handles failed requests gracefully with error reporting
- Supports proxy rotation for large-scale crawling

## Limitations

- Phone number extraction from text may include false positives
- Some social media profiles may require specific URL patterns to be detected
- JavaScript-heavy sites may need additional wait conditions
- Rate limiting may apply for large crawls

## Legal Considerations

Be aware that extracting contact information may involve personal data protected by GDPR and other privacy regulations. Ensure you have legitimate reasons for collecting this data and comply with applicable laws.

## Pricing

This actor uses a Pay-per-Result pricing model - you only pay for pages that successfully extract contact information!

- $4.50 per 1,000 successful results
- Only pages with contact details count - no payment for empty pages
- 20% cheaper than competitors who charge per page regardless of results
- Better value - you pay for actual data, not failed attempts

### Cost Examples:

- 40 pages with 25 successful extractions: $0.11 (25 × $0.0045)
- 100 pages with 60 successful extractions: $0.27 (60 × $0.0045)
- 1,000 pages with 400 successful extractions: $1.80 (400 × $0.0045)

The free tier includes $5 in credits, allowing you to extract contact details from approximately 1,111 successful pages.

## Support

For issues, feature requests, or questions about this actor, please refer to the Apify documentation or community forums.
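To make the aggregated summary and the pay-per-result arithmetic from the documentation more concrete, here is a minimal Python sketch that post-processes page results on the client side. It is not the actor's own implementation: the function names are hypothetical, and it assumes a "successful result" is any page with at least one non-empty contact or social profile field, which the documentation implies but does not define precisely.

```python
PRICE_PER_RESULT = 0.0045  # USD per successful result, from the pricing section

# Field names taken from the documented page-result format.
CONTACT_FIELDS = [
    "emails", "phones", "phonesUncertain", "linkedIns", "twitters",
    "instagrams", "facebooks", "youtubes", "tiktoks", "pinterests",
    "discords", "snapchats", "threads", "telegrams",
]


def aggregate(page_results):
    """Merge per-page results into one deduplicated summary, keeping first-seen order."""
    merged = {field: [] for field in CONTACT_FIELDS}
    for page in page_results:
        for field in CONTACT_FIELDS:
            for value in page.get(field, []):
                if value not in merged[field]:
                    merged[field].append(value)
    return {
        "aggregatedResults": merged,
        "totalContacts": {field: len(values) for field, values in merged.items()},
    }


def billable_pages(page_results):
    """Count pages with at least one contact field filled (assumed billing rule)."""
    return sum(
        1 for page in page_results
        if any(page.get(field) for field in CONTACT_FIELDS)
    )


def estimate_cost(page_results):
    """Estimated charge in USD: billable pages times the per-result price."""
    return billable_pages(page_results) * PRICE_PER_RESULT


if __name__ == "__main__":
    pages = [
        {"url": "https://example.com/contact", "emails": ["contact@example.com"]},
        {"url": "https://example.com/about", "emails": ["contact@example.com"]},
        {"url": "https://example.com/blog", "emails": []},
    ]
    print(aggregate(pages)["totalContacts"]["emails"])
    print(f"${estimate_cost(pages):.4f}")
```

Because the estimate depends on how the actor actually counts successful results, treat it as a budgeting aid and check it against the charges reported for a real run.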
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: practicaltools
- Pricing: Paid
- Total Runs: 1,948
- Active Users: 124