🩺 WebMD Doctor Scraper
by shahidirfan
About 🩺 WebMD Doctor Scraper
Efficiently extract detailed doctor profiles, practice locations, and medical ratings from WebMD. This lightweight actor is optimized for speed and data accuracy. To ensure smooth operation and prevent blocking, using residential proxies is highly recommended.
What does this actor do?
🩺 WebMD Doctor Scraper is an actor on the Apify platform that extracts doctor profiles, practice locations, and ratings from the WebMD doctor directory (doctor.webmd.com). It runs in the cloud and returns structured JSON records.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
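The input configured in the third step is a plain JSON object. As a minimal sketch, assuming the parameter names, defaults, and minimums listed in the Documentation section, the snippet below builds such an object and checks it before a run. The helper name `build_input` is hypothetical; it is not part of the actor or of any Apify SDK.

```python
import json

# Defaults copied from the Input Parameters section of the README.
DEFAULTS = {
    "specialty": "family-medicine",
    "location": "",
    "results_wanted": 50,
    "max_pages": 5,
    "collectDetails": True,
}


def build_input(**overrides):
    """Merge overrides into the documented defaults and check basic limits.

    Hypothetical helper for illustration only.
    """
    config = {**DEFAULTS, **overrides}
    for key in ("results_wanted", "max_pages"):
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be an integer >= 1")
    # The README asks for URL slug format (hyphens between words).
    config["specialty"] = "-".join(config["specialty"].strip().lower().split())
    return config


if __name__ == "__main__":
    # json.dumps always emits valid JSON, which avoids hand-editing mistakes
    # such as a trailing comma before the closing brace.
    print(json.dumps(build_input(specialty="Family Medicine", results_wanted=25), indent=2))
```

Generating the input with `json.dumps` rather than writing it by hand guarantees the result parses as JSON before it is pasted into the actor's input form.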
Documentation
# WebMD Doctor Scraper

Extract physician and healthcare provider information from WebMD. The actor uses Playwright (Firefox) to bypass anti-bot, then extracts listings from embedded `window.__INITIAL_STATE__` JSON first (fast) with an HTML parsing fallback.

## Features

- Multiple Data Extraction Methods: Prioritizes embedded JSON state, falls back to intelligent HTML parsing
- JSON State First (Priority 1): Fast extraction via embedded `window.__INITIAL_STATE__` JSON parse
- Playwright Firefox: Solves JS/cookie challenges so API/HTML requests return real content
- Comprehensive Data Collection: Extracts names, specialties, contact information, locations, websites, and biographies
- Efficient Pagination: Automatically handles multi-page results with customizable limits
- Optional Detail Extraction: Fetch full provider profiles for in-depth information
- Structured Output: Consistent JSON schema for all extracted data
- High Performance: Concurrent requests with optimized session pool management
- Proxy Support: Full integration with Apify Proxy for reliable operation

## Output Schema

Each doctor profile includes the following fields:

| Field | Type | Description |
|-------|------|-------------|
| name | string | Full name of the physician |
| specialty | string | Medical specialty/field |
| phone | string | Contact phone number |
| address | string | Full address (street, city, state, zip) |
| website | string | Provider website or practice URL |
| bio | string | HTML-formatted biography/credentials |
| bio_text | string | Plain text version of biography |
| url | string | Direct link to doctor's profile on WebMD |
| source | string | Data source identifier |

Example Output:

```json
{
  "name": "Dr. John Smith, MD",
  "specialty": "Family Medicine",
  "phone": "(555) 123-4567",
  "address": "123 Medical Plaza Drive, Springfield, IL 62701",
  "website": "https://www.smithmedical.com",
  "bio": "<p>Dr. Smith is a board-certified family medicine physician with 15 years of experience...</p>",
  "bio_text": "Dr. Smith is a board-certified family medicine physician with 15 years of experience...",
  "url": "https://doctor.webmd.com/providers/[provider-id]",
  "source": "webmd.com"
}
```

## Quick Start

### Basic Usage

The simplest way to get started is to use default settings:

```json
{
  "specialty": "family-medicine",
  "results_wanted": 50
}
```

This will scrape the first 50 family medicine doctors from WebMD.

### Advanced Configuration

For more control over scraping behavior:

```json
{
  "specialty": "cardiology",
  "location": "New York",
  "results_wanted": 100,
  "max_pages": 10,
  "collectDetails": true,
  "useJsonApi": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

## Configuration Options

### Input Parameters

`specialty` (string, optional)
- Medical specialty to search for
- Default: `family-medicine`
- Examples: `cardiology`, `dermatology`, `pediatrics`, `neurology`
- Use URL slug format (hyphens between words)

`location` (string, optional)
- Geographic location filter
- Default: Empty (searches nationwide)
- Examples: New York, California, Texas
- Leave empty for all locations

`results_wanted` (integer, optional)
- Maximum number of doctor profiles to extract
- Default: 50
- Minimum: 1
- Maximum recommended: 500

`max_pages` (integer, optional)
- Safety limit on number of listing pages to process
- Default: 5
- Minimum: 1
- Each page typically contains 10-20 profiles

`collectDetails` (boolean, optional)
- Whether to visit individual doctor profiles for complete information
- Default: true
- If false: Returns only profile URLs without detailed data

The actor uses embedded `window.__INITIAL_STATE__` JSON first and can run an HTML fallback internally (built-in defaults; not exposed as actor inputs).
`maxConcurrency` (integer, optional)
- Maximum number of parallel requests
- Default: 10

`debug` (boolean, optional)
- Saves small debug artifacts to Key-Value Store when blocked or when API JSON is invalid
- Default: false

`startUrl` / `startUrls` (string/array, optional)
- Custom WebMD search URLs to start from
- Overrides `specialty` and `location` parameters
- Example: https://doctor.webmd.com/providers/specialty/family-medicine

`proxyConfiguration` (object, optional)
- Proxy settings for requests
- Recommended: Use Apify Proxy (`useApifyProxy: true`)
- Improves reliability and helps avoid rate limiting

## How It Works

### Scraping Process

1. Initialization: Loads your input configuration
2. Listing Extraction (Priority 1): Parses embedded `window.__INITIAL_STATE__` JSON to extract providers
3. Detail Collection (optional): Fetches provider pages for JSON-LD/HTML enrichment
4. HTML Fallback (optional): Uses HTML parsing when JSON extraction yields no results
5. Storage: Saves all extracted data to the Apify Dataset

### Data Extraction Methods

The actor employs a multi-tier approach for maximum data quality:

1. Embedded JSON State (Priority 1)
   - Parses `window.__INITIAL_STATE__` for fast, stable listing extraction
   - Avoids brittle CSS selector scraping for search pages
2. JSON-LD Extraction (Priority 2)
   - Extracts schema.org Physician data from provider detail pages (when present)
3. HTML Parsing (Priority 3)
   - Intelligent fallback to CSS selectors
   - Searches multiple class names and attribute patterns
   - Handles variations in page markup

## Common Use Cases

### Case 1: Regional Doctor Search

Search for all pediatricians in a specific state:

```json
{
  "specialty": "pediatrics",
  "location": "California",
  "results_wanted": 100,
  "max_pages": 10
}
```

### Case 2: Quick Verification

Get just the profile URLs without details for quick verification:

```json
{
  "specialty": "dermatology",
  "results_wanted": 25,
  "collectDetails": false
}
```

### Case 3: Comprehensive Research

Extract detailed profiles for all specialists in a region:

```json
{
  "specialty": "neurology",
  "location": "New York",
  "results_wanted": 200,
  "max_pages": 20,
  "collectDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

## Best Practices

### Optimal Settings

For Small Datasets (< 100 profiles)
- Set `max_pages`: 3-5
- Use `results_wanted`: 50-100
- Enable proxy for reliability

For Large Datasets (100-500+ profiles)
- Set `max_pages`: 10-20
- Use proxy configuration (recommended)
- Increase actor memory if needed

For Production Use
- Always use Apify Proxy (`useApifyProxy: true`)
- Set reasonable `results_wanted` limits
- Monitor actor logs for errors
- Test with small batches first

### Performance Tips

1. Use Proxies: Apify Proxy prevents rate limiting and improves stability
2. Set Realistic Limits: Balance between data completeness and runtime
3. Enable Details Selectively: Detail scraping is thorough but slower
4. Monitor Resources: Watch memory usage during execution
5. Batch Large Requests: Split very large searches into multiple runs

## Troubleshooting

### Common Issues

Issue: Limited results returned
- Solution: Increase `max_pages` value
- Check specialty name spelling
- Verify location parameter format

Issue: "Just a moment…" / blocked responses
- Solution: Use Apify Proxy (Residential recommended) and reduce `maxConcurrency`
- Enable `debug: true` to store small blocked-response snippets in Key-Value Store

Issue: Slow performance
- Solution: Reduce `results_wanted` or `max_pages`
- Enable proxy for concurrent optimization
- Increase actor memory allocation

Issue: Missing data fields
- Solution: Ensure `collectDetails: true`
- Check if WebMD page structure changed
- Verify proxy connectivity

Issue: Actor timeout
- Solution: Reduce `max_pages` or `results_wanted`
- Increase `requestTimeoutSecs` in actor.json
- Use proxy to improve response times

## Output Dataset

All results are saved to an Apify Dataset with the following characteristics:

- Format: JSON
- Records: Individual doctor profiles
- Sorting: By discovery order
- Deduplication: Automatic (unique URLs)

### Accessing Results

Results can be downloaded in multiple formats:

- JSON (native format)
- CSV (for spreadsheet analysis)
- XML (for integration)
- JSONL (for streaming)

## Data Quality & Compliance

- Source Verification: All data extracted directly from public WebMD pages
- Rate Limiting: Respects WebMD's terms of service with appropriate delays
- Data Consistency: Validated against schema before storage
- Error Handling: Robust error management with detailed logging
- Privacy: No PII collection beyond publicly available information

## Compatibility

- Target: WebMD Doctor Directory (doctor.webmd.com)
- Browser: Required (Playwright Firefox)
- JavaScript: Handles both static and dynamically-loaded content
- Encoding: Full UTF-8 support

## Input Template

Save this as INPUT.json for easy reuse:

```json
{
  "specialty": "family-medicine",
  "location": "",
  "results_wanted": 50,
  "max_pages": 5,
  "collectDetails": true
}
```

## Rate Limits

- Concurrent Requests: 10 simultaneous connections
- Request Timeout: 60 seconds per request
- Retry Attempts: Up to 3 retries on failure
- Session Pool: Automatic session rotation for reliability

## Version History

### v2.0.0 (2025-12-13)

- Complete conversion from jobs scraper to doctor scraper
- WebMD-specific selectors and data extraction
- Enhanced error handling and logging
- Improved pagination logic
- Added JSON-LD extraction support
- Better performance and reliability

## Support

For issues, feature requests, or questions, please refer to the Apify documentation or contact support.

---

Last Updated: December 13, 2025

Scraper Version: 2.0.0

Target Website: WebMD Doctor Directory
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: shahidirfan
- Pricing: Paid
- Total Runs: 7
- Active Users: 2