🩺 WebMD Doctor Scraper

Author: shahidirfan

About 🩺 WebMD Doctor Scraper

Efficiently extract detailed doctor profiles, practice locations, and medical ratings from WebMD. This lightweight actor is optimized for speed and data accuracy. To ensure smooth operation and prevent blocking, using residential proxies is highly recommended.

What does this actor do?

🩺 WebMD Doctor Scraper is an actor on the Apify platform that scrapes physician and healthcare provider listings from WebMD's doctor directory. It runs in the cloud, so you can collect the data without setting up a scraper locally.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
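
For programmatic runs, the steps above can also be driven through the Apify REST API. The following is a minimal sketch, not the author's own client code: it assumes the actor ID is `shahidirfan~webmd-doctor-scraper` (a hypothetical placeholder; use the ID shown on the actor's Apify page) and uses Apify's `run-sync-get-dataset-items` endpoint with a bearer token. The input fields `specialty` and `results_wanted` come from the actor's documentation.

```python
import json
import urllib.request

# Hypothetical actor ID; replace it with the ID shown on the actor's Apify page.
ACTOR_ID = "shahidirfan~webmd-doctor-scraper"
API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the actor
    synchronously and returns its dataset items in the response body."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token: str, actor_input: dict) -> list:
    """Send the request and decode the JSON array of scraped profiles."""
    with urllib.request.urlopen(build_run_request(token, actor_input)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the request; call run_actor() with a real token to execute it.
    request = build_run_request(
        "<YOUR_APIFY_TOKEN>",
        {"specialty": "family-medicine", "results_wanted": 50},
    )
    print(request.get_method(), request.full_url)
```

Synchronous runs suit small jobs; for long scrapes it is usually safer to start the run asynchronously and read the dataset once it finishes.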

Documentation

# WebMD Doctor Scraper

Extract physician and healthcare provider information from WebMD. The actor uses Playwright (Firefox) to get past anti-bot checks, then extracts listings from the embedded `window.__INITIAL_STATE__` JSON first (fast), with an HTML parsing fallback.

## Features

- Multiple Data Extraction Methods: Prioritizes embedded JSON state, falls back to intelligent HTML parsing
- JSON State First (Priority 1): Fast extraction by parsing the embedded `window.__INITIAL_STATE__` JSON
- Playwright Firefox: Solves JS/cookie challenges so API/HTML requests return real content
- Comprehensive Data Collection: Extracts names, specialties, contact information, locations, websites, and biographies
- Efficient Pagination: Automatically handles multi-page results with customizable limits
- Optional Detail Extraction: Fetch full provider profiles for in-depth information
- Structured Output: Consistent JSON schema for all extracted data
- High Performance: Concurrent requests with optimized session pool management
- Proxy Support: Full integration with Apify Proxy for reliable operation

## Output Schema

Each doctor profile includes the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Full name of the physician |
| `specialty` | string | Medical specialty/field |
| `phone` | string | Contact phone number |
| `address` | string | Full address (street, city, state, zip) |
| `website` | string | Provider website or practice URL |
| `bio` | string | HTML-formatted biography/credentials |
| `bio_text` | string | Plain text version of biography |
| `url` | string | Direct link to doctor's profile on WebMD |
| `source` | string | Data source identifier |

Example Output:

```json
{
  "name": "Dr. John Smith, MD",
  "specialty": "Family Medicine",
  "phone": "(555) 123-4567",
  "address": "123 Medical Plaza Drive, Springfield, IL 62701",
  "website": "https://www.smithmedical.com",
  "bio": "<p>Dr. Smith is a board-certified family medicine physician with 15 years of experience...</p>",
  "bio_text": "Dr. Smith is a board-certified family medicine physician with 15 years of experience...",
  "url": "https://doctor.webmd.com/providers/[provider-id]",
  "source": "webmd.com"
}
```

## Quick Start

### Basic Usage

The simplest way to get started is to use the default settings:

```json
{
  "specialty": "family-medicine",
  "results_wanted": 50
}
```

This scrapes the first 50 family medicine doctors from WebMD.

### Advanced Configuration

For more control over scraping behavior:

```json
{
  "specialty": "cardiology",
  "location": "New York",
  "results_wanted": 100,
  "max_pages": 10,
  "collectDetails": true,
  "useJsonApi": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

## Configuration Options

### Input Parameters

`specialty` (string, optional)
- Medical specialty to search for
- Default: `family-medicine`
- Examples: `cardiology`, `dermatology`, `pediatrics`, `neurology`
- Use URL slug format (hyphens between words)

`location` (string, optional)
- Geographic location filter
- Default: Empty (searches nationwide)
- Examples: `New York`, `California`, `Texas`
- Leave empty for all locations

`results_wanted` (integer, optional)
- Maximum number of doctor profiles to extract
- Default: `50`
- Minimum: `1`
- Maximum recommended: `500`

`max_pages` (integer, optional)
- Safety limit on the number of listing pages to process
- Default: `5`
- Minimum: `1`
- Each page typically contains 10-20 profiles

`collectDetails` (boolean, optional)
- Whether to visit individual doctor profiles for complete information
- Default: `true`
- If `false`: Returns only profile URLs without detailed data

The actor uses the embedded `window.__INITIAL_STATE__` JSON first and can run an HTML fallback internally (built-in defaults; not exposed as actor inputs).

`maxConcurrency` (integer, optional)
- Maximum number of parallel requests
- Default: `10`

`debug` (boolean, optional)
- Saves small debug artifacts to the Key-Value Store when blocked or when API JSON is invalid
- Default: `false`

`startUrl` / `startUrls` (string/array, optional)
- Custom WebMD search URLs to start from
- Overrides the `specialty` and `location` parameters
- Example: `https://doctor.webmd.com/providers/specialty/family-medicine`

`proxyConfiguration` (object, optional)
- Proxy settings for requests
- Recommended: Use Apify Proxy (`useApifyProxy: true`)
- Improves reliability and helps avoid rate limiting

## How It Works

### Scraping Process

1. Initialization: Loads your input configuration
2. Listing Extraction (Priority 1): Parses the embedded `window.__INITIAL_STATE__` JSON to extract providers
3. Detail Collection (optional): Fetches provider pages for JSON-LD/HTML enrichment
4. HTML Fallback (optional): Uses HTML parsing when JSON extraction yields no results
5. Storage: Saves all extracted data to the Apify Dataset

### Data Extraction Methods

The actor employs a multi-tier approach for maximum data quality:

1. Embedded JSON State (Priority 1)
   - Parses `window.__INITIAL_STATE__` for fast, stable listing extraction
   - Avoids brittle CSS selector scraping for search pages
2. JSON-LD Extraction (Priority 2)
   - Extracts schema.org `Physician` data from provider detail pages (when present)
3. HTML Parsing (Priority 3)
   - Intelligent fallback to CSS selectors
   - Searches multiple class names and attribute patterns
   - Handles variations in page markup

## Common Use Cases

### Case 1: Regional Doctor Search

Search for all pediatricians in a specific state:

```json
{
  "specialty": "pediatrics",
  "location": "California",
  "results_wanted": 100,
  "max_pages": 10
}
```

### Case 2: Quick Verification

Get just the profile URLs without details for quick verification:

```json
{
  "specialty": "dermatology",
  "results_wanted": 25,
  "collectDetails": false
}
```

### Case 3: Comprehensive Research

Extract detailed profiles for all specialists in a region:

```json
{
  "specialty": "neurology",
  "location": "New York",
  "results_wanted": 200,
  "max_pages": 20,
  "collectDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

## Best Practices

### Optimal Settings

For Small Datasets (< 100 profiles)
- Set `max_pages: 3-5`
- Use `results_wanted: 50-100`
- Enable proxy for reliability

For Large Datasets (100-500+ profiles)
- Set `max_pages: 10-20`
- Use proxy configuration (recommended)
- Increase actor memory if needed

For Production Use
- Always use Apify Proxy (`useApifyProxy: true`)
- Set reasonable `results_wanted` limits
- Monitor actor logs for errors
- Test with small batches first

### Performance Tips

1. Use Proxies: Apify Proxy prevents rate limiting and improves stability
2. Set Realistic Limits: Balance data completeness against runtime
3. Enable Details Selectively: Detail scraping is thorough but slower
4. Monitor Resources: Watch memory usage during execution
5. Batch Large Requests: Split very large searches into multiple runs

## Troubleshooting

### Common Issues

Issue: Limited results returned
- Solution: Increase the `max_pages` value
- Check specialty name spelling
- Verify location parameter format

Issue: “Just a moment…” / blocked responses
- Solution: Use Apify Proxy (Residential recommended) and reduce `maxConcurrency`
- Enable `debug: true` to store small blocked-response snippets in the Key-Value Store

Issue: Slow performance
- Solution: Reduce `results_wanted` or `max_pages`
- Enable proxy so concurrent requests are less likely to be throttled
- Increase actor memory allocation

Issue: Missing data fields
- Solution: Ensure `collectDetails: true`
- Check whether the WebMD page structure has changed
- Verify proxy connectivity

Issue: Actor timeout
- Solution: Reduce `max_pages` or `results_wanted`
- Increase `requestTimeoutSecs` in actor.json
- Use proxy to improve response times

## Output Dataset

All results are saved to an Apify Dataset with the following characteristics:

- Format: JSON
- Records: Individual doctor profiles
- Sorting: By discovery order
- Deduplication: Automatic (unique URLs)

### Accessing Results

Results can be downloaded in multiple formats:

- JSON (native format)
- CSV (for spreadsheet analysis)
- XML (for integration)
- JSONL (for streaming)

## Data Quality & Compliance

- Source Verification: All data extracted directly from public WebMD pages
- Rate Limiting: Respects WebMD's terms of service with appropriate delays
- Data Consistency: Validated against schema before storage
- Error Handling: Robust error management with detailed logging
- Privacy: No PII collection beyond publicly available information

## Compatibility

- Target: WebMD Doctor Directory (doctor.webmd.com)
- Browser: Required (Playwright Firefox)
- JavaScript: Handles both static and dynamically-loaded content
- Encoding: Full UTF-8 support

## Input Template

Save this as `INPUT.json` for easy reuse:

```json
{
  "specialty": "family-medicine",
  "location": "",
  "results_wanted": 50,
  "max_pages": 5,
  "collectDetails": true
}
```

## Rate Limits

- Concurrent Requests: 10 simultaneous connections (the `maxConcurrency` default)
- Request Timeout: 60 seconds per request
- Retry Attempts: Up to 3 retries on failure
- Session Pool: Automatic session rotation for reliability

## Version History

### v2.0.0 (2025-12-13)

- Complete conversion from jobs scraper to doctor scraper
- WebMD-specific selectors and data extraction
- Enhanced error handling and logging
- Improved pagination logic
- Added JSON-LD extraction support
- Better performance and reliability

## Support

For issues, feature requests, or questions, please refer to the Apify documentation or contact support.

---

Last Updated: December 13, 2025

Scraper Version: 2.0.0

Target Website: WebMD Doctor Directory
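
The JSON-first extraction with an HTML fallback described under "Data Extraction Methods" can be illustrated with a short Python sketch. This is not the actor's source code: it assumes the page assigns a JSON object to `window.__INITIAL_STATE__` in a script tag, and the key path `search.providers` and the CSS class `provider-name` are hypothetical names chosen only for illustration.

```python
import json
import re
from html.parser import HTMLParser

# Assumption: the state is assigned as a JSON object literal inside a <script> tag.
STATE_MARKER = re.compile(r"window\.__INITIAL_STATE__\s*=\s*")


def extract_initial_state(html: str):
    """Return the embedded state as a dict, or None if it is absent or invalid."""
    match = STATE_MARKER.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value and ignores the script text after it.
        state, _ = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return state if isinstance(state, dict) else None


def providers_from_state(state: dict) -> list:
    """Map state entries to output records (hypothetical key path)."""
    search = state.get("search")
    items = search.get("providers", []) if isinstance(search, dict) else []
    return [
        {"name": p.get("name"), "url": p.get("url"), "source": "webmd.com"}
        for p in items
        if isinstance(p, dict)
    ]


class ProviderLinkParser(HTMLParser):
    """Fallback tier: collect <a class="provider-name"> links (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.providers = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and "provider-name" in (attr.get("class") or "").split():
            self._current = {"name": "", "url": attr.get("href"), "source": "webmd.com"}

    def handle_data(self, data):
        if self._current is not None:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["name"] = self._current["name"].strip()
            self.providers.append(self._current)
            self._current = None


def extract_listings(html: str) -> list:
    """Try the embedded JSON state first; fall back to HTML parsing if it yields nothing."""
    state = extract_initial_state(html)
    if state is not None:
        providers = providers_from_state(state)
        if providers:
            return providers
    parser = ProviderLinkParser()
    parser.feed(html)
    return parser.providers
```

Trying the embedded state first keeps the common path independent of page markup, and the parser-based tier only runs when the JSON is missing, invalid, or empty.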

Actor Information

Developer: shahidirfan
Pricing: Paid
Total Runs: 7
Active Users: 2
