Google Review Scraper

by shahzeb-king

About Google Review Scraper

High-performance Google Reviews scraper built with Playwright & Node.js. Extract reviews, ratings, authors & dates from any Google business using place IDs. Optimized for speed with stealth mode, Docker-ready, Apify-compatible. Batch processing, JSON output, minimal delays.

What does this actor do?

Google Review Scraper is a web scraping and automation tool available on the Apify platform. It extracts reviews, ratings, author names, and review dates from Google business listings identified by their place IDs, and runs entirely in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
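For step 3, a minimal input might look like the following (field names match the actor's documented input example: `placeIds`, `headlessMode`, `scrollCount`):

```json
{
  "placeIds": [
    "ChIJN1t_tDeuEmsRUsoyG83frY4",
    "ChIJGVtI4by3t4kRr51d_Qm_x58"
  ],
  "headlessMode": true,
  "scrollCount": 10
}
```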

Documentation

# Google Review Scraper

A highly optimized Node.js application designed to scrape Google reviews with Playwright, built for compatibility with the Apify platform and Docker deployment. Input-driven with no database dependencies.

## Features

- πŸš€ High Performance: Batch processing with a single browser instance
- ⚑ Speed Optimized: Minimal delays (1-2 s navigation, 0.1-0.8 s scrolling)
- πŸ•ΈοΈ Stealth Mode: Advanced anti-detection techniques
- 🐳 Docker Ready: Full containerization support
- πŸ”„ Apify Compatible: Ready for Apify platform deployment
- πŸ“₯ Input Driven: Simply provide place IDs as input
- πŸ“„ JSON Output: Results saved as structured JSON files
- 🎯 Precise Parsing: Extracts author, rating, date, and review text
- πŸ”€ Human-like Behavior: Randomized actions with minimal delays

## Performance Optimizations

This scraper is highly optimized for speed while maintaining reliability:

- Reduced Delays: Navigation delays reduced to 1-2 seconds, scroll delays to 0.3-0.8 seconds
- Configurable Scrolling: Default 5 scrolls (vs. 8-15), customizable via the `maxScrolls` input
- Single Browser Instance: Reuses the browser context across multiple place IDs
- Minimal Anti-Detection: Just enough randomization to avoid detection without sacrificing speed
- Fast Parsing: Uses Cheerio for rapid HTML parsing
- Batch Processing: Processes multiple place IDs in a single session

## Prerequisites

- Node.js (version 18.0.0 or higher)
- Chrome browser (for local development)
- Docker & Docker Compose (for containerized deployment)

## Quick Start

### Local Development

1. Clone and set up:

   ```bash
   git clone https://github.com/MuhammadShahzeb123/google-review-scraper-2.git
   cd google-review-scraper
   npm install
   ```

2. Install Playwright browsers:

   ```bash
   npx playwright install chromium
   ```
3. Run with place IDs:

   ```bash
   # Method 1: Command line arguments
   node src/main.js ChIJN1t_tDeuEmsRUsoyG83frY4 ChIJGVtI4by3t4kRr51d_Qm_x58

   # Method 2: Environment variable
   PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" npm start

   # Method 3: Run the example
   node example.js
   ```

### Docker Deployment

1. Using Docker Compose:

   ```bash
   PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" docker-compose up
   ```

2. Using Docker only:

   ```bash
   docker build -t google-review-scraper .
   docker run --rm \
     -e PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58" \
     -v $(pwd)/output:/app/output \
     google-review-scraper
   ```

### Apify Platform

1. Configure via the Apify Input UI. All settings are configurable through the Apify input interface:
   - Place IDs: List of Google Place IDs to scrape
   - Max Scrolls: Number of scroll actions (1-20, default: 5)
   - Headless Mode: Run the browser in the background (default: true)
   - Browser Timeout: Page load timeout in ms (default: 60000)
   - Cleanup HTML: Delete temp files after processing (default: true)
   - Log Level: Logging verbosity (error/warn/info/debug)

2. Push to Apify:

   ```bash
   apify login
   apify push
   ```

3. Run on Apify:

   ```bash
   apify run
   ```

## Configuration

### Apify Platform (Recommended)

All configuration is done through the Apify Input UI; no environment variables are needed.

### Local/Standalone Environment Variables (Legacy)

| Variable | Description | Default |
|----------|-------------|---------|
| `PLACE_IDS` | Comma-separated place IDs | Example IDs |
| `HEADLESS_MODE` | Browser headless mode | `false` |
| `BROWSER_TIMEOUT` | Page timeout (ms) | `60000` |
| `SCROLL_COUNT_MIN` | Min scroll actions | `8` |
| `SCROLL_COUNT_MAX` | Max scroll actions | `15` |

### Input Methods

#### 1. Command Line Arguments

```bash
node src/main.js PLACE_ID_1 PLACE_ID_2 PLACE_ID_3
```

#### 2. Environment Variable

```bash
export PLACE_IDS="ChIJN1t_tDeuEmsRUsoyG83frY4,ChIJGVtI4by3t4kRr51d_Qm_x58"
npm start
```
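The command-line and environment-variable input methods could be unified with a small resolver like this (a hypothetical sketch for illustration; the actor's real input handling lives in `src/config.js` and may differ):

```javascript
// Sketch: resolve place IDs from CLI arguments or the PLACE_IDS env var.
// Hypothetical helper for illustration, not the actor's actual code.
function resolvePlaceIds(argv, env) {
  if (argv.length > 0) return argv; // CLI arguments take precedence
  return (env.PLACE_IDS || '')
    .split(',')
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

// Equivalent of: PLACE_IDS="id1,id2" npm start
const ids = resolvePlaceIds([], {
  PLACE_IDS: 'ChIJN1t_tDeuEmsRUsoyG83frY4, ChIJGVtI4by3t4kRr51d_Qm_x58',
});
console.log(ids.length); // 2
```

Trimming and filtering makes the parser tolerant of stray spaces and trailing commas in the env var.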
#### 3. Programmatic Usage

```javascript
const { scrapeReviews } = require('./src/main');

const placeIds = ['ChIJN1t_tDeuEmsRUsoyG83frY4'];
const reviews = await scrapeReviews(placeIds);
```

#### 4. Apify Input

```json
{
  "placeIds": [
    "ChIJN1t_tDeuEmsRUsoyG83frY4",
    "ChIJGVtI4by3t4kRr51d_Qm_x58"
  ],
  "headlessMode": true,
  "scrollCount": 10
}
```

## Output Format

### For the Apify Platform

Each review is stored as a separate dataset item for optimal Apify integration:

```json
{
  "reviewerName": "John Doe",
  "rating": 5,
  "reviewText": "Great place, highly recommended!",
  "reviewDate": "2 months ago",
  "placeId": "ChIJN1t_tDeuEmsRUsoyG83frY4",
  "scrapedAt": "2025-06-16T12:00:00.000Z",
  "success": true
}
```

Summary statistics are saved in the OUTPUT key-value store:

```json
{
  "totalPlaceIds": 2,
  "totalReviews": 45,
  "successfulScrapes": 2,
  "failedScrapes": 0,
  "maxScrollsUsed": 5,
  "processedAt": "2025-06-16T12:00:00.000Z",
  "placeIds": ["ChIJN1t_tDeuEmsRUsoyG83frY4"],
  "scrapingResults": {"ChIJN1t_tDeuEmsRUsoyG83frY4": true}
}
```

### For Local/Standalone Usage

Results are saved as JSON files with a nested structure:

```json
{
  "summary": {
    "totalPlaceIds": 2,
    "totalReviews": 45,
    "successfulScrapes": 2,
    "failedScrapes": 0,
    "processedAt": "2025-06-16T12:00:00.000Z",
    "placeIds": ["ChIJN1t_tDeuEmsRUsoyG83frY4"]
  },
  "reviews": [
    {
      "author": "John Doe",
      "stars": 5,
      "date": "2 months ago",
      "text": "Great place, highly recommended!",
      "placeId": "ChIJN1t_tDeuEmsRUsoyG83frY4",
      "scrapedAt": "2025-06-16T12:00:00.000Z",
      "success": true
    }
  ]
}
```

## Project Structure

```
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ main.js                   # Main application logic
β”‚   β”œβ”€β”€ scraper.js                # Playwright browser automation
β”‚   β”œβ”€β”€ google_reviews_parser.js  # HTML parsing with Cheerio
β”‚   β”œβ”€β”€ config.js                 # Configuration management
β”‚   └── apify-main.js             # Apify-compatible version
β”œβ”€β”€ example.js                    # Usage example
β”œβ”€β”€ Dockerfile                    # Docker container configuration
β”œβ”€β”€ docker-compose.yml            # Container orchestration
β”œβ”€β”€ package.json                  # Node.js dependencies
β”œβ”€β”€ INPUT_SCHEMA.json             # Apify input schema
└── .env.example                  # Environment template
```

## How It Works

1. Input Processing: Accepts place IDs from various sources
2. Browser Automation: Launches Chrome with stealth settings
3. Navigation: Visits the Google Maps place page for each place ID
4. Review Loading: Clicks the Reviews tab, sorts by newest, scrolls to load more
5. HTML Extraction: Saves review container HTML to temporary files
6. Parsing: Uses Cheerio to extract structured review data
7. Output: Saves results as JSON and cleans up temporary files

## Advanced Features

### Stealth Mode

- Custom user agent and viewport
- Disabled automation flags
- Random delays between actions
- Human-like scrolling patterns

### Error Handling

- Graceful failure handling per place ID
- Screenshot capture on errors
- Comprehensive logging
- Automatic retry mechanisms

### Performance Optimization

- Single browser instance for all place IDs
- Parallel file processing
- Efficient memory management
- Automatic cleanup of temporary files

## API Reference

### scrapeReviews(placeIds)

Programmatically scrape reviews for the given place IDs.

Parameters:

- `placeIds` (Array): Array of Google Place IDs

Returns:

- `Promise<Array>`: Array of review objects with metadata

Example:

```javascript
const { scrapeReviews } = require('./src/main');

const reviews = await scrapeReviews([
  'ChIJN1t_tDeuEmsRUsoyG83frY4'
]);
console.log(`Found ${reviews.length} reviews`);
```

## Troubleshooting

### Common Issues

1. Browser not found:

   ```bash
   npx playwright install chromium
   ```

2. No place IDs provided:
   - Ensure place IDs are valid Google Place IDs
   - Check the input format (comma-separated for env vars)

3. Permission errors in Docker:
   - The application runs as a non-root user for security
   - Check file permissions when mounting volumes

### Finding Google Place IDs

1. Search for a business on Google Maps
2. Look at the URL (`...place/.../@...`) or use the share link
3. Extract the place ID from the URL, or use the Google Places API

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## License

ISC License - see the LICENSE file for details.
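To complement the "Finding Google Place IDs" steps in Troubleshooting, here is a sketch of pulling a `ChIJ`-style place ID out of a Maps URL (URL formats vary, so the regex is a best-effort assumption, not the actor's code):

```javascript
// Sketch: extract a ChIJ-style place ID from a Google Maps URL.
// The URL layout varies; this pattern is an assumption for illustration.
function extractPlaceId(url) {
  const match = url.match(/ChIJ[0-9A-Za-z_-]{19,}/);
  return match ? match[0] : null;
}

console.log(extractPlaceId(
  'https://www.google.com/maps/search/?api=1&query_place_id=ChIJN1t_tDeuEmsRUsoyG83frY4'
)); // ChIJN1t_tDeuEmsRUsoyG83frY4
```

A regex scan is convenient because the ID can appear in several places (the `query_place_id` parameter, share links, or the URL's data blob); for guaranteed results, look the business up via the Google Places API instead.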

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Google Review Scraper now on Apify. Free tier available with no credit card required.


Actor Information

Developer
shahzeb-king
Pricing
Paid
Total Runs
164
Active Users
9
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

Learn more about Apify
