Ferguson Reviews Spider
by getdataforme
About Ferguson Reviews Spider
Ferguson Reviews Spider scrapes customer reviews from Ferguson Home, extracting ratings, titles, review text, reviewer info, helpful counts, and brand responses. It returns structured JSON output for easy analysis, sentiment tracking, and insight into product performance and customer feedback.
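For illustration, a single scraped review item might look like the sketch below; the field names are inferred from the description above rather than taken from the actor's documented output schema.

```python
# Hypothetical output item; field names are inferred from the listing
# description, not from the actor's documented schema.
review = {
    "rating": 4,                   # star rating
    "title": "Solid fixture for the price",
    "review_text": "Installation was straightforward and the finish matches the photos.",
    "reviewer": "Verified Buyer",  # reviewer info
    "helpful_count": 7,            # "X people found this helpful"
    "brand_response": "Thanks for the feedback!",  # reply from the brand, if any
}
```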
What does this actor do?
Ferguson Reviews Spider is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
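The "API access" feature above means you can also run the actor from your own code. Here is a minimal sketch using the official apify-client Python package; the actor identifier is assumed from this listing's name and developer, so confirm the exact ID on the actor's Apify page:

```python
from apify_client import ApifyClient

# Authenticate with your Apify API token (Apify Console -> Settings -> Integrations).
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Assumed actor ID based on this listing; verify it on the actor's page.
actor = client.actor("getdataforme/ferguson-reviews-spider")

# Start a run and wait for it to finish; run_input fields must match the
# actor's input schema (see the Documentation section below).
run = actor.call(run_input={})

# Each dataset item is one scraped review in the structured JSON format above.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```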
Documentation
# Apify Template for Scrapy Spiders

This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (getdataforme/central_repo) when changes are pushed to spider files in `src/spiders/` or `src/custom/`. Below is an overview of the automated tasks performed to keep this repository in sync.

## Automated Tasks

The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., `src/spiders/example/example_parser_spider.py`) is modified in the central repository:

1. Repository Creation:
   - Creates a new Apify repository (e.g., `example_apify`) from this template (`apify_template`) using the GitHub API, if it doesn't already exist.
   - Grants push permissions to the scraping team in the getdataforme organization.
2. Spider File Sync:
   - Copies the modified spider file (e.g., `example_parser_spider.py`) from the central repository to `src/spiders/` in this repository.
   - Copies the associated `requirements.txt` (if present) from the spider's directory (e.g., `src/spiders/example/`) to the root of this repository.
3. Input Schema Generation:
   - Runs `generate_input_schema.py` to create `.actor/input_schema.json`.
   - Parses the spider's `__init__` method (e.g., `def __init__(self, location: str, item_limit: int = 100, county: str = "Japan", *args, **kwargs)`) to generate a JSON schema; see the sketch after this list.
   - Supports the types string, integer, boolean, and number (for Python `str`, `int`, `bool`, `float`).
   - Uses `prefill` for strings and `default` for non-strings, with appropriate `editor` values (`textfield`, `number`, `checkbox`).
   - Marks parameters without defaults (e.g., `location`) as required.
4. Main Script Update:
   - Runs `update_main.py` to update `src/main.py`.
   - Updates the `actor_input` section to fetch input values matching the spider's `__init__` parameters (e.g., `location`, `item_limit`, `county`).
   - Updates the `process.crawl` call to pass these parameters to the spider (e.g., `process.crawl(Spider, location=location, item_limit=item_limit, county=county)`).
   - Preserves existing settings, comments, and proxy configurations.
5. Actor Configuration Update:
   - Updates `.actor/actor.json` to set the `name` field based on the repository name, removing the `_apify` suffix (e.g., `example_apify` → `example`).
   - Uses `jq` to modify the JSON file while preserving other fields (e.g., `title`, `description`, `input`).
6. Commit and Push:
   - Commits changes to `src/spiders/$spider_file`, `requirements.txt`, `.actor/input_schema.json`, `src/main.py`, and `.actor/actor.json`.
   - Pushes the changes to the main branch of this repository.
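To make step 3 concrete: for the example `__init__` signature above, the generated `.actor/input_schema.json` would look roughly like the following. This is a sketch assembled from the rules just described and Apify's input schema format; the exact titles and descriptions the script emits may differ.

```json
{
    "title": "Spider input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "location": {
            "title": "location",
            "type": "string",
            "editor": "textfield"
        },
        "item_limit": {
            "title": "item_limit",
            "type": "integer",
            "editor": "number",
            "default": 100
        },
        "county": {
            "title": "county",
            "type": "string",
            "editor": "textfield",
            "prefill": "Japan"
        }
    },
    "required": ["location"]
}
```

Note how `location`, the only parameter without a default, lands in `required`, while the string default `"Japan"` becomes a `prefill` and the integer default `100` becomes a `default`, per the rules above.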
## Repository Structure

- `src/spiders/`: Contains the Scrapy spider file (e.g., `example_parser_spider.py`).
- `src/main.py`: Main script to run the spider with Apify Actor integration.
- `.actor/input_schema.json`: JSON schema defining the spider's input parameters.
- `.actor/actor.json`: Actor configuration with the repository name and metadata.
- `requirements.txt`: Python dependencies for the spider.
- `Dockerfile`: Docker configuration for running the Apify Actor.

## Prerequisites

- The central repository (getdataforme/central_repo) must contain:
  - `generate_input_schema.py` and `update_main.py` in the root.
  - Spider files in `src/spiders/` or `src/custom/` with a valid `__init__` method.
- The GitHub Actions workflow requires a `GITHUB_TOKEN` with repository creation and write permissions.
- `jq` and `python3` must be installed in the workflow environment.

## Testing

To verify the automation:

1. Push a change to a spider file in `src/spiders/` or `src/custom/` in the central repository.
2. Check the generated Apify repository (e.g., getdataforme/example_apify) for:
   - Updated `src/spiders/$spider_file`.
   - Correct `input_schema.json` with parameters matching the spider's `__init__`.
   - Updated `src/main.py` with correct `actor_input` and `process.crawl` lines.
   - Updated `.actor/actor.json` with the correct `name` field.

## Notes

> Warning: This Apify actor repository is automatically generated and updated by the GitHub Actions workflow in getdataforme/central_repo. Do not edit this repository directly. To modify the spider, update the corresponding file in `src/spiders/` or `src/custom/` in the central repository, and the workflow will sync changes to this repository, including:
> - Copying the spider file to `src/spiders/`.
> - Generating `.actor/input_schema.json` based on the spider's `__init__` parameters.
> - Updating `src/main.py` with correct input handling and spider execution.
> - Setting the `name` field in `.actor/actor.json` (e.g., `example` for `example_apify`).
>
> Verification: After the workflow completes, verify the actor by checking:
> - `src/spiders/$spider_file` matches the central repository.
> - `.actor/input_schema.json` includes all `__init__` parameters with correct types and defaults.
> - `src/main.py` has updated `actor_input` and `process.crawl` lines (a sketch of these sections follows below).
> - `.actor/actor.json` has the correct `name`.
> - Optionally, deploy the actor to Apify and test with sample inputs to ensure functionality.

- The workflow supports multiple spider types (scrapy, hrequest, playwright) based on the file path (`src/spiders/`, `src/custom/*/hrequest/`, `src/custom/*/playwright/`).
- Commits with `[apify]` in the message update only Apify repositories; `[internal]` updates only internal repositories; otherwise, both are updated.
- Ensure the spider's `__init__` uses supported types (`str`, `int`, `bool`, `float`) to avoid schema generation errors.

For issues, check the GitHub Actions logs in the central repository or contact the scraping team.
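For orientation when verifying `src/main.py`, the two sections the workflow rewrites have roughly the shape below. This is a simplified sketch, not the actual template: the spider import is hypothetical, the empty settings dict stands in for the preserved settings and proxy configuration, and the real file also handles the asyncio/Twisted event-loop integration that running Scrapy inside an Apify Actor requires.

```python
from apify import Actor
from scrapy.crawler import CrawlerProcess

# Hypothetical import; the module path matches whichever spider file was synced.
from spiders.example_parser_spider import ExampleParserSpider


async def main() -> None:
    async with Actor:
        # actor_input section: fetch input values matching the spider's
        # __init__ parameters (location, item_limit, county in this example).
        actor_input = await Actor.get_input() or {}
        location = actor_input.get("location")
        item_limit = actor_input.get("item_limit", 100)
        county = actor_input.get("county", "Japan")

        # Stand-in for the preserved Scrapy settings and proxy configuration.
        process = CrawlerProcess(settings={}, install_root_handler=False)

        # process.crawl call: pass the input values through to the spider.
        process.crawl(
            ExampleParserSpider,
            location=location,
            item_limit=item_limit,
            county=county,
        )
        process.start()
```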
Common Use Cases
- Market Research: gather competitive intelligence and market data.
- Lead Generation: extract contact information for sales outreach.
- Price Monitoring: track competitor pricing and product changes.
- Content Aggregation: collect and organize content from multiple sources.
Ready to Get Started?
Try Ferguson Reviews Spider now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: getdataforme
- Pricing: Paid
- Total Runs: 61
- Active Users: 3
Related Actors
- Google Maps Reviews Scraper by compass
- Facebook Ads Scraper by apify
- Google Ads Scraper by silva95gustavo
- Facebook marketplace scraper by curious_coder
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.