Ferguson Reviews Spider
by getdataforme
About Ferguson Reviews Spider
Ferguson Reviews Spider scrapes customer reviews from Ferguson Home, extracting ratings, titles, review text, reviewer info, helpful counts, and brand responses. It returns structured JSON output for easy analysis, sentiment tracking, and insight into product performance and customer feedback.
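For illustration, a single scraped review item might look like the sketch below; the field names are inferred from the description above rather than taken from the actor's documented output schema.

```python
# Hypothetical output item; field names are inferred from the listing
# description, not from the actor's documented schema.
review = {
    "rating": 4,                   # star rating
    "title": "Solid fixture for the price",
    "review_text": "Installation was straightforward and the finish matches the photos.",
    "reviewer": "Verified Buyer",  # reviewer info
    "helpful_count": 7,            # "X people found this helpful"
    "brand_response": "Thanks for the feedback!",  # reply from the brand, if any
}
```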
What does this actor do?
Ferguson Reviews Spider is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
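The "API access" feature above means you can also run the actor from your own code. Here is a minimal sketch using the official apify-client Python package; the actor identifier is assumed from this listing's name and developer, so confirm the exact ID on the actor's Apify page:

```python
from apify_client import ApifyClient

# Authenticate with your Apify API token (Apify Console -> Settings -> Integrations).
client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Assumed actor ID based on this listing; verify it on the actor's page.
actor = client.actor("getdataforme/ferguson-reviews-spider")

# Start a run and wait for it to finish; run_input fields must match the
# actor's input schema (see the Documentation section below).
run = actor.call(run_input={})

# Each dataset item is one scraped review in the structured JSON format above.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```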
Documentation
# Apify Template for Scrapy Spiders

This repository serves as a template for deploying Scrapy spiders to Apify. It is automatically updated by a GitHub Actions workflow in the central repository (getdataforme/central_repo) when changes are pushed to spider files in `src/spiders/` or `src/custom/`. Below is an overview of the automated tasks performed to keep this repository in sync.

## Automated Tasks

The following tasks are executed by the GitHub Actions workflow when a spider file (e.g., `src/spiders/example/example_parser_spider.py`) is modified in the central repository:

1. Repository Creation:
   - Creates a new Apify repository (e.g., `example_apify`) from this template (`apify_template`) using the GitHub API, if it doesn't already exist.
   - Grants push permissions to the scraping team in the getdataforme organization.
2. Spider File Sync:
   - Copies the modified spider file (e.g., `example_parser_spider.py`) from the central repository to `src/spiders/` in this repository.
   - Copies the associated `requirements.txt` (if present) from the spider's directory (e.g., `src/spiders/example/`) to the root of this repository.
3. Input Schema Generation:
   - Runs `generate_input_schema.py` to create `.actor/input_schema.json`.
   - Parses the spider's `__init__` method (e.g., `def __init__(self, location: str, item_limit: int = 100, county: str = "Japan", *args, **kwargs)`) to generate a JSON schema; see the sketch after this list.
   - Supports the types string, integer, boolean, and number (for Python `str`, `int`, `bool`, `float`).
   - Uses `prefill` for strings and `default` for non-strings, with appropriate `editor` values (`textfield`, `number`, `checkbox`).
   - Marks parameters without defaults (e.g., `location`) as required.
4. Main Script Update:
   - Runs `update_main.py` to update `src/main.py`.
   - Updates the `actor_input` section to fetch input values matching the spider's `__init__` parameters (e.g., `location`, `item_limit`, `county`).
   - Updates the `process.crawl` call to pass these parameters to the spider (e.g., `process.crawl(Spider, location=location, item_limit=item_limit, county=county)`).
   - Preserves existing settings, comments, and proxy configurations.
5. Actor Configuration Update:
   - Updates `.actor/actor.json` to set the `name` field based on the repository name, removing the `_apify` suffix (e.g., `example_apify` → `example`).
   - Uses `jq` to modify the JSON file while preserving other fields (e.g., `title`, `description`, `input`).
6. Commit and Push:
   - Commits changes to `src/spiders/$spider_file`, `requirements.txt`, `.actor/input_schema.json`, `src/main.py`, and `.actor/actor.json`.
   - Pushes the changes to the main branch of this repository.
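To make step 3 concrete: for the example `__init__` signature above, the generated `.actor/input_schema.json` would look roughly like the following. This is a sketch assembled from the rules just described and Apify's input schema format; the exact titles and descriptions the script emits may differ.

```json
{
    "title": "Spider input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "location": {
            "title": "location",
            "type": "string",
            "editor": "textfield"
        },
        "item_limit": {
            "title": "item_limit",
            "type": "integer",
            "editor": "number",
            "default": 100
        },
        "county": {
            "title": "county",
            "type": "string",
            "editor": "textfield",
            "prefill": "Japan"
        }
    },
    "required": ["location"]
}
```

Note how `location`, the only parameter without a default, lands in `required`, while the string default `"Japan"` becomes a `prefill` and the integer default `100` becomes a `default`, per the rules above.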
## Repository Structure

- `src/spiders/`: Contains the Scrapy spider file (e.g., `example_parser_spider.py`).
- `src/main.py`: Main script to run the spider with Apify Actor integration.
- `.actor/input_schema.json`: JSON schema defining the spider's input parameters.
- `.actor/actor.json`: Actor configuration with the repository name and metadata.
- `requirements.txt`: Python dependencies for the spider.
- `Dockerfile`: Docker configuration for running the Apify Actor.

## Prerequisites

- The central repository (getdataforme/central_repo) must contain:
  - `generate_input_schema.py` and `update_main.py` in the root.
  - Spider files in `src/spiders/` or `src/custom/` with a valid `__init__` method.
- The GitHub Actions workflow requires a `GITHUB_TOKEN` with repository creation and write permissions.
- `jq` and `python3` must be installed in the workflow environment.

## Testing

To verify the automation:

1. Push a change to a spider file in `src/spiders/` or `src/custom/` in the central repository.
2. Check the generated Apify repository (e.g., getdataforme/example_apify) for:
   - Updated `src/spiders/$spider_file`.
   - Correct `input_schema.json` with parameters matching the spider's `__init__`.
   - Updated `src/main.py` with correct `actor_input` and `process.crawl` lines.
   - Updated `.actor/actor.json` with the correct `name` field.

## Notes

> Warning: This Apify actor repository is automatically generated and updated by the GitHub Actions workflow in getdataforme/central_repo. Do not edit this repository directly. To modify the spider, update the corresponding file in `src/spiders/` or `src/custom/` in the central repository, and the workflow will sync changes to this repository, including:
> - Copying the spider file to `src/spiders/`.
> - Generating `.actor/input_schema.json` based on the spider's `__init__` parameters.
> - Updating `src/main.py` with correct input handling and spider execution.
> - Setting the `name` field in `.actor/actor.json` (e.g., `example` for `example_apify`).
>
> Verification: After the workflow completes, verify the actor by checking:
> - `src/spiders/$spider_file` matches the central repository.
> - `.actor/input_schema.json` includes all `__init__` parameters with correct types and defaults.
> - `src/main.py` has updated `actor_input` and `process.crawl` lines (a sketch of these sections follows below).
> - `.actor/actor.json` has the correct `name`.
> - Optionally, deploy the actor to Apify and test with sample inputs to ensure functionality.

- The workflow supports multiple spider types (scrapy, hrequest, playwright) based on the file path (`src/spiders/`, `src/custom/*/hrequest/`, `src/custom/*/playwright/`).
- Commits with `[apify]` in the message update only Apify repositories; `[internal]` updates only internal repositories; otherwise, both are updated.
- Ensure the spider's `__init__` uses supported types (`str`, `int`, `bool`, `float`) to avoid schema generation errors.

For issues, check the GitHub Actions logs in the central repository or contact the scraping team.
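For orientation when verifying `src/main.py`, the two sections the workflow rewrites have roughly the shape below. This is a simplified sketch, not the actual template: the spider import is hypothetical, the empty settings dict stands in for the preserved settings and proxy configuration, and the real file also handles the asyncio/Twisted event-loop integration that running Scrapy inside an Apify Actor requires.

```python
from apify import Actor
from scrapy.crawler import CrawlerProcess

# Hypothetical import; the module path matches whichever spider file was synced.
from spiders.example_parser_spider import ExampleParserSpider


async def main() -> None:
    async with Actor:
        # actor_input section: fetch input values matching the spider's
        # __init__ parameters (location, item_limit, county in this example).
        actor_input = await Actor.get_input() or {}
        location = actor_input.get("location")
        item_limit = actor_input.get("item_limit", 100)
        county = actor_input.get("county", "Japan")

        # Stand-in for the preserved Scrapy settings and proxy configuration.
        process = CrawlerProcess(settings={}, install_root_handler=False)

        # process.crawl call: pass the input values through to the spider.
        process.crawl(
            ExampleParserSpider,
            location=location,
            item_limit=item_limit,
            county=county,
        )
        process.start()
```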
Common Use Cases
- Market Research: gather competitive intelligence and market data.
- Lead Generation: extract contact information for sales outreach.
- Price Monitoring: track competitor pricing and product changes.
- Content Aggregation: collect and organize content from multiple sources.
Ready to Get Started?
Try Ferguson Reviews Spider now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: getdataforme
- Pricing: Paid
- Total Runs: 61
- Active Users: 3
Related Actors
- Google Maps Reviews Scraper by compass
- Facebook Ads Scraper by apify
- Google Ads Scraper by silva95gustavo
- Facebook marketplace scraper by curious_coder
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.