Shopify Scraper (GraphQL)
by runexes
Scrape Shopify stores efficiently using their sitemap and official GraphQL API. Get clean product data fast with batching, incremental processing, and lower costs.
About Shopify Scraper (GraphQL)
If you've ever tried scraping a Shopify store, you know it can be a real headache. You either get blocked, miss half the data, or the script crawls at a snail's pace. That's why I built this scraper. It works the way Shopify expects: first, it crawls the store's `sitemap.xml` to find every product page. Then, instead of scraping the messy HTML, it makes direct requests to Shopify's own Storefront GraphQL API to pull structured product data like titles, variants, prices, and images. This is more reliable than HTML scraping and easier on the store's infrastructure.

The real magic is in the optimizations. Per-host batching groups multiple product lookups into each request, which cuts the number of API calls and keeps costs low. Processing is incremental, so if a run gets interrupted, you can pick up where you left off. Results go to a buffered dataset, so they are saved in chunks as they arrive instead of piling up in memory on large jobs.

I use this tool myself for competitive analysis, price monitoring, and building product catalogs. It's the fastest, most cost-effective way I've found to get clean, complete data from a Shopify store.
What does this actor do?
Shopify Scraper (GraphQL) is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
Shopify Scraper (GraphQL)
An Apify actor that extracts product data from Shopify stores. It crawls a store's sitemap.xml to find product pages and then uses the Shopify Storefront GraphQL API to fetch detailed product information efficiently.
Overview
This actor is designed for speed and cost-effectiveness. It processes stores by batching GraphQL requests, supports incremental runs to avoid re-scraping, and can filter products based on their last modified date. Output is structured as one record per product, with all variants nested inside.
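One way to picture the incremental mode mentioned above is a persisted set of product handles that earlier runs have already saved. The sketch below is illustrative only: the README does not say how the actor stores its state, so the JSON state file and the helper names `load_seen`, `pending_handles`, and `mark_done` are assumptions, not the actor's implementation.

```python
import json
import tempfile
from pathlib import Path


def load_seen(state_file: Path) -> set[str]:
    """Return handles recorded by earlier runs (empty on the first run)."""
    if state_file.exists():
        return set(json.loads(state_file.read_text(encoding="utf-8")))
    return set()


def pending_handles(handles: list[str], seen: set[str]) -> list[str]:
    """Keep only handles that no earlier run has processed, preserving order."""
    return [handle for handle in handles if handle not in seen]


def mark_done(state_file: Path, seen: set[str], handles: list[str]) -> None:
    """Record newly processed handles so the next run can skip them."""
    seen.update(handles)
    state_file.write_text(json.dumps(sorted(seen)), encoding="utf-8")


# Demo with a throwaway state file and hypothetical handles.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "processed.json"  # assumed location, not the actor's
    seen = load_seen(state)
    first_run = pending_handles(["red-shirt", "blue-hat"], seen)
    mark_done(state, seen, first_run)
    # A later run sees one new product in the sitemap.
    second_run = pending_handles(
        ["red-shirt", "blue-hat", "green-bag"], load_seen(state)
    )

print(first_run)
print(second_run)
```

In this sketch the second run only has to fetch the handle that was not present in the state file, which is the behavior the incremental option is described as providing.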
Key Features
- Sitemap-based Crawling: Discovers product URLs from a store's `sitemap.xml` (specifically `/products/<handle>` paths).
- Efficient GraphQL Queries: Batches multiple product requests into single GraphQL calls using aliases, reducing network overhead.
- Incremental Processing: Can skip products that have already been processed in previous runs.
- Date Filtering: Optionally ignores products not updated since a given date (`updatedSince`).
- Performance Tuning: Configurable batch size, concurrency, and buffered dataset writes.
- Extensible: Use `extendScraperFunction` for custom logic during the scraping lifecycle and `extendOutputFunction` to transform final records.
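To make the alias-based batching concrete, here is a minimal sketch of how several product handles can be folded into one GraphQL document. It assumes the Storefront API's `product(handle:)` root field and a small field selection (`id handle title`); the actor's real query and the helper names `chunked`, `build_batched_query`, and `build_payloads` are not taken from the source.

```python
import json
from typing import Iterator

# Assumed field selection; the actor's actual selection is not documented here.
PRODUCT_FIELDS = "id handle title"


def chunked(handles: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` handles."""
    for start in range(0, len(handles), size):
        yield handles[start:start + size]


def build_batched_query(handles: list[str]) -> tuple[str, dict[str, str]]:
    """Build one GraphQL document that fetches each handle under its own alias.

    Handles are passed as variables ($h0, $h1, ...) so they never need
    escaping inside the query text.
    """
    var_defs = ", ".join(f"$h{i}: String!" for i in range(len(handles)))
    fields = "\n".join(
        f"  p{i}: product(handle: $h{i}) {{ {PRODUCT_FIELDS} }}"
        for i in range(len(handles))
    )
    query = f"query ProductBatch({var_defs}) {{\n{fields}\n}}"
    variables = {f"h{i}": handle for i, handle in enumerate(handles)}
    return query, variables


def build_payloads(handles: list[str], batch_size: int = 10) -> list[str]:
    """Serialize one JSON request body per batch of handles."""
    payloads = []
    for batch in chunked(handles, batch_size):
        query, variables = build_batched_query(batch)
        payloads.append(json.dumps({"query": query, "variables": variables}))
    return payloads


query, variables = build_batched_query(["red-shirt", "blue-hat"])
print(query)
print(variables)
```

Each payload would be sent as a single POST to the store's Storefront GraphQL endpoint, so 25 handles with a batch size of 10 need three requests instead of 25. The response keys (`p0`, `p1`, ...) map back to the handles by index.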
Input / Configuration
Core parameters required to run the actor:
- `startUrls`: Array containing your target `sitemap.xml` URL(s).
- `storefrontApiVersion`: The Shopify Storefront API version to use (e.g., `2024-07`).
- `storefrontAccessToken`: Your store's Storefront API access token.
Essential performance and utility settings:
- `maxRequestsPerCrawl`, `maxConcurrency`, `maxRequestRetries`, `proxyConfiguration`: Standard Apify crawl controls.
- `updatedSince`: ISO date string; skips products with a `<lastmod>` older than this.
- `batchSize`: Number of product handles to query per GraphQL request (default: `10`).
- `perHostConcurrency`: Parallel GraphQL requests allowed per store host (default: `2`).
- `bufferWrites` & `bufferSize`: Controls buffering for dataset writes to improve performance.
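The effect of `updatedSince` on sitemap discovery can be sketched roughly as follows. This is an approximation under stated assumptions, not the actor's code: it assumes a standard sitemaps.org `<urlset>` document, keeps entries that have no `<lastmod>`, and uses a hypothetical helper name `product_handles` and made-up sample URLs.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 date or timestamp; treat naive values as UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def product_handles(sitemap_xml: str, updated_since: Optional[str] = None) -> list[str]:
    """Return handles from /products/<handle> URLs, optionally filtered by <lastmod>."""
    cutoff = parse_iso(updated_since) if updated_since else None
    handles = []
    for url in ET.fromstring(sitemap_xml).findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        segments = [s for s in urlparse(loc).path.split("/") if s]
        # Only URLs that end in /products/<handle> count as product pages.
        if len(segments) < 2 or segments[-2] != "products":
            continue
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        # Assumption: entries without <lastmod> are kept rather than skipped.
        if cutoff and lastmod and parse_iso(lastmod) < cutoff:
            continue
        handles.append(segments[-1])
    return handles


# Hypothetical sitemap fragment for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/red-shirt</loc><lastmod>2024-06-10T12:00:00Z</lastmod></url>
  <url><loc>https://example.com/products/old-hat</loc><lastmod>2023-01-05T08:30:00Z</lastmod></url>
</urlset>"""

print(product_handles(SAMPLE))
print(product_handles(SAMPLE, updated_since="2024-01-01"))
```

Filtering at the sitemap stage means stale products never reach the GraphQL batching step, so they cost no API calls at all.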
How to Use
Running Locally
- Install dependencies:
    npm install
- Create an input file at `apify_storage/key_value_stores/default/INPUT.json`:
    {
      "startUrls": [{ "url": "https://example.com/sitemap.xml" }],
      "storefrontApiVersion": "2024-07",
      "storefrontAccessToken": "<YOUR_TOKEN>",
      "maxRequestsPerCrawl": 50
    }
- Start the actor:
    npm start
For development with auto-restart:
    npm run dev
Docker Quick Start
Using the provided Makefile:
make init # Creates .env and INPUT.json from templates
make run # Builds and starts the Docker container
Output datasets will be in apify_storage/datasets/default.
Output
The actor saves one item per product to the dataset. The product's variants are available within the additional.variants property of each record. The structure is based on the response from the Shopify Storefront GraphQL API.
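For downstream analysis it is often handy to turn the nested records into one row per variant. The sketch below assumes that local runs store one JSON file per dataset item, that each record has a top-level `handle`, and that `additional.variants` is either a plain list or a GraphQL connection with `nodes` or `edges`. None of these shapes are confirmed by the README, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def load_records(dataset_dir: str) -> Iterator[dict]:
    """Read dataset items, assuming one JSON file per item in the directory."""
    for path in sorted(Path(dataset_dir).glob("*.json")):
        yield json.loads(path.read_text(encoding="utf-8"))


def variant_nodes(variants: Any) -> list[dict]:
    """Normalize a list or a GraphQL connection (nodes/edges) to a list of dicts."""
    if isinstance(variants, list):
        return variants
    if isinstance(variants, dict):
        if "nodes" in variants:
            return list(variants["nodes"])
        if "edges" in variants:
            return [edge["node"] for edge in variants["edges"]]
    return []


def flatten_variants(record: dict) -> Iterator[dict]:
    """Yield one flat row per variant, tagged with the parent product handle."""
    variants = (record.get("additional") or {}).get("variants")
    for variant in variant_nodes(variants):
        yield {"product_handle": record.get("handle"), **variant}


# Hypothetical record; real field names depend on the actor's output.
sample = {
    "handle": "red-shirt",
    "additional": {"variants": [{"title": "S"}, {"title": "M"}]},
}
rows = list(flatten_variants(sample))
print(rows)
```

For a local run, `load_records("apify_storage/datasets/default")` could feed `flatten_variants` record by record, for example to build a price-per-variant table.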
Extensibility
You can inject custom logic at specific points:
- `extendScraperFunction`: Provides lifecycle hooks (`SETUP`, `FILTER_SITEMAP_URL`, `PRENAVIGATION`, `POSTNAVIGATION`, `RUN`, `FINISHED`) for custom actions.
- `extendOutputFunction`: Allows you to modify or filter the final product record before it is saved to the dataset.
Project Info
- License: Apache License 2.0 (see `LICENSE` and `NOTICE` files).
- CI/CD: GitHub Actions workflows (`ci.yml`, `codeql.yml`) handle testing, linting, and security analysis on pushes and pull requests.
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: runexes
- Pricing: Paid
- Total Runs: 38
- Active Users: 12