Shopify Products Scraper Pro

Name: Shopify Products Scraper Pro
Author: n0rmaliz3

by n0rmaliz3

Extract product data from any Shopify store using official JSON API. Get products, variants, prices, inventory, images, and metadata. No authenticatio...

29 runs

3 users

Try This Actor

Opens on Apify.com

About Shopify Products Scraper Pro

Extract product data from any Shopify store using official JSON API. Get products, variants, prices, inventory, images, and metadata. No authentication required. Fast, accurate, and cost-effective solution for e-commerce intelligence and competitor analysis.

What does this actor do?

Shopify Products Scraper Pro is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

Shopify Products Scraper Pro Extract comprehensive product data from any Shopify store using the official Shopify JSON API. Fast, reliable, and cost-effective solution for e-commerce data extraction, competitor analysis, and market research. ## What Does This Actor Do? Shopify Products Scraper Pro extracts product information from Shopify stores without requiring authentication or API keys. It collects structured data including product details, variants, prices, inventory levels, images, and metadata - perfect for e-commerce intelligence, dropshipping, price monitoring, and market analysis. ## Why Choose This Scraper? This actor scrapes product information from Shopify stores without requiring authentication or API keys. It leverages Shopify's public JSON endpoints to extract structured data including products, variants, prices, inventory levels, images, and metadata. Key Features: - Uses official Shopify JSON API (not HTML scraping) - Works on any public Shopify store - No authentication required - High accuracy and reliability - Automatic pagination and retry logic - Respectful rate limiting ## Use Cases E-commerce Intelligence: - Competitor product analysis and pricing research - Market trend identification and category analysis - Product catalog monitoring and updates Business Operations: - Dropshipping supplier inventory tracking - Price comparison platform data collection - Product database enrichment and synchronization Market Research: - Industry product trends analysis - Vendor and brand comparison - Seasonal catalog changes tracking ## Input Configuration ### Required Parameters storeDomain (String) - The Shopify store domain to scrape - Example: `gymshark.com` or `store.myshopify.com` - Do not include `https://` or paths ### Optional Parameters mode (String) - Scraping mode: `all`, `collection`, or `handles` - Default: `all` - `all`: Scrape all products from the store - `collection`: Scrape products from a specific collection - `handles`: Scrape specific products by handle collectionHandle (String) - Collection handle to scrape (required when mode is `collection`) - Example: `mens`, `sale`, `new-arrivals` - Find handle in collection URL: `/collections/HANDLE` productHandles (Array) - Array of product handles or URLs (required when mode is `handles`) - Example: `["product-handle", "https://store.com/products/product-handle"]` - Can mix handles and full URLs includeVariants (Boolean) - Include detailed variant information - Default: `true` - Set to `false` to reduce output size includeImages (Boolean) - Include product image details - Default: `true` - Set to `false` to reduce output size maxProducts (Integer) - Maximum number of products to scrape - Default: Unlimited - Useful for testing or sampling maxConcurrency (Integer) - Number of concurrent requests - Default: `5` - Range: `1` to `20` - Higher values = faster scraping but more resource usage proxyConfiguration (Object) - Apify proxy configuration - Recommended for large-scale scraping - Example: `{"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}` ## Input Examples ### Example 1: Scrape All Products `json { "storeDomain": "gymshark.com", "mode": "all" }` ### Example 2: Scrape Specific Collection `json { "storeDomain": "gymshark.com", "mode": "collection", "collectionHandle": "mens" }` ### Example 3: Scrape Specific Products `json { "storeDomain": "gymshark.com", "mode": "handles", "productHandles": [ "legacy-tshirt", "vital-seamless-leggings" ] }` ### Example 4: Optimized for Speed `json { "storeDomain": "bigstore.com", "mode": "all", "maxConcurrency": 15, "proxyConfiguration": { "useApifyProxy": true } }` ### Example 5: Testing Configuration `json { "storeDomain": "gymshark.com", "mode": "all", "maxProducts": 50, "includeVariants": false, "includeImages": false }` ## Output Format The actor outputs structured JSON data with comprehensive product information. ### Product Data Structure json { "url": "https://store.com/products/product-handle", "id": 1234567890, "title": "Product Name - Variant", "handle": "product-handle", "description": "<p>HTML description</p>", "descriptionText": "Plain text description", "vendor": "Brand Name", "productType": "Category", "tags": ["tag1", "tag2"], "price": 29.99, "priceMin": 29.99, "priceMax": 39.99, "priceVaries": true, "compareAtPrice": 49.99, "compareAtPriceMin": 49.99, "compareAtPriceMax": 49.99, "onSale": true, "available": true, "totalInventory": 500, "variantsCount": 8, "variants": [...], "options": [...], "imagesCount": 6, "images": [...], "featuredImage": "https://cdn.shopify.com/...", "createdAt": "2024-01-01T00:00:00Z", "updatedAt": "2024-11-12T10:00:00Z", "publishedAt": "2024-01-01T12:00:00Z", "scrapedAt": "2024-11-12T10:30:00Z" } ### Variant Information When `includeVariants` is `true`, each product includes detailed variant data: `json "variants": [ { "id": 9876543210, "title": "Small / Black", "price": 29.99, "compareAtPrice": 49.99, "sku": "PROD-SKU-001", "barcode": "123456789012", "inventoryQuantity": 100, "available": true, "option1": "Small", "option2": "Black", "option3": null, "weight": 0.2, "weightUnit": "kg", "requiresShipping": true, "taxable": true } ]` ### Product Options `json "options": [ { "name": "Size", "position": 1, "values": ["Small", "Medium", "Large", "XL"] }, { "name": "Color", "position": 2, "values": ["Black", "White", "Blue"] } ]` ### Image Information When `includeImages` is `true`: `json "images": [ { "id": 3333333333, "src": "https://cdn.shopify.com/s/files/1/xxxx/products/image.jpg", "alt": "Product Image Description", "width": 2048, "height": 2048, "position": 1 } ]` ## Performance Speed: - Average: 500-1000 products per minute - Depends on store response time and concurrency settings Accuracy: - 100% data accuracy using official API - No parsing errors or missing fields Reliability: - Automatic retry on failures with exponential backoff - Error handling for network issues and rate limits - Success rate: 99%+ Resource Usage: - Memory: Less than 512MB RAM for most jobs - Compute: Approximately 0.01 compute units per 1,000 products ## Pricing Cost Estimate: - Small store (100 products): ~$0.002 - Medium store (1,000 products): ~$0.02 - Large store (10,000 products): ~$0.20 - Enterprise (100,000 products): ~$2.00 Actual costs depend on compute time and proxy usage. ## How It Works This actor leverages Shopify's public JSON API endpoints available on all Shopify stores: API Endpoints Used: - `https://store.com/products.json` - Product listing with pagination - `https://store.com/products/handle.json` - Individual product details - `https://store.com/collections/handle/products.json` - Collection products Process Flow: 1. Domain Validation: Verifies the provided domain is a valid Shopify store 2. Mode Selection: Routes to appropriate scraping strategy (all/collection/handles) 3. Data Fetching: Makes requests to Shopify JSON endpoints with pagination 4. Data Processing: Normalizes and enriches product data 5. Output: Saves structured data to Apify dataset Technical Advantages: - No HTML parsing - direct JSON API access - No CSS selectors that break with theme updates - No authentication or API keys required - Works on any Shopify store regardless of plan or theme - Consistent data structure across all stores ## Best Practices ### For Large Stores (10,000+ products) 1. Enable proxy configuration to avoid rate limiting 2. Increase concurrency to 10-15 for faster scraping 3. Consider scraping specific collections instead of entire store 4. Use `maxProducts` parameter for initial testing ### For Regular Monitoring 1. Use `mode: "collection"` for specific categories 2. Schedule runs during off-peak hours 3. Store results in named datasets for comparison 4. Set up webhooks for automated processing ### For Data Quality 1. Keep `includeVariants: true` for complete inventory data 2. Enable `includeImages: true` for product catalogs 3. Use product handles for precise targeting 4. Verify store domain before large scraping jobs ## Troubleshooting ### Store Not Found Error Issue: "Domain does not appear to be a Shopify store" Solutions: - Verify the domain is correct (no typos) - Remove `https://` and paths from domain - Try without `www.` prefix - Ensure the store is publicly accessible (not password-protected) ### No Products Returned Issue: Actor completes but returns empty dataset Solutions: - Verify the store has published products - Check if collection handle is correct (try `mode: "all"` first) - Ensure products are not restricted by location/password - Check actor logs for specific error messages ### Slow Performance Issue: Actor takes longer than expected Solutions: - Increase `maxConcurrency` (up to 20) - Enable Apify proxy configuration - Reduce output size with `includeVariants: false` - Check if store has slow response times ### Incomplete Data Issue: Some products missing fields Solutions: - Some Shopify stores may not populate all fields - Check if `includeVariants` and `includeImages` are enabled - Verify the store's product data in Shopify admin - Review actor logs for parsing warnings ## Limitations Technical Limitations: - Only scrapes publicly accessible stores - Cannot access password-protected stores or products - Cannot bypass Shopify Plus wholesale portals - Limited by Shopify's public API availability Data Limitations: - Cannot access customer data or order information - Cannot retrieve draft or unpublished products - Cannot access admin-only product metadata - Inventory counts may be cached by Shopify Rate Limiting: - Respects Shopify's fair use guidelines - Implements polite crawling (1-2 requests/second) - Automatic backoff on rate limit responses - Proxy usage recommended for very large stores ## FAQ Q: Does this work on all Shopify stores? A: Yes, it works on any public Shopify store including custom domains and .myshopify.com stores. Q: Do I need API credentials or store access? A: No authentication required. This uses public JSON endpoints available on all Shopify stores. Q: Will I get blocked or rate limited? A: The actor implements polite crawling with automatic retries. For large-scale scraping, use Apify proxies. Q: How accurate is the data compared to HTML scraping? A: 100% accurate. Using official API eliminates parsing errors common with HTML scraping. Q: Can I scrape product reviews or customer data? A: No, this actor only accesses publicly available product catalog data. Q: How do I find collection handles? A: Visit the collection page in your browser. The handle is in the URL: `https://store.com/collections/HANDLE` Q: Can I scrape multiple stores in one run? A: No, configure one store per actor run. Use Apify tasks or schedules for multiple stores. Q: What happens if a product is deleted during scraping? A: The actor handles 404 errors gracefully and continues with remaining products. ## Related Actors Explore our complete Shopify scraping suite: - Shopify Price Monitor - Track price changes and sales over time - Shopify Inventory Tracker - Monitor stock levels and availability - Shopify Store Analyzer - Extract store metadata and analytics - Shopify Collection Scraper - Specialized collection-based extraction - *Shopify Feed Generator** - Generate product feeds for Google Shopping ## Legal Compliance This actor accesses only publicly available data from Shopify stores through official public API endpoints. It does not: - Require authentication or API keys - Circumvent access controls or security measures - Access password-protected or restricted content - Violate Shopify's Terms of Service The actor implements responsible scraping practices including rate limiting and respectful request patterns. Users are responsible for ensuring their use complies with applicable laws, data protection regulations, and the terms of service of stores they scrape.

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Shopify Products Scraper Pro now on Apify. Free tier available with no credit card required.

Start Free Trial

Actor Information

Developer: n0rmaliz3
Pricing: Paid
Total Runs: 29
Active Users: 3

Related Actors

Video Transcript Scraper: Youtube, X, Facebook, Tiktok, etc.

by invideoiq

Linkedin Profile Details Scraper + EMAIL (No Cookies Required)

by apimaestro

Twitter (X.com) Scraper Unlimited: No Limits

by apidojo

Content Checker

by jakubbalada

Browse All Actors

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

Learn more about Apify

Need Professional Help?

Couldn't solve your problem? Hire a verified specialist on Fiverr to get it done quickly and professionally.

Find a Specialist

Trusted by millions | Money-back guarantee | 24/7 Support