Shopify Products Scraper Pro
by n0rmaliz3
About Shopify Products Scraper Pro
Extract product data from any Shopify store using the official Shopify JSON API. Get products, variants, prices, inventory, images, and metadata. No authentication is required. A fast, accurate, and cost-effective solution for e-commerce intelligence and competitor analysis.
What does this actor do?
Shopify Products Scraper Pro is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
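Before starting a run, it can help to sanity-check the input. The sketch below is not part of the actor; `validate_input` is a hypothetical helper that only encodes the rules stated in the Documentation section (required `storeDomain` without `https://` or paths, the three `mode` values, the handle fields each mode requires, and the 1 to 20 range for `maxConcurrency`).

```python
VALID_MODES = {"all", "collection", "handles"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in an actor input dict (empty if none)."""
    problems = []

    # storeDomain is the only required parameter; it must be a bare domain.
    domain = run_input.get("storeDomain", "")
    if not domain:
        problems.append("storeDomain is required")
    elif "://" in domain or "/" in domain:
        problems.append("storeDomain must not include https:// or paths")

    # mode defaults to "all"; the other two modes need their handle fields.
    mode = run_input.get("mode", "all")
    if mode not in VALID_MODES:
        problems.append(f"mode must be one of {sorted(VALID_MODES)}")
    elif mode == "collection" and not run_input.get("collectionHandle"):
        problems.append("collectionHandle is required when mode is 'collection'")
    elif mode == "handles" and not run_input.get("productHandles"):
        problems.append("productHandles is required when mode is 'handles'")

    # maxConcurrency defaults to 5 and is documented as 1 to 20.
    concurrency = run_input.get("maxConcurrency", 5)
    if not isinstance(concurrency, int) or not 1 <= concurrency <= 20:
        problems.append("maxConcurrency must be an integer from 1 to 20")

    return problems
```

An empty list means the input matches the documented constraints; it does not guarantee that the domain is actually a Shopify store.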
Documentation
# Shopify Products Scraper Pro

Extract comprehensive product data from any Shopify store using the official Shopify JSON API. Fast, reliable, and cost-effective solution for e-commerce data extraction, competitor analysis, and market research.

## What Does This Actor Do?

Shopify Products Scraper Pro extracts product information from Shopify stores without requiring authentication or API keys. It collects structured data including product details, variants, prices, inventory levels, images, and metadata - perfect for e-commerce intelligence, dropshipping, price monitoring, and market analysis.

## Why Choose This Scraper?

This actor scrapes product information from Shopify stores without requiring authentication or API keys. It leverages Shopify's public JSON endpoints to extract structured data including products, variants, prices, inventory levels, images, and metadata.

Key Features:

- Uses official Shopify JSON API (not HTML scraping)
- Works on any public Shopify store
- No authentication required
- High accuracy and reliability
- Automatic pagination and retry logic
- Respectful rate limiting

## Use Cases

E-commerce Intelligence:

- Competitor product analysis and pricing research
- Market trend identification and category analysis
- Product catalog monitoring and updates

Business Operations:

- Dropshipping supplier inventory tracking
- Price comparison platform data collection
- Product database enrichment and synchronization

Market Research:

- Industry product trends analysis
- Vendor and brand comparison
- Seasonal catalog changes tracking

## Input Configuration

### Required Parameters

`storeDomain` (String)

- The Shopify store domain to scrape
- Example: `gymshark.com` or `store.myshopify.com`
- Do not include `https://` or paths

### Optional Parameters

`mode` (String)

- Scraping mode: `all`, `collection`, or `handles`
- Default: `all`
- `all`: Scrape all products from the store
- `collection`: Scrape products from a specific collection
- `handles`: Scrape specific products by handle
`collectionHandle` (String)

- Collection handle to scrape (required when `mode` is `collection`)
- Example: `mens`, `sale`, `new-arrivals`
- Find the handle in the collection URL: `/collections/HANDLE`

`productHandles` (Array)

- Array of product handles or URLs (required when `mode` is `handles`)
- Example: `["product-handle", "https://store.com/products/product-handle"]`
- Can mix handles and full URLs

`includeVariants` (Boolean)

- Include detailed variant information
- Default: `true`
- Set to `false` to reduce output size

`includeImages` (Boolean)

- Include product image details
- Default: `true`
- Set to `false` to reduce output size

`maxProducts` (Integer)

- Maximum number of products to scrape
- Default: Unlimited
- Useful for testing or sampling

`maxConcurrency` (Integer)

- Number of concurrent requests
- Default: 5
- Range: 1 to 20
- Higher values = faster scraping but more resource usage

`proxyConfiguration` (Object)

- Apify proxy configuration
- Recommended for large-scale scraping
- Example: `{"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}`

## Input Examples

### Example 1: Scrape All Products

    {
      "storeDomain": "gymshark.com",
      "mode": "all"
    }

### Example 2: Scrape Specific Collection

    {
      "storeDomain": "gymshark.com",
      "mode": "collection",
      "collectionHandle": "mens"
    }

### Example 3: Scrape Specific Products

    {
      "storeDomain": "gymshark.com",
      "mode": "handles",
      "productHandles": [
        "legacy-tshirt",
        "vital-seamless-leggings"
      ]
    }

### Example 4: Optimized for Speed

    {
      "storeDomain": "bigstore.com",
      "mode": "all",
      "maxConcurrency": 15,
      "proxyConfiguration": {
        "useApifyProxy": true
      }
    }

### Example 5: Testing Configuration

    {
      "storeDomain": "gymshark.com",
      "mode": "all",
      "maxProducts": 50,
      "includeVariants": false,
      "includeImages": false
    }

## Output Format

The actor outputs structured JSON data with comprehensive product information.
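Because each dataset record is a JSON object, downstream analysis takes only a few lines of code. As a rough sketch, assuming the records have already been loaded into Python dicts and carry the `handle`, `price`, `compareAtPrice`, and `onSale` fields documented in this section, the helpers below rank on-sale products by discount. The function names are hypothetical and not part of the actor.

```python
from typing import Optional


def discount_percent(record: dict) -> Optional[float]:
    """Percentage off implied by price vs. compareAtPrice, or None if not on sale."""
    price = record.get("price")
    compare_at = record.get("compareAtPrice")
    # Skip records that are not on sale or lack the numbers needed.
    if not record.get("onSale") or price is None or not compare_at:
        return None
    return round((compare_at - price) / compare_at * 100, 1)


def on_sale_report(records: list[dict]) -> list[tuple[str, float]]:
    """(handle, discount %) pairs for on-sale records, largest discount first."""
    rows = []
    for record in records:
        pct = discount_percent(record)
        if pct is not None:
            rows.append((record["handle"], pct))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Sample records using the documented field names (values are illustrative).
sample = [
    {"handle": "tee", "price": 29.99, "compareAtPrice": 49.99, "onSale": True},
    {"handle": "cap", "price": 10.0, "compareAtPrice": None, "onSale": False},
]
report = on_sale_report(sample)  # only "tee" qualifies
```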
### Product Data Structure

    {
      "url": "https://store.com/products/product-handle",
      "id": 1234567890,
      "title": "Product Name - Variant",
      "handle": "product-handle",
      "description": "<p>HTML description</p>",
      "descriptionText": "Plain text description",
      "vendor": "Brand Name",
      "productType": "Category",
      "tags": ["tag1", "tag2"],
      "price": 29.99,
      "priceMin": 29.99,
      "priceMax": 39.99,
      "priceVaries": true,
      "compareAtPrice": 49.99,
      "compareAtPriceMin": 49.99,
      "compareAtPriceMax": 49.99,
      "onSale": true,
      "available": true,
      "totalInventory": 500,
      "variantsCount": 8,
      "variants": [...],
      "options": [...],
      "imagesCount": 6,
      "images": [...],
      "featuredImage": "https://cdn.shopify.com/...",
      "createdAt": "2024-01-01T00:00:00Z",
      "updatedAt": "2024-11-12T10:00:00Z",
      "publishedAt": "2024-01-01T12:00:00Z",
      "scrapedAt": "2024-11-12T10:30:00Z"
    }

### Variant Information

When `includeVariants` is `true`, each product includes detailed variant data:

    "variants": [
      {
        "id": 9876543210,
        "title": "Small / Black",
        "price": 29.99,
        "compareAtPrice": 49.99,
        "sku": "PROD-SKU-001",
        "barcode": "123456789012",
        "inventoryQuantity": 100,
        "available": true,
        "option1": "Small",
        "option2": "Black",
        "option3": null,
        "weight": 0.2,
        "weightUnit": "kg",
        "requiresShipping": true,
        "taxable": true
      }
    ]

### Product Options

    "options": [
      {
        "name": "Size",
        "position": 1,
        "values": ["Small", "Medium", "Large", "XL"]
      },
      {
        "name": "Color",
        "position": 2,
        "values": ["Black", "White", "Blue"]
      }
    ]

### Image Information

When `includeImages` is `true`:

    "images": [
      {
        "id": 3333333333,
        "src": "https://cdn.shopify.com/s/files/1/xxxx/products/image.jpg",
        "alt": "Product Image Description",
        "width": 2048,
        "height": 2048,
        "position": 1
      }
    ]

## Performance

Speed:

- Average: 500-1000 products per minute
- Depends on store response time and concurrency settings

Accuracy:

- 100% data accuracy using official API
- No parsing errors or missing fields

Reliability:

- Automatic retry on failures with exponential backoff
- Error
handling for network issues and rate limits
- Success rate: 99%+

Resource Usage:

- Memory: Less than 512MB RAM for most jobs
- Compute: Approximately 0.01 compute units per 1,000 products

## Pricing

Cost Estimate:

- Small store (100 products): ~$0.002
- Medium store (1,000 products): ~$0.02
- Large store (10,000 products): ~$0.20
- Enterprise (100,000 products): ~$2.00

Actual costs depend on compute time and proxy usage.

## How It Works

This actor leverages Shopify's public JSON API endpoints available on all Shopify stores:

API Endpoints Used:

- `https://store.com/products.json` - Product listing with pagination
- `https://store.com/products/handle.json` - Individual product details
- `https://store.com/collections/handle/products.json` - Collection products

Process Flow:

1. Domain Validation: Verifies the provided domain is a valid Shopify store
2. Mode Selection: Routes to appropriate scraping strategy (all/collection/handles)
3. Data Fetching: Makes requests to Shopify JSON endpoints with pagination
4. Data Processing: Normalizes and enriches product data
5. Output: Saves structured data to Apify dataset

Technical Advantages:

- No HTML parsing - direct JSON API access
- No CSS selectors that break with theme updates
- No authentication or API keys required
- Works on any Shopify store regardless of plan or theme
- Consistent data structure across all stores

## Best Practices

### For Large Stores (10,000+ products)

1. Enable proxy configuration to avoid rate limiting
2. Increase concurrency to 10-15 for faster scraping
3. Consider scraping specific collections instead of entire store
4. Use `maxProducts` parameter for initial testing

### For Regular Monitoring

1. Use `mode: "collection"` for specific categories
2. Schedule runs during off-peak hours
3. Store results in named datasets for comparison
4. Set up webhooks for automated processing

### For Data Quality

1. Keep `includeVariants: true` for complete inventory data
2. Enable `includeImages: true` for product catalogs
3.
Use product handles for precise targeting
4. Verify store domain before large scraping jobs

## Troubleshooting

### Store Not Found Error

Issue: "Domain does not appear to be a Shopify store"

Solutions:

- Verify the domain is correct (no typos)
- Remove `https://` and paths from domain
- Try without `www.` prefix
- Ensure the store is publicly accessible (not password-protected)

### No Products Returned

Issue: Actor completes but returns empty dataset

Solutions:

- Verify the store has published products
- Check if collection handle is correct (try `mode: "all"` first)
- Ensure products are not restricted by location/password
- Check actor logs for specific error messages

### Slow Performance

Issue: Actor takes longer than expected

Solutions:

- Increase `maxConcurrency` (up to 20)
- Enable Apify proxy configuration
- Reduce output size with `includeVariants: false`
- Check if store has slow response times

### Incomplete Data

Issue: Some products missing fields

Solutions:

- Some Shopify stores may not populate all fields
- Check if `includeVariants` and `includeImages` are enabled
- Verify the store's product data in Shopify admin
- Review actor logs for parsing warnings

## Limitations

Technical Limitations:

- Only scrapes publicly accessible stores
- Cannot access password-protected stores or products
- Cannot bypass Shopify Plus wholesale portals
- Limited by Shopify's public API availability

Data Limitations:

- Cannot access customer data or order information
- Cannot retrieve draft or unpublished products
- Cannot access admin-only product metadata
- Inventory counts may be cached by Shopify

Rate Limiting:

- Respects Shopify's fair use guidelines
- Implements polite crawling (1-2 requests/second)
- Automatic backoff on rate limit responses
- Proxy usage recommended for very large stores

## FAQ

Q: Does this work on all Shopify stores?
A: Yes, it works on any public Shopify store including custom domains and .myshopify.com stores.

Q: Do I need API credentials or store access?
A: No authentication required. This uses public JSON endpoints available on all Shopify stores.

Q: Will I get blocked or rate limited?
A: The actor implements polite crawling with automatic retries. For large-scale scraping, use Apify proxies.

Q: How accurate is the data compared to HTML scraping?
A: 100% accurate. Using official API eliminates parsing errors common with HTML scraping.

Q: Can I scrape product reviews or customer data?
A: No, this actor only accesses publicly available product catalog data.

Q: How do I find collection handles?
A: Visit the collection page in your browser. The handle is in the URL: `https://store.com/collections/HANDLE`

Q: Can I scrape multiple stores in one run?
A: No, configure one store per actor run. Use Apify tasks or schedules for multiple stores.

Q: What happens if a product is deleted during scraping?
A: The actor handles 404 errors gracefully and continues with remaining products.

## Related Actors

Explore our complete Shopify scraping suite:

- Shopify Price Monitor - Track price changes and sales over time
- Shopify Inventory Tracker - Monitor stock levels and availability
- Shopify Store Analyzer - Extract store metadata and analytics
- Shopify Collection Scraper - Specialized collection-based extraction
- Shopify Feed Generator - Generate product feeds for Google Shopping

## Legal Compliance

This actor accesses only publicly available data from Shopify stores through official public API endpoints. It does not:

- Require authentication or API keys
- Circumvent access controls or security measures
- Access password-protected or restricted content
- Violate Shopify's Terms of Service

The actor implements responsible scraping practices including rate limiting and respectful request patterns. Users are responsible for ensuring their use complies with applicable laws, data protection regulations, and the terms of service of stores they scrape.
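The pagination step described under How It Works can be sketched as follows. This is an illustration and not the actor's actual code: it assumes the public `/products.json` endpoint accepts `limit` and `page` query parameters and returns an object with a `products` array. `fetch_json` and `iter_products` are hypothetical helper names, and retries, rate limiting, and proxies are left out.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.request import Request, urlopen


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (plain stdlib, no retries)."""
    req = Request(url, headers={"User-Agent": "example-client/0.1"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_products(
    store_domain: str,
    max_products: Optional[int] = None,
    page_size: int = 250,  # assumed page size, not taken from the actor
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield products page by page until a page is empty or the cap is hit."""
    yielded = 0
    page = 1
    while max_products is None or yielded < max_products:
        url = f"https://{store_domain}/products.json?limit={page_size}&page={page}"
        products = fetch(url).get("products", [])
        if not products:
            return  # an empty page is treated as the end of the catalog
        for product in products:
            if max_products is not None and yielded >= max_products:
                return
            yield product
            yielded += 1
        page += 1
```

Passing `fetch` as a parameter keeps the loop testable without network access: a stub that returns canned pages can stand in for `fetch_json`.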
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Actor Information
- Developer: n0rmaliz3
- Pricing: Paid
- Total Runs: 29
- Active Users: 3