Producthunt Scraper

Name: Producthunt Scraper
Author: runtime

A web scraper that extracts comprehensive product information from Product Hunt using Apify.

604 runs

63 users

What does this actor do?

Producthunt Scraper is an Actor on the Apify platform that collects product data from Product Hunt, including daily leaderboards, product detail pages, makers, launches, and comments. It runs in the cloud, so no local setup is needed.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
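
The steps above can also be driven programmatically. The following is a minimal sketch, assuming the public Apify API v2 endpoint for starting Actor runs; the Actor ID `example-user~producthunt-scraper` and the token value are placeholders, not the real identifiers for this Actor. The function only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run.

    actor_id is a placeholder such as "example-user~producthunt-scraper";
    look up the real ID on the Actor's page in the Apify Console.
    """
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical usage: the input mirrors the date-range example in the documentation.
request = build_run_request(
    "example-user~producthunt-scraper",  # placeholder Actor ID
    "YOUR_APIFY_TOKEN",                  # placeholder API token
    {"startDate": "2025-07-01", "endDate": "2025-07-05"},
)
# Passing `request` to urllib.request.urlopen() would actually start the run.
```

Keeping request construction separate from sending makes it easy to check the payload before spending platform credits.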

Documentation

# Product Hunt Scraper

A web scraper that extracts comprehensive product information from Product Hunt using Apify and Crawlee.

## Quick Start

1. Input: Configure your scraping parameters in the input field
2. Run: Click "Start" to begin scraping
3. Output: Download results from the Dataset tab

## Input Configuration

> Note: If you provide a Start Date (and/or End Date), the Start URLs field will be ignored. Only one method (date range OR Start URLs) will be used per run.

### Basic Configuration

```json
{
  "startUrls": ["https://www.producthunt.com/"],
  "maxRequestRetries": 3,
  "maxConcurrency": 5,
  "maxRequestsPerCrawl": 100,
  "scrapeComingSoon": true
}
```

### Daily Leaderboard Scraping

```json
{
  "startUrls": ["https://www.producthunt.com/leaderboard/daily/2025/7/5/all"],
  "scrapeDailyLeaderboard": true,
  "maxRequestsPerCrawl": 50
}
```

### Input Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `startUrls` | Array | `["https://www.producthunt.com/"]` | Starting URLs for scraping |
| `maxResults` | Number | `8` | Maximum number of product detail pages to process per run. Lower values reduce timeout risk. |
| `maxRequestRetries` | Number | `3` | Maximum retry attempts for failed requests |
| `maxConcurrency` | Number | `5` | Number of concurrent requests |
| `maxRequestsPerCrawl` | Number | `100` | Maximum pages to crawl |
| `timeoutSecs` | Number | `900` | Maximum runtime (seconds) before the Actor stops automatically |
| `useApifyProxy` | Boolean | `true` | Simple checkbox to enable/disable Apify Proxy without touching advanced settings |
| `proxyConfiguration` | Object | `{}` | Advanced proxy editor (choose Apify Proxy groups, custom proxy URLs, or no proxy). When left empty it inherits the `useApifyProxy` toggle. |
| `scrapeComingSoon` | Boolean | `true` | Whether to scrape "Coming Soon" products |
| `scrapeDailyLeaderboard` | Boolean | `false` | Whether to scrape daily leaderboard |
| `sortByDate` | Boolean | `false` | Whether to sort output by date |
| `sortOrder` | String | `"desc"` | Sort order: `"asc"` (oldest first) or `"desc"` (newest first) |
| `startDate` | String | `null` | Start date for daily leaderboard range (format: `YYYY-MM-DD` or `YYYY/MM/DD`) |
| `endDate` | String | `null` | End date for daily leaderboard range (defaults to yesterday if not specified) |
| `maxCommentPages` | Number | `0` | Maximum number of comment pages to scrape per product (0-10). Comments are paginated on the main product page. The scraper will automatically stop if no more pages are available. |

## Input Method Exclusivity

- If you provide a Start Date (and/or End Date), the Start URLs field will be ignored.
- If you do not provide a Start Date, the scraper will use the Start URLs.
- If neither is provided, the scraper will default to scraping the daily leaderboard for yesterday (today - 1 day).
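
The precedence rules above can be sketched as a small function. This is an illustration, not the Actor's actual code: `resolve_urls`, `parse_day`, and `leaderboard_urls` are hypothetical names, the URL pattern is inferred from the leaderboard URLs shown in this documentation, and the case where only `endDate` is given is not modelled because the documentation does not say which start date would apply.

```python
from datetime import date, timedelta

# URL pattern inferred from the examples in this documentation (no zero padding).
LEADERBOARD_URL = "https://www.producthunt.com/leaderboard/daily/{y}/{m}/{d}/all"


def parse_day(text: str) -> date:
    """Parse YYYY-MM-DD or YYYY/MM/DD into a date."""
    year, month, day = (int(part) for part in text.replace("/", "-").split("-"))
    return date(year, month, day)


def leaderboard_urls(start: date, end: date) -> list[str]:
    """One daily-leaderboard URL per day from start to end, inclusive."""
    span = (end - start).days
    days = (start + timedelta(days=offset) for offset in range(span + 1))
    return [LEADERBOARD_URL.format(y=day.year, m=day.month, d=day.day) for day in days]


def resolve_urls(actor_input: dict, today: date) -> list[str]:
    """Pick the URLs to scrape: date range first, then Start URLs, then yesterday."""
    yesterday = today - timedelta(days=1)
    start_text = actor_input.get("startDate")
    if start_text:
        # A date range takes precedence; startUrls are ignored.
        end_text = actor_input.get("endDate")
        end = parse_day(end_text) if end_text else yesterday
        return leaderboard_urls(parse_day(start_text), end)
    if actor_input.get("startUrls"):
        return list(actor_input["startUrls"])
    # Neither provided: default to yesterday's daily leaderboard.
    return leaderboard_urls(yesterday, yesterday)
```

Because every day in a range becomes its own URL, a one-month range yields about 30 leaderboard pages before any product detail pages are counted, which is worth keeping in mind when setting `maxRequestsPerCrawl`.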
### Examples

Default (no input, scrapes yesterday's leaderboard):

```json
{}
```

This will scrape the daily leaderboard for yesterday.

Date range only (Start URLs ignored):

```json
{
  "startDate": "2025-07-01",
  "endDate": "2025-07-05"
}
```

Start URLs only (date fields empty):

```json
{
  "startUrls": [
    "https://www.producthunt.com/leaderboard/daily/2025/7/4/all"
  ]
}
```

Both provided (date range takes precedence):

```json
{
  "startDate": "2025-07-01",
  "endDate": "2025-07-01",
  "startUrls": [
    "https://www.producthunt.com/leaderboard/daily/2025/7/4/all"
  ]
}
```

Only 2025-07-01 will be scraped.

## Output Format

### Regular Products

```json
{
  "name": "Product Name",
  "tagline": "Product tagline",
  "description": "Detailed description",
  "upvotes": 1234,
  "categories": ["SaaS", "Productivity"],
  "launchDate": "Launch date",
  "imageUrl": "https://example.com/image.jpg",
  "productUrl": "https://product-website.com",
  "socialLinks": [
    { "platform": "twitter", "url": "https://twitter.com/product" }
  ],
  "scrapedAt": "2024-01-01T12:00:00.000Z",
  "companyWebsite": "https://company.com",
  "productHuntUrl": "https://www.producthunt.com/products/product-name",
  "makers": [
    { "username": "maker1", "name": "Maker One", "roles": ["Maker"] },
    { "username": "maker2", "name": "Maker Two", "roles": ["Maker", "Hunter"] }
  ],
  "hunter": { "username": "hunter1", "name": "Hunter Name" },
  "builtWith": [
    {
      "name": "shadcn/ui",
      "url": "https://www.producthunt.com/products/shadcn-ui",
      "description": "Beautifully designed components.",
      "imageUrl": "https://ph-files.imgix.net/..."
    }
  ],
  "launches": [
    {
      "postId": "1038207",
      "title": "Product Name",
      "url": "https://www.producthunt.com/products/product-name/launches/product-name",
      "tagline": "Product tagline",
      "date": "November 15th, 2025",
      "rank": 1,
      "upvotes": 308,
      "comments": 43,
      "imageUrl": "https://example.com/image.jpg",
      "imageAlt": "Product Name"
    }
  ],
  "launchCount": 1,
  "comments": [
    {
      "id": "5002031",
      "username": "adrm",
      "userName": "Adrián de la Rosa",
      "userUrl": "https://www.producthunt.com/@adrm",
      "userAvatar": "https://ph-avatars.imgix.net/...",
      "isMaker": true,
      "text": "Comment text content...",
      "html": "<div>Comment HTML content...</div>",
      "upvotes": 33,
      "timestamp": "2025-11-14T09:06:07-08:00",
      "timeAgo": "2d ago"
    }
  ],
  "commentCount": 43,
  "reviews": [],
  "reviewCount": 0
}
```

### Daily Leaderboard Products

```json
{
  "name": "Product Name",
  "tagline": "Product description",
  "categories": ["SaaS", "Productivity"],
  "upvotes": "1234",
  "launchDate": "July 5, 2025",
  "imageUrl": "https://example.com/image.jpg",
  "productUrl": "https://www.producthunt.com/products/product-name",
  "scrapedFrom": "daily-leaderboard",
  "scrapedAt": "2024-01-01T12:00:00.000Z",
  "productHuntUrl": "https://www.producthunt.com/leaderboard/daily/2025/7/5/all",
  "makers": [
    { "username": "maker1", "name": "Maker One", "roles": ["Maker"] }
  ],
  "hunter": { "username": "hunter1", "name": "Hunter Name" }
}
```

### Team Extraction Output

- `makers`: Array of all team members with the "Maker" role. Each object contains:
  - `username`: Product Hunt username (string)
  - `name`: Display name (string)
  - `roles`: Array of roles (e.g. `["Maker"]` or `["Hunter", "Maker"]`)
  - `title`: Job title or role description (string, optional)
- `hunter`: Object with the first team member who has the "Hunter" role, with:
  - `username`: Product Hunt username (string)
  - `name`: Display name (string)
  - If no hunter is found, this field is `null`.
- Note: A hunter can also be a maker (they will appear in both `makers` and `hunter` fields).
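
The team rules above can be illustrated with a short sketch. It assumes a flat list of team members, each with a `roles` array as in the output examples; `split_team` is a hypothetical helper name, not part of the Actor.

```python
from typing import Optional


def split_team(team: list[dict]) -> tuple[list[dict], Optional[dict]]:
    """Derive `makers` and `hunter` from a list of team members.

    makers: every member whose roles include "Maker".
    hunter: username and name of the first member whose roles include "Hunter",
            or None when nobody has that role.
    """
    makers = [member for member in team if "Maker" in member.get("roles", [])]
    hunter = next(
        (
            {"username": member["username"], "name": member["name"]}
            for member in team
            if "Hunter" in member.get("roles", [])
        ),
        None,
    )
    return makers, hunter
```

A member with both roles ends up in `makers` and, if they are the first hunter in the list, in `hunter` as well, which matches the note above.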
### Additional Data Fields

- `builtWith`: Array of tools/products used to build the product. Each object contains:
  - `name`: Tool/product name (string)
  - `url`: Product Hunt URL (string)
  - `description`: Tool description (string)
  - `imageUrl`: Tool thumbnail image URL (string, optional)
- `launches`: Array of all launches for the product. Each object contains:
  - `postId`: Launch post ID (string)
  - `title`: Launch title (string)
  - `url`: Launch URL (string)
  - `tagline`: Launch tagline (string)
  - `date`: Launch date (string)
  - `rank`: Daily rank (number)
  - `upvotes`: Number of upvotes (number)
  - `comments`: Number of comments (number)
  - `imageUrl`: Launch image URL (string, optional)
  - `imageAlt`: Image alt text (string, optional)
- `launchCount`: Total number of launches (number)
- `comments`: Array of comments from the main product page. Each object contains:
  - `id`: Comment ID (string)
  - `username`: Commenter's username (string)
  - `userName`: Commenter's display name (string)
  - `userUrl`: Commenter's profile URL (string)
  - `userAvatar`: Commenter's avatar URL (string, optional)
  - `isMaker`: Whether the commenter is a maker (boolean)
  - `text`: Comment text content (string)
  - `html`: Comment HTML content (string)
  - `upvotes`: Number of upvotes (number)
  - `timestamp`: ISO timestamp (string)
  - `timeAgo`: Human-readable time (string, e.g. "2d ago")
- `commentCount`: Total number of comments extracted (number)
- `reviews`: Array of reviews from the `/reviews` page (same structure as comments). Often empty as reviews are less common.
- `reviewCount`: Total number of reviews extracted (number)

### Comment Pagination

Comments are paginated on Product Hunt.
The scraper supports pagination through the `maxCommentPages` parameter:

- Default: `0` (skip comment extraction for fastest runs)
- Range: `0-10` (set to `0` to disable comment extraction)
- Behavior:
  - The scraper visits up to `maxCommentPages` pages of comments
  - It automatically stops if no more pages are available (even if `maxCommentPages` is higher)
  - Each page is visited sequentially: `?page=1#comments`, `?page=2#comments`, etc.
  - All comments from all pages are combined into a single `comments` array

Example with pagination:

```json
{
  "maxCommentPages": 5
}
```

This will scrape up to 5 pages of comments per product, stopping early if fewer pages are available.

## Usage Examples

### Scrape Today's Products

```json
{
  "startUrls": ["https://www.producthunt.com/"],
  "maxRequestsPerCrawl": 50
}
```

### Scrape Daily Leaderboard

```json
{
  "startUrls": ["https://www.producthunt.com/leaderboard/daily/2025/7/5/all"],
  "scrapeDailyLeaderboard": true
}
```

### Scrape Coming Soon Products

```json
{
  "startUrls": ["https://www.producthunt.com/coming-soon"],
  "scrapeComingSoon": true
}
```

### Scrape Specific Categories

```json
{
  "startUrls": [
    "https://www.producthunt.com/categories/developer-tools",
    "https://www.producthunt.com/categories/productivity"
  ]
}
```

### Scrape with Date Sorting

```json
{
  "startUrls": ["https://www.producthunt.com/"],
  "sortByDate": true,
  "sortOrder": "desc"
}
```

### Scrape with Comment Pagination

```json
{
  "startUrls": ["https://www.producthunt.com/products/product-name"],
  "maxCommentPages": 5
}
```

This will scrape up to 5 pages of comments per product.
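
The comment pagination behavior described in this documentation (sequential pages, early stop, one combined array) can be sketched roughly as follows. This is an illustration under assumptions: `collect_comments` and the `fetch_page` callable are hypothetical, and clamping the page count to the documented 0-10 range is a guess at how out-of-range values might be handled.

```python
from typing import Callable


def collect_comments(
    product_url: str,
    max_comment_pages: int,
    fetch_page: Callable[[str], list[dict]],
) -> list[dict]:
    """Visit comment pages in order and merge them into one list.

    fetch_page is a hypothetical callable that returns the comments found
    at a URL, or an empty list when the page does not exist.
    """
    pages = max(0, min(max_comment_pages, 10))  # assumed clamp to the documented range
    comments: list[dict] = []
    for page in range(1, pages + 1):
        batch = fetch_page(f"{product_url}?page={page}#comments")
        if not batch:
            break  # no more pages, stop early
        comments.extend(batch)
    return comments
```

With `max_comment_pages` set to `0` the loop never runs, so no comment pages are requested at all, which mirrors the default.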
### Scrape Daily Leaderboard with Date Sorting (Oldest First)

```json
{
  "startUrls": ["https://www.producthunt.com/leaderboard/daily/2025/7/5/all"],
  "scrapeDailyLeaderboard": true,
  "sortByDate": true,
  "sortOrder": "asc"
}
```

### Scrape Daily Leaderboard Date Range

```json
{
  "startDate": "2025-07-01",
  "endDate": "2025-07-05",
  "scrapeDailyLeaderboard": true,
  "maxRequestsPerCrawl": 200
}
```

### Scrape Daily Leaderboard from Date to Yesterday

```json
{
  "startDate": "2025-07-01",
  "scrapeDailyLeaderboard": true,
  "sortByDate": true,
  "sortOrder": "desc"
}
```

### Combine Custom URLs with Date Range

```json
{
  "startUrls": [
    "https://www.producthunt.com/leaderboard/daily/2025/7/5/all",
    "https://www.producthunt.com/leaderboard/daily/2025/7/4/all"
  ],
  "startDate": "2025-07-01",
  "endDate": "2025-07-03",
  "scrapeDailyLeaderboard": true
}
```

Because a date range is provided, the `startUrls` in this input are ignored (see Input Method Exclusivity); only 2025-07-01 through 2025-07-03 are scraped.

## Data Fields

| Field | Description | Available For |
|-------|-------------|---------------|
| `name` | Product name | All products |
| `tagline` | Short product description | All products |
| `description` | Detailed product description | Regular products |
| `upvotes` | Number of upvotes | All products |
| `categories` | Array of product categories | All products |
| `makers` | Array of makers with roles | All products (when `getDetails` is true) |
| `hunter` | Hunter information | All products (when `getDetails` is true) |
| `launchDate` | Product launch date | All products |
| `imageUrl` | Product image URL | All products |
| `productUrl` | Direct product website link | All products |
| `pricing` | Pricing information | Regular products |
| `metaKeywords` | Meta keywords | Regular products |
| `socialLinks` | Social media links | Regular products |
| `builtWith` | Tools used to build the product | Regular products (when `getDetails` is true) |
| `launches` | Array of all product launches | Regular products (when `getDetails` is true) |
| `launchCount` | Total number of launches | Regular products (when `getDetails` is true) |
| `comments` | Array of comments from main page | Regular products (when `getDetails` is true and `maxCommentPages` > 0) |
| `commentCount` | Total number of comments | Regular products (when `getDetails` is true and `maxCommentPages` > 0) |
| `reviews` | Array of reviews from `/reviews` page | Regular products (when `getDetails` is true) |
| `reviewCount` | Total number of reviews | Regular products (when `getDetails` is true) |
| `scrapedFrom` | Data source identifier | All products |
| `scrapedAt` | Timestamp of scraping | All products |
| `sourceUrl` | Original Product Hunt URL | All products |

## Performance Tips

- Concurrency: Increase `maxConcurrency` for faster scraping (be mindful of rate limits)
- Retries: Higher `maxRequestRetries` values improve reliability but slow down scraping
- Request Limits: Adjust `maxRequestsPerCrawl` based on your needs
- Proxy: Use the `useApifyProxy` checkbox for quick on/off control, or fill `proxyConfiguration` for custom pools/URLs. Camoufox-based Apify Proxy is enabled by default for anti-detection.
- Sorting: Enable `sortByDate` to get chronologically ordered results (adds processing time for large datasets)
- Date Ranges: Large date ranges will generate many URLs; increase `maxRequestsPerCrawl` accordingly
- Comment Pagination: Each comment page adds ~2-3 seconds per product. The default `maxCommentPages=0` skips comments entirely for maximum speed; raise it to collect discussion threads at the cost of additional time.
- Product Details: Set `getDetails` to `false` to skip detailed product pages and scrape only leaderboard summaries (much faster)

## Troubleshooting

### Common Issues

1. No products found: Product Hunt may have changed its HTML structure
2. Rate limiting: Reduce `maxConcurrency` or add delays
3. Missing data: Some products may not have all fields available
4. Daily leaderboard issues: Check if the URL format is correct

### Debug Mode

To debug a run, check the Actor logs in the Apify Console.
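
Since some products may not have all fields, and the output examples show `upvotes` as a number for regular products but as a string for leaderboard products, downstream code may want to normalize records before use. The sketch below is one way to do that; `normalize_product` and `to_int` are hypothetical helpers, and the choice of defaults (0 for unparseable upvotes, empty lists for missing arrays) is an assumption.

```python
def to_int(value, default: int = 0) -> int:
    """Coerce values such as 1234, "1234" or "1,234" to int; fall back to default."""
    try:
        return int(str(value).replace(",", "").strip())
    except ValueError:
        return default


def normalize_product(record: dict) -> dict:
    """Return a copy of a scraped record with consistent types for common fields."""
    product = dict(record)
    product["upvotes"] = to_int(record.get("upvotes"))
    # Array fields are absent on some records; default them to empty lists.
    for key in ("categories", "makers", "comments", "reviews"):
        product[key] = record.get(key) or []
    # hunter is documented as null when no hunter is found.
    product["hunter"] = record.get("hunter")
    return product
```

Normalizing once, right after download, keeps later analysis code free of per-field type checks.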
## Legal Notice

- Respect Product Hunt's robots.txt and terms of service
- Use reasonable request rates
- Use scraped data responsibly and in accordance with applicable laws
- This scraper is for educational and research purposes

## Support

For issues and questions:

1. Check the troubleshooting section
2. Review the actor logs
3. Contact Apify support

---

Note: Always respect website terms of service and use data responsibly.

## Related Actors

- CNN Top Headlines Scraper Actor: Scrape the latest top news headlines and full article details from CNN.

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Actor Information

Developer: runtime
Pricing: Paid
