Hacker News Scraper & API - Export Stories, Comments, Data
by fresh_cliff
About Hacker News Scraper & API - Export Stories, Comments, Data
Extract top stories, trending posts, points, comments & authors from Hacker News front page. Real-time data export to JSON/CSV. Monitor tech trends, analyze viral content, track HN activity. Fast Playwright scraper.
What does this actor do?
Hacker News Scraper & API - Export Stories, Comments, Data is an Apify actor that scrapes the Hacker News front page with Playwright and returns each story's title, URL, points, author, comment count, posting time, and rank. It runs in the cloud on the Apify platform, so no local setup is required.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
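As a minimal sketch of the "configure the input parameters" step, the snippet below builds and checks an input object before a run. It assumes the two input fields listed in the actor's documentation (`maxStories`, an integer from 1 to 100 with default 30, and `includeJobPosts`, a boolean with default false). The helper name `build_run_input` is hypothetical and is not part of the actor or of the Apify SDK.

```python
import json


def build_run_input(max_stories: int = 30, include_job_posts: bool = False) -> dict:
    """Build an input dict using the field names from the documented input table.

    Hypothetical helper for illustration; the actor itself may validate differently.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_stories, bool) or not isinstance(max_stories, int):
        raise TypeError("max_stories must be an integer")
    # The documentation gives the allowed range as 1-100.
    if not 1 <= max_stories <= 100:
        raise ValueError("max_stories must be between 1 and 100")
    if not isinstance(include_job_posts, bool):
        raise TypeError("include_job_posts must be a boolean")
    return {"maxStories": max_stories, "includeJobPosts": include_job_posts}


# Serialize the input as JSON, ready to paste into the actor's input editor.
print(json.dumps(build_run_input(max_stories=10), indent=2))
```

Checking the range locally avoids starting a run with an input the actor would reject.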
Documentation
# Hacker News Scraper for Apify

A production-ready Apify actor that scrapes stories from the Hacker News front page using Playwright.

## 🚀 Features

- Scrapes Hacker News front page stories
- Extracts comprehensive story data:
  - Title and URL
  - Points (upvotes)
  - Author username
  - Number of comments
  - Time posted
  - Story rank
  - Hacker News discussion URL
- Configurable number of stories to scrape
- Option to include/exclude job posts
- Built with Playwright for reliable scraping
- Production-ready for Apify platform

## 📁 Project Structure

```
hackernews-scraper/
├── .actor/
│   ├── actor.json            # Actor metadata and configuration
│   └── dataset_schema.json   # Output data schema
├── apify_actor.py            # Main actor entry point
├── hackernews_scraper.py     # Core scraper implementation
├── Dockerfile                # Docker configuration for Apify
├── requirements.txt          # Python dependencies
├── INPUT_SCHEMA.json         # Input configuration schema
└── README.md                 # This file
```

## 🔧 Local Testing

### Prerequisites

- Python 3.11+
- pip

### Installation

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Install Playwright browsers:

   ```bash
   playwright install chromium
   ```

3. Test the scraper locally:

   ```bash
   python hackernews_scraper.py
   ```

## 🌐 Deploy to Apify

### Prerequisites

1. Create an Apify account
2. Install Apify CLI: `npm install -g apify-cli`
3. Login: `apify login`

### Deployment Steps

1. Navigate to project directory:

   ```bash
   cd hackernews-scraper
   ```

2. Deploy to Apify:

   ```bash
   apify push
   ```

3. Access your actor at Apify Console

### Running on Apify

1. Navigate to your actor in the Apify Console
2. Click "Run"
3. Configure input options (optional)
4. Click "Start" to run the actor
5. View results in the "Dataset" tab

## ⚙️ Input Configuration

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxStories` | integer | 30 | Maximum number of stories to scrape (1-100) |
| `includeJobPosts` | boolean | false | Include "Who is hiring?" job posts |

### Example Input

```json
{
  "maxStories": 30,
  "includeJobPosts": false
}
```

## 📊 Output Format

Each story is returned as a JSON object with the following structure:

```json
{
  "rank": 1,
  "title": "Show HN: I built a tool for...",
  "url": "https://example.com/article",
  "points": 342,
  "author": "username",
  "comments": 127,
  "timeAgo": "2024-01-15T10:30:00.000Z",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=12345678"
}
```

### Output Fields

| Field | Type | Description |
|-------|------|-------------|
| `rank` | number | Story position on front page |
| `title` | string | Story title |
| `url` | string | Link to the story/article |
| `points` | number | Number of upvotes |
| `author` | string | Username who posted the story |
| `comments` | number | Number of comments |
| `timeAgo` | string | Timestamp when story was posted |
| `hackerNewsUrl` | string | URL to Hacker News discussion |

## 🛠️ Built With

- Python 3.11 - Programming language
- Playwright - Browser automation
- Apify SDK - Actor framework
- Following Apify best practices and patterns

## 📝 Use Cases

- Monitor trending tech stories
- Track specific topics on HN
- Build custom HN readers/aggregators
- Research what content performs well
- Create HN analytics dashboards

## 🔒 Rate Limiting

The scraper is designed to be respectful of Hacker News:

- Single page load per run
- No aggressive pagination
- Configurable limits on stories scraped

## 📄 License

This actor is provided as-is for use on the Apify platform.

## 🤝 Support

For issues or questions:

- Check the Apify documentation
- Open an issue in the repository
- Contact via Apify platform

---

Ready to deploy in under 10 minutes! 🎉
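To illustrate how the output format documented above might be consumed, here is a minimal sketch that filters story records by points and writes them to CSV. It assumes the records have already been downloaded as a list of dicts with the documented field names. The helper names `top_by_points` and `stories_to_csv` are hypothetical and are not part of the actor.

```python
import csv
import io

# Column order follows the documented output fields.
FIELDS = ["rank", "title", "url", "points", "author",
          "comments", "timeAgo", "hackerNewsUrl"]


def top_by_points(stories, min_points):
    """Return stories with at least min_points, highest points first."""
    def points(story):
        # Treat a missing or null points value as 0 (an assumption).
        return story.get("points") or 0

    kept = [s for s in stories if points(s) >= min_points]
    return sorted(kept, key=points, reverse=True)


def stories_to_csv(stories):
    """Serialize story dicts to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for story in stories:
        # Fill absent fields with an empty string so every row is complete.
        writer.writerow({field: story.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Sample record copied from the documented example output.
sample = [{
    "rank": 1,
    "title": "Show HN: I built a tool for...",
    "url": "https://example.com/article",
    "points": 342,
    "author": "username",
    "comments": 127,
    "timeAgo": "2024-01-15T10:30:00.000Z",
    "hackerNewsUrl": "https://news.ycombinator.com/item?id=12345678",
}]

print(stories_to_csv(top_by_points(sample, min_points=100)))
```

Building the CSV in memory keeps the sketch free of file-system side effects. Writing to a file only requires passing an open file object to `csv.DictWriter` instead of the `StringIO` buffer.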
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: fresh_cliff
- Pricing: Paid
- Total Runs: 27
- Active Users: 3