Hacker News Scraper & API - Export Stories, Comments, Data

Author: fresh_cliff


27 runs

3 users


About Hacker News Scraper & API - Export Stories, Comments, Data

Extract top stories, trending posts, points, comments & authors from Hacker News front page. Real-time data export to JSON/CSV. Monitor tech trends, analyze viral content, track HN activity. Fast Playwright scraper.

What does this actor do?

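Hacker News Scraper & API - Export Stories, Comments, Data is an Apify actor that loads the Hacker News front page with Playwright and returns one record per story: rank, title, URL, points, author, comment count, post time, and the link to the Hacker News discussion. It runs in the Apify cloud, and the results can be exported as JSON or CSV.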

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
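
The last two steps can also be done programmatically instead of through the web console. The following is a minimal sketch, not the author's own client code. It assumes you have an Apify API token and uses the Apify API's `run-sync-get-dataset-items` endpoint. `ACTOR_ID` is a placeholder that you must replace with the ID shown in your Apify Console, and the helper names `build_input`, `build_request`, and `run_actor` are hypothetical. The input fields `maxStories` and `includeJobPosts` are taken from the Input Configuration table in the Documentation section.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Placeholder (assumption): replace with the actor ID shown in your Apify Console.
ACTOR_ID = "YOUR_USERNAME~YOUR_ACTOR_NAME"


def build_input(max_stories: int = 30, include_job_posts: bool = False) -> dict:
    """Validate and assemble the actor input (maxStories is documented as 1-100)."""
    if isinstance(max_stories, bool) or not isinstance(max_stories, int):
        raise TypeError("max_stories must be an integer")
    if not 1 <= max_stories <= 100:
        raise ValueError("max_stories must be between 1 and 100")
    return {"maxStories": max_stories, "includeJobPosts": bool(include_job_posts)}


def build_request(token: str, actor_input: dict,
                  actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Prepare (but do not send) a synchronous run request."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token: str, actor_input: dict) -> list:
    """Send the request and return the scraped stories as a list of dicts."""
    with urllib.request.urlopen(build_request(token, actor_input)) as response:
        return json.load(response)
```

Calling `run_actor(token, build_input(max_stories=10))` would start a run and block until the dataset items come back. Building the request separately from sending it keeps the input validation testable without network access.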

Documentation

# Hacker News Scraper for Apify

A production-ready Apify actor that scrapes stories from the Hacker News front page using Playwright.

## 🚀 Features

- Scrapes Hacker News front page stories
- Extracts comprehensive story data:
  - Title and URL
  - Points (upvotes)
  - Author username
  - Number of comments
  - Time posted
  - Story rank
  - Hacker News discussion URL
- Configurable number of stories to scrape
- Option to include/exclude job posts
- Built with Playwright for reliable scraping
- Production-ready for Apify platform

## 📁 Project Structure

```
hackernews-scraper/
├── .actor/
│   ├── actor.json              # Actor metadata and configuration
│   └── dataset_schema.json     # Output data schema
├── apify_actor.py              # Main actor entry point
├── hackernews_scraper.py       # Core scraper implementation
├── Dockerfile                  # Docker configuration for Apify
├── requirements.txt            # Python dependencies
├── INPUT_SCHEMA.json           # Input configuration schema
└── README.md                   # This file
```

## 🔧 Local Testing

### Prerequisites

- Python 3.11+
- pip

### Installation

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Install Playwright browsers:

   ```bash
   playwright install chromium
   ```

3. Test the scraper locally:

   ```bash
   python hackernews_scraper.py
   ```

## 🌐 Deploy to Apify

### Prerequisites

1. Create an Apify account
2. Install Apify CLI: `npm install -g apify-cli`
3. Login: `apify login`

### Deployment Steps

1. Navigate to project directory:

   ```bash
   cd hackernews-scraper
   ```

2. Deploy to Apify:

   ```bash
   apify push
   ```

3. Access your actor at Apify Console

### Running on Apify

1. Navigate to your actor in the Apify Console
2. Click "Run"
3. Configure input options (optional)
4. Click "Start" to run the actor
5. View results in the "Dataset" tab

## ⚙️ Input Configuration

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxStories` | integer | 30 | Maximum number of stories to scrape (1-100) |
| `includeJobPosts` | boolean | false | Include "Who is hiring?" job posts |

### Example Input

```json
{
  "maxStories": 30,
  "includeJobPosts": false
}
```

## 📊 Output Format

Each story is returned as a JSON object with the following structure:

```json
{
  "rank": 1,
  "title": "Show HN: I built a tool for...",
  "url": "https://example.com/article",
  "points": 342,
  "author": "username",
  "comments": 127,
  "timeAgo": "2024-01-15T10:30:00.000Z",
  "hackerNewsUrl": "https://news.ycombinator.com/item?id=12345678"
}
```

### Output Fields

| Field | Type | Description |
|-------|------|-------------|
| `rank` | number | Story position on front page |
| `title` | string | Story title |
| `url` | string | Link to the story/article |
| `points` | number | Number of upvotes |
| `author` | string | Username who posted the story |
| `comments` | number | Number of comments |
| `timeAgo` | string | Timestamp when story was posted |
| `hackerNewsUrl` | string | URL to Hacker News discussion |

## 🛠️ Built With

- Python 3.11 - Programming language
- Playwright - Browser automation
- Apify SDK - Actor framework
- Following Apify best practices and patterns

## 📝 Use Cases

- Monitor trending tech stories
- Track specific topics on HN
- Build custom HN readers/aggregators
- Research what content performs well
- Create HN analytics dashboards

## 🔒 Rate Limiting

The scraper is designed to be respectful of Hacker News:

- Single page load per run
- No aggressive pagination
- Configurable limits on stories scraped

## 📄 License

This actor is provided as-is for use on the Apify platform.

## 🤝 Support

For issues or questions:

- Check the Apify documentation
- Open an issue in the repository
- Contact via Apify platform

---

Ready to deploy in under 10 minutes! 🎉
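
## 🧪 Example: Post-processing the Output

To illustrate the "research what content performs well" use case, here is a small sketch of how the exported records might be filtered and written to CSV. It is an assumption-laden example and not part of the actor: the helper names `top_stories` and `to_csv` are hypothetical, and the two sample records are made up for illustration. They only mirror the field names from the Output Fields table.

```python
import csv
import io

# Field names as documented in the Output Fields table.
FIELDS = ["rank", "title", "url", "points", "author",
          "comments", "timeAgo", "hackerNewsUrl"]


def top_stories(stories, min_points=100):
    """Return stories with at least min_points, highest-scoring first."""
    kept = [s for s in stories if (s.get("points") or 0) >= min_points]
    return sorted(kept, key=lambda s: s.get("points") or 0, reverse=True)


def to_csv(stories):
    """Serialize stories to a CSV string, ignoring any undocumented keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(stories)
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative sample records (not real scraped data).
    sample = [
        {"rank": 1, "title": "Example A", "url": "https://example.com/a",
         "points": 342, "author": "alice", "comments": 127,
         "timeAgo": "2024-01-15T10:30:00.000Z",
         "hackerNewsUrl": "https://news.ycombinator.com/item?id=1"},
        {"rank": 2, "title": "Example B", "url": "https://example.com/b",
         "points": 42, "author": "bob", "comments": 3,
         "timeAgo": "2024-01-15T11:00:00.000Z",
         "hackerNewsUrl": "https://news.ycombinator.com/item?id=2"},
    ]
    print(to_csv(top_stories(sample, min_points=100)))
```

Job posts on Hacker News typically carry no points or author, so the filter treats a missing or null `points` value as zero instead of raising an error.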



Actor Information

Developer: fresh_cliff
Pricing: Paid
Total Runs: 27
Active Users: 3


