Hackernews Job Scraper
by kutaui
About Hackernews Job Scraper
Automatically scrapes and extracts structured job listings from Hacker News 'Who is hiring?' monthly posts. Uses Algolia search to find recent posts, fetches job comments from the Hacker News API, and leverages OpenAI to parse unstructured job postings into structured data.
What does this actor do?
Hackernews Job Scraper is an actor that runs on the Apify platform. It finds the latest Hacker News "Who is hiring?" thread, fetches the job comments posted under it, and uses OpenAI to turn each unstructured posting into structured JSON.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
Hacker News Job Scraper

Scrapes and parses job listings from Hacker News "Who is hiring?" posts using Algolia search and OpenAI for structured extraction.

## Description

This Apify actor automates the process of discovering and extracting structured job listings from Hacker News "Who is hiring?" monthly threads. The actor leverages Algolia's search API to find recent hiring posts, then uses OpenAI's language models to transform unstructured job postings into clean, standardized data suitable for job boards, recruitment platforms, or market analysis.

The actor begins by querying Algolia's search index for "Ask HN: Who is hiring?" posts, filtering results to include only posts from the specified time period. This ensures users can focus on recent opportunities while maintaining flexibility to adjust the lookback window. Once the most recent post is identified, the actor fetches all associated job comment threads from the Hacker News API, processing each comment as a potential job posting.

A critical component of the actor is its text cleaning pipeline, which removes HTML entities, formatting artifacts, and extraneous whitespace from raw Hacker News comment text. This preprocessing step significantly improves the quality of data extraction by presenting clean text to OpenAI's models. The extraction process uses structured prompts to identify key job attributes including company names, job titles, locations, employment types, salary information, work arrangements, and application URLs.

The actor is designed with reliability and efficiency in mind. It processes jobs sequentially to respect API rate limits while implementing robust error handling that allows individual job extraction failures to be logged without halting the entire process. Configurable parameters enable users to control the scope of scraping through date ranges and maximum job limits, making it suitable for both one-time data collection and ongoing monitoring of new opportunities.
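
The text cleaning step described above can be illustrated with a minimal Python sketch. This is not the actor's actual code: the function name `clean_comment_text` and the exact cleaning rules (treating `<p>` as a paragraph break, stripping remaining tags, decoding entities, collapsing whitespace) are assumptions made for illustration only.

```python
import html
import re

# Hypothetical helper, not taken from the actor's source.
_TAG_RE = re.compile(r"<[^>]+>")
_SPACES_RE = re.compile(r"[ \t]+")


def clean_comment_text(raw: str) -> str:
    """Turn raw HN comment HTML into plain text for the extraction prompt."""
    # Assumption: comments separate paragraphs with <p>, so keep those breaks.
    text = re.sub(r"<p>", "\n\n", raw, flags=re.IGNORECASE)
    # Drop all remaining tags (links, italics, code wrappers).
    text = _TAG_RE.sub("", text)
    # Decode entities such as &amp; and &#x2F; after the tags are gone,
    # so a decoded "<" in the posting text is not mistaken for a tag.
    text = html.unescape(text)
    # Collapse runs of spaces/tabs and trim spaces around line breaks.
    text = _SPACES_RE.sub(" ", text)
    text = re.sub(r" *\n *", "\n", text)
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Decoding entities only after tag removal is a deliberate choice in this sketch: it keeps literal angle brackets that appear in a posting (for example in a code snippet) from being stripped as markup.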
The output is structured JSON data ready for integration with job aggregation platforms, applicant tracking systems, or custom analytics dashboards, making it ideal for recruiters tracking tech job markets, researchers analyzing hiring trends, or developers building job search applications.

## What it does

1. Searches Algolia API for "Ask HN: Who is hiring?" posts
2. Filters posts from the last N days (default: 30)
3. Fetches the latest post and all job comment replies from Hacker News API
4. Cleans HTML entities and formatting from job postings
5. Extracts structured data using OpenAI (company, title, location, type, salary, description, URLs)
6. Outputs structured JSON to Apify dataset

## Input

- algoliaApiKey (required): Algolia API key from hn.algolia.com network requests
- openAiApiKey (required): OpenAI API key for data extraction
- model (optional): OpenAI model, default gpt-4o-mini
- daysBack (optional): Days to look back, default 30
- maxJobs (optional): Max jobs to process, default 100

## Output

Each job listing includes:

- company, title, location, type, work_location, salary, description, apply_url, company_url - Extracted job data
- jobId - Hacker News comment ID
- postId - Hacker News post ID
- postTitle - Post title
- postDate - Post creation date
- rawText - Cleaned job posting text
- extractedAt - Extraction timestamp
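
As a rough illustration of the input and output contracts listed above, the following Python sketch applies the documented defaults, computes the `daysBack` cutoff, and assembles one output record. The field names and default values come from the lists above; the helper names (`resolve_input`, `cutoff_timestamp`, `build_record`) and the shape of the `post` dictionary are hypothetical and are not taken from the actor's source.

```python
from datetime import datetime, timedelta, timezone

# Defaults and required keys as documented in the Input section.
DEFAULTS = {"model": "gpt-4o-mini", "daysBack": 30, "maxJobs": 100}
REQUIRED = ("algoliaApiKey", "openAiApiKey")


def resolve_input(raw: dict) -> dict:
    """Check required keys and fill in the documented defaults."""
    missing = [key for key in REQUIRED if not raw.get(key)]
    if missing:
        raise ValueError("missing required input: " + ", ".join(missing))
    # Values supplied by the user override the defaults.
    return {**DEFAULTS, **raw}


def cutoff_timestamp(days_back: int, now: datetime) -> int:
    """Unix timestamp of the oldest post date that is still in range."""
    return int((now - timedelta(days=days_back)).timestamp())


def build_record(extracted: dict, *, job_id: int, post: dict,
                 raw_text: str, now: datetime) -> dict:
    """Merge extracted job fields with the metadata listed under Output.

    Assumption: `post` is a dict with "id", "title" and "date" keys.
    """
    return {
        **extracted,
        "jobId": job_id,
        "postId": post["id"],
        "postTitle": post["title"],
        "postDate": post["date"],
        "rawText": raw_text,
        "extractedAt": now.isoformat(),
    }
```

For example, `resolve_input({"algoliaApiKey": "...", "openAiApiKey": "...", "daysBack": 7})` would keep `daysBack` at 7 while filling in `model` and `maxJobs` from the defaults.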
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Actor Information
- Developer: kutaui
- Pricing: Paid
- Total Runs: 2
- Active Users: 1