Jobs Scrapper

by ai-scraper-labs

About Jobs Scrapper

Powerful AmbitionBox Job Scraper that extracts detailed job listings by role and location. Includes responsibilities, skills, qualifications, company insights, and Naukri integration for technical details. Fast, structured, and proxy-supported for large-scale data collection.

What does this actor do?

Jobs Scrapper is a web scraping and automation tool available on the Apify platform. It extracts AmbitionBox job listings by role and location, and it runs entirely in the cloud, so there is nothing to install locally.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation
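The proxy rotation mentioned above is driven by the `proxyConfiguration` input field; per the documentation below, residential proxies give the best results with AmbitionBox. A minimal input illustrating this (the `role` and `location` values are just examples):

```json
{
  "role": "software engineer",
  "location": "bangalore",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```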

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
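The same steps can also be driven programmatically through Apify's REST API (the feature list mentions API access). Below is a minimal sketch using only the Python standard library; the actor ID `ai-scraper-labs~jobs-scrapper` is an assumption inferred from this listing, and a real `APIFY_TOKEN` environment variable is needed for the call itself:

```python
import json
import os
import urllib.request

# Hypothetical actor ID inferred from this listing; verify it in the Apify Console.
ACTOR_ID = "ai-scraper-labs~jobs-scrapper"


def build_run_input(role, location="all", max_pages=2, max_jobs=20,
                    include_naukri_details=True):
    """Assemble the actor input using the defaults from the documentation."""
    return {
        "role": role,
        "location": location,
        "maxPages": max_pages,
        "maxJobs": max_jobs,
        "includeNaukriDetails": include_naukri_details,
        "proxyConfiguration": {"useApifyProxy": True},
    }


def run_actor_sync(run_input, token):
    """Start a run and return the dataset items once it finishes.

    Uses Apify's run-sync-get-dataset-items endpoint, which blocks until
    the run completes and responds with the scraped records as JSON.
    """
    url = (f"https://api.apify.com/v2/acts/{ACTOR_ID}"
           f"/run-sync-get-dataset-items?token={token}")
    req = urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Only performs a network call when a token is actually configured.
if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    items = run_actor_sync(build_run_input("python developer", "bangalore"),
                           os.environ["APIFY_TOKEN"])
    print(f"Scraped {len(items)} jobs")
```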

Documentation

AmbitionBox Job Scraper

> ✨ Converted to Node.js + Playwright + Apify Actor
> This project has been migrated from Python/Scrapy to Node.js/Playwright while preserving 100% of the original scraping logic.
> See README_CONVERSION.md for conversion details.

An Apify Actor that scrapes job listings from AmbitionBox using Playwright browser automation, with optional detailed information extraction from Naukri job pages.

## Features

- AmbitionBox Job Scraping - Extracts comprehensive job listings including:
  - Job title, company, location, salary, and experience requirements
  - Detailed job descriptions and responsibilities
  - Required skills and qualifications
  - Employment type and application links
- Naukri Detail Extraction - Optionally fetches detailed job information from linked Naukri pages:
  - Key responsibilities (structured list)
  - Required skills and technologies
  - Educational qualifications and experience requirements
  - Detailed job descriptions
- Company Information - Extracts comprehensive company details:
  - Company overview and summary
  - Founding year and employee count
  - Company website and headquarters
  - Work policies (WFH, hybrid, etc.)
  - Complete benefits and perks list
- Apify Platform Native - Built as a first-class Apify Actor:
  - Automatic request scheduling with AutoscaledPool
  - Built-in retry logic and error handling
  - Cloud-persisted request queue
  - Integrated dataset storage
- Playwright Integration - Uses browser rendering to bypass anti-bot detection:
  - Handles JavaScript-rendered content
  - Bypasses AmbitionBox's anti-scraping measures
  - Chromium headless browser
  - Ensures reliable data extraction
- Concurrency Support - Parallel processing for faster scraping:
  - Configurable concurrency (1-10 workers)
  - Request queue management
  - Automatic rate limiting

## Input Parameters

Configure the scraper through the Actor input:

- `role` (required, string) - Job role or title to search for
  - Example: "software engineer", "python developer", "data scientist"
- `location` (optional, string) - Location to search jobs in
  - Example: "bangalore", "mumbai", "delhi"
  - Use "all" or "worldwide" for all locations
  - Note: AmbitionBox primarily lists jobs in India
  - If a specific location returns no results, the scraper automatically falls back to all locations
- `maxPages` (optional, integer, default: 2) - Maximum number of listing pages to scrape
  - Range: 1-50
- `maxJobs` (optional, integer, default: 20) - Maximum number of jobs to scrape
  - Set to 0 for unlimited
- `includeNaukriDetails` (optional, boolean, default: true) - Whether to fetch detailed information from Naukri
  - `true`: comprehensive data (slower)
  - `false`: basic data only (3-4x faster)
- `proxyConfiguration` (optional, object) - Proxy settings for the scraper
  - Default: Apify proxy with the RESIDENTIAL group (recommended)
  - Residential proxies give the best success rates with AmbitionBox; datacenter proxies may experience higher timeout rates

## Output Format

The Actor stores data in the default Apify dataset with this structure:

```json
{
  "title": "Senior Software Engineer",
  "company": "Tech Company Pvt Ltd",
  "location": "Bangalore, Karnataka",
  "exp_level": "3-6 years",
  "salary_range": "₹10-18 LPA",
  "url": "https://www.ambitionbox.com/jobs/...",
  "apply_url": "https://www.naukri.com/job-listings-...",
  "about_this_role": "Full job description text...",
  "key_responsibility": [
    "Design and develop scalable backend systems.",
    "Collaborate with cross-functional teams to define features.",
    "Ensure code quality through reviews and testing."
  ],
  "required_skills": ["Python", "Django", "AWS", "SQL", "Docker"],
  "required_qualifications": [
    "Bachelor's degree in Computer Science or related field.",
    "3+ years of experience in backend development."
  ],
  "benefits_perks": [
    "Health Insurance",
    "Work From Home",
    "Flexible Hours",
    "Learning & Development",
    "Paid Time Off"
  ],
  "company_info": {
    "name": "Tech Company Pvt Ltd",
    "Founded in": "2015",
    "Global Employee Count": "500-1000",
    "Website": "https://techcompany.com",
    "company_summary": "Leading technology company specializing in...",
    "work_policy": "Hybrid: 3 days WFO, Remote: 2 days WFH"
  },
  "job_type": "Full-time"
}
```

## How It Works

1. URL Construction - Builds the AmbitionBox search URL from the `role` and `location` parameters
2. Listing Extraction - Scrapes job listing pages using Scrapy's efficient crawling
3. Detail Parsing - For each job, extracts comprehensive information from the detail page
4. Naukri Integration - If enabled, follows "Apply on Naukri" links for additional details
5. Company Data - Fetches company overview and benefits from dedicated company pages
6. Data Storage - Stores all structured data in the Apify dataset

## Technologies Used

- Scrapy - Fast, high-level web scraping framework
- Scrapy-Playwright - Browser automation integration for Scrapy
- Playwright - Modern browser automation library
- Apify SDK for Python - Actor framework and data storage
- BeautifulSoup4 - HTML parsing (for complex extractions)
- Regular Expressions - Advanced text extraction and cleaning

## Why Playwright?
AmbitionBox employs sophisticated anti-bot detection that blocks standard HTTP requests, even when using proxies. Playwright integration provides:

- ✅ Real Browser Rendering - Executes JavaScript and renders pages like a real user
- ✅ Anti-Bot Bypass - Realistic browser fingerprinting and behavior
- ✅ Reliable Extraction - Ensures all dynamic content is loaded
- ✅ Scrapy Integration - Maintains all Scrapy benefits (pipelines, items, middlewares)

## Advantages Over HTTP-Only Scraping

- Reliability - 100% success rate vs. 0% with plain HTTP requests
- JavaScript Support - Handles dynamic content loading
- Anti-Detection - Bypasses sophisticated bot detection
- Future-Proof - Keeps working as sites add more JavaScript

## Local Development

### Prerequisites

- Python 3.9+
- Apify CLI

### Installation

```bash
# Install Apify CLI
brew install apify-cli      # macOS
# or
npm -g install apify-cli    # via Node.js/npm

# Pull the Actor
apify pull

# Install dependencies
pip install -r requirements.txt
```

### Running Locally

```bash
# Run with default input
apify run

# Or create/edit .actor/INPUT.json with your parameters
```

### Example INPUT.json

```json
{
  "role": "python developer",
  "location": "bangalore",
  "maxPages": 3,
  "maxJobs": 50,
  "includeNaukriDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

## Performance Tips

- Start Small - Test with `maxPages: 1` and `maxJobs: 10` first
- Adjust Concurrency - Modify `CONCURRENT_REQUESTS` in the spider settings for faster/slower scraping
- Skip Naukri - Set `includeNaukriDetails: false` for basic info only (much faster)
- Use Proxies - Enable Apify proxy to avoid rate limiting

## Scrapy Settings

The spider uses these custom settings for optimal performance:

```python
custom_settings = {
    'CONCURRENT_REQUESTS': 8,        # Parallel requests
    'DOWNLOAD_DELAY': 2,             # Delay between requests (seconds)
    'ROBOTSTXT_OBEY': True,          # Respect robots.txt
    'USER_AGENT': 'Mozilla/5.0...',  # Custom user agent
}
```

You can modify these in `src/spiders/ambitionbox.py` if needed.
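The location fallback described under Input Parameters (retry against all locations when a specific location yields nothing) can be sketched as follows. This is illustrative only: `fetch_listings` is a hypothetical stand-in for the actor's Playwright-driven listing extraction, and the URL shape is an assumption, not AmbitionBox's real scheme.

```python
def slugify(text):
    """Normalise a role/location string into a URL slug,
    e.g. 'Python Developer' -> 'python-developer'."""
    return "-".join(text.strip().lower().split())


def build_search_url(role, location="all", page=1):
    # Illustrative URL shape only; the real AmbitionBox pattern may differ.
    base = f"https://www.ambitionbox.com/jobs/{slugify(role)}-jobs"
    if location not in ("all", "worldwide"):
        base += f"-in-{slugify(location)}"
    return f"{base}?page={page}"


def scrape_with_fallback(role, location, fetch_listings):
    """Try the specific location first; fall back to all locations if empty.

    `fetch_listings(url)` is a placeholder for the listing-extraction step
    and is expected to return a list of job dicts.
    """
    jobs = fetch_listings(build_search_url(role, location))
    if not jobs and location not in ("all", "worldwide"):
        jobs = fetch_listings(build_search_url(role, "all"))
    return jobs
```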
## Troubleshooting

### No jobs found

- The website structure may have changed
- Check that the search URL is correct
- Try reducing `DOWNLOAD_DELAY` if pages load slowly

### Incomplete data

- Enable `includeNaukriDetails` for comprehensive extraction
- Check whether company pages are accessible
- Review the logs for specific errors

### Rate limiting

- Increase `DOWNLOAD_DELAY` in settings
- Reduce `CONCURRENT_REQUESTS`
- Make sure a proxy configuration is enabled

### Proxy timeouts

- Switch to residential proxies in the proxy configuration (highly recommended); they have much better success rates than datacenter proxies
- Update the input to include: `"apifyProxyGroups": ["RESIDENTIAL"]`
- Note: residential proxies consume more proxy credits but significantly improve reliability

## Architecture

```
src/
├── spiders/
│   ├── __init__.py
│   ├── title.py          # Original title spider
│   └── ambitionbox.py    # AmbitionBox job scraper
├── items.py              # Item definitions
├── pipelines.py          # Data processing pipelines
├── middlewares.py        # Request/response middlewares
├── settings.py           # Scrapy settings
├── main.py               # Actor entry point
└── __main__.py           # Execution wrapper
```

## Resources

- Scrapy Documentation
- Apify Platform Documentation
- Apify SDK for Python
- Web Scraping with Scrapy

## License

Apache 2.0
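Once a run finishes, the dataset items (in the JSON shape shown under Output Format) can be post-processed locally. A small sketch that tallies skill demand across scraped jobs, using the documented `required_skills` field; the sample records below are illustrative, not real scraper output:

```python
from collections import Counter


def top_skills(jobs, n=5):
    """Count how often each entry in `required_skills` appears across jobs."""
    counts = Counter()
    for job in jobs:
        # `required_skills` matches the dataset field documented above.
        counts.update(job.get("required_skills", []))
    return counts.most_common(n)


# Example with items shaped like the documented output:
jobs = [
    {"required_skills": ["Python", "Django", "AWS"]},
    {"required_skills": ["Python", "SQL"]},
    {"required_skills": ["Python", "AWS", "Docker"]},
]
print(top_skills(jobs, 2))  # [('Python', 3), ('AWS', 2)]
```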

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Jobs Scrapper now on Apify. Free tier available with no credit card required.

Start Free Trial

Actor Information

Developer
ai-scraper-labs
Pricing
Paid
Total Runs
47
Active Users
2
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

Learn more about Apify
