Jobs Scrapper

by ai-scraper-labs

About Jobs Scrapper

Powerful AmbitionBox Job Scraper that extracts detailed job listings by role and location. Includes responsibilities, skills, qualifications, company insights, and Naukri integration for technical details. Fast, structured, and proxy-supported for large-scale data collection.

What does this actor do?

Jobs Scrapper is a web scraping and automation tool available on the Apify platform. It extracts AmbitionBox job listings by role and location, and it runs entirely in the cloud, so there is nothing to install locally.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation
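The proxy rotation mentioned above is driven by the `proxyConfiguration` input field; per the documentation below, residential proxies give the best results with AmbitionBox. A minimal input illustrating this (the `role` and `location` values are just examples):

```json
{
  "role": "software engineer",
  "location": "bangalore",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```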

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
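The same steps can also be driven programmatically through Apify's REST API (the feature list mentions API access). Below is a minimal sketch using only the Python standard library; the actor ID `ai-scraper-labs~jobs-scrapper` is an assumption inferred from this listing, and a real `APIFY_TOKEN` environment variable is needed for the call itself:

```python
import json
import os
import urllib.request

# Hypothetical actor ID inferred from this listing; verify it in the Apify Console.
ACTOR_ID = "ai-scraper-labs~jobs-scrapper"


def build_run_input(role, location="all", max_pages=2, max_jobs=20,
                    include_naukri_details=True):
    """Assemble the actor input using the defaults from the documentation."""
    return {
        "role": role,
        "location": location,
        "maxPages": max_pages,
        "maxJobs": max_jobs,
        "includeNaukriDetails": include_naukri_details,
        "proxyConfiguration": {"useApifyProxy": True},
    }


def run_actor_sync(run_input, token):
    """Start a run and return the dataset items once it finishes.

    Uses Apify's run-sync-get-dataset-items endpoint, which blocks until
    the run completes and responds with the scraped records as JSON.
    """
    url = (f"https://api.apify.com/v2/acts/{ACTOR_ID}"
           f"/run-sync-get-dataset-items?token={token}")
    req = urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Only performs a network call when a token is actually configured.
if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    items = run_actor_sync(build_run_input("python developer", "bangalore"),
                           os.environ["APIFY_TOKEN"])
    print(f"Scraped {len(items)} jobs")
```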

Documentation

AmbitionBox Job Scraper

> ✨ Converted to Node.js + Playwright + Apify Actor
> This project has been migrated from Python/Scrapy to Node.js/Playwright while preserving 100% of the original scraping logic.
> See README_CONVERSION.md for conversion details.

An Apify Actor that scrapes job listings from AmbitionBox using Playwright browser automation, with optional detailed information extraction from Naukri job pages.

## Features

- AmbitionBox Job Scraping - Extracts comprehensive job listings including:
  - Job title, company, location, salary, and experience requirements
  - Detailed job descriptions and responsibilities
  - Required skills and qualifications
  - Employment type and application links
- Naukri Detail Extraction - Optionally fetches detailed job information from linked Naukri pages:
  - Key responsibilities (structured list)
  - Required skills and technologies
  - Educational qualifications and experience requirements
  - Detailed job descriptions
- Company Information - Extracts comprehensive company details:
  - Company overview and summary
  - Founding year and employee count
  - Company website and headquarters
  - Work policies (WFH, hybrid, etc.)
  - Complete benefits and perks list
- Apify Platform Native - Built as a first-class Apify Actor:
  - Automatic request scheduling with AutoscaledPool
  - Built-in retry logic and error handling
  - Cloud-persisted request queue
  - Integrated dataset storage
- Playwright Integration - Uses browser rendering to bypass anti-bot detection:
  - Handles JavaScript-rendered content
  - Bypasses AmbitionBox's anti-scraping measures
  - Chromium headless browser
  - Ensures reliable data extraction
- Concurrency Support - Parallel processing for faster scraping:
  - Configurable concurrency (1-10 workers)
  - Request queue management
  - Automatic rate limiting

## Input Parameters

Configure the scraper through the Actor input:

- `role` (required, string) - Job role or title to search for
  - Example: "software engineer", "python developer", "data scientist"
- `location` (optional, string) - Location to search jobs in
  - Example: "bangalore", "mumbai", "delhi"
  - Use "all" or "worldwide" for all locations
  - Note: AmbitionBox primarily lists jobs in India
  - If a specific location returns no results, the scraper automatically falls back to all locations
- `maxPages` (optional, integer, default: 2) - Maximum number of listing pages to scrape
  - Range: 1-50
- `maxJobs` (optional, integer, default: 20) - Maximum number of jobs to scrape
  - Set to 0 for unlimited
- `includeNaukriDetails` (optional, boolean, default: true) - Whether to fetch detailed information from Naukri
  - `true`: comprehensive data (slower)
  - `false`: basic data only (3-4x faster)
- `proxyConfiguration` (optional, object) - Proxy settings for the scraper
  - Default: Apify proxy with the RESIDENTIAL group (recommended)
  - Residential proxies give the best success rates with AmbitionBox; datacenter proxies may experience higher timeout rates

## Output Format

The Actor stores data in the default Apify dataset with this structure:

```json
{
  "title": "Senior Software Engineer",
  "company": "Tech Company Pvt Ltd",
  "location": "Bangalore, Karnataka",
  "exp_level": "3-6 years",
  "salary_range": "₹10-18 LPA",
  "url": "https://www.ambitionbox.com/jobs/...",
  "apply_url": "https://www.naukri.com/job-listings-...",
  "about_this_role": "Full job description text...",
  "key_responsibility": [
    "Design and develop scalable backend systems.",
    "Collaborate with cross-functional teams to define features.",
    "Ensure code quality through reviews and testing."
  ],
  "required_skills": ["Python", "Django", "AWS", "SQL", "Docker"],
  "required_qualifications": [
    "Bachelor's degree in Computer Science or related field.",
    "3+ years of experience in backend development."
  ],
  "benefits_perks": [
    "Health Insurance",
    "Work From Home",
    "Flexible Hours",
    "Learning & Development",
    "Paid Time Off"
  ],
  "company_info": {
    "name": "Tech Company Pvt Ltd",
    "Founded in": "2015",
    "Global Employee Count": "500-1000",
    "Website": "https://techcompany.com",
    "company_summary": "Leading technology company specializing in...",
    "work_policy": "Hybrid: 3 days WFO, Remote: 2 days WFH"
  },
  "job_type": "Full-time"
}
```

## How It Works

1. URL Construction - Builds the AmbitionBox search URL from the `role` and `location` parameters
2. Listing Extraction - Scrapes job listing pages using Scrapy's efficient crawling
3. Detail Parsing - For each job, extracts comprehensive information from the detail page
4. Naukri Integration - If enabled, follows "Apply on Naukri" links for additional details
5. Company Data - Fetches company overview and benefits from dedicated company pages
6. Data Storage - Stores all structured data in the Apify dataset

## Technologies Used

- Scrapy - Fast, high-level web scraping framework
- Scrapy-Playwright - Browser automation integration for Scrapy
- Playwright - Modern browser automation library
- Apify SDK for Python - Actor framework and data storage
- BeautifulSoup4 - HTML parsing (for complex extractions)
- Regular Expressions - Advanced text extraction and cleaning

## Why Playwright?
AmbitionBox employs sophisticated anti-bot detection that blocks standard HTTP requests, even when using proxies. Playwright integration provides:

- ✅ Real Browser Rendering - Executes JavaScript and renders pages like a real user
- ✅ Anti-Bot Bypass - Realistic browser fingerprinting and behavior
- ✅ Reliable Extraction - Ensures all dynamic content is loaded
- ✅ Scrapy Integration - Maintains all Scrapy benefits (pipelines, items, middlewares)

## Advantages Over HTTP-Only Scraping

- Reliability - 100% success rate vs. 0% with plain HTTP requests
- JavaScript Support - Handles dynamic content loading
- Anti-Detection - Bypasses sophisticated bot detection
- Future-Proof - Keeps working as sites add more JavaScript

## Local Development

### Prerequisites

- Python 3.9+
- Apify CLI

### Installation

```bash
# Install Apify CLI
brew install apify-cli      # macOS
# or
npm -g install apify-cli    # via Node.js/npm

# Pull the Actor
apify pull

# Install dependencies
pip install -r requirements.txt
```

### Running Locally

```bash
# Run with default input
apify run

# Or create/edit .actor/INPUT.json with your parameters
```

### Example INPUT.json

```json
{
  "role": "python developer",
  "location": "bangalore",
  "maxPages": 3,
  "maxJobs": 50,
  "includeNaukriDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

## Performance Tips

- Start Small - Test with `maxPages: 1` and `maxJobs: 10` first
- Adjust Concurrency - Modify `CONCURRENT_REQUESTS` in the spider settings for faster/slower scraping
- Skip Naukri - Set `includeNaukriDetails: false` for basic info only (much faster)
- Use Proxies - Enable Apify proxy to avoid rate limiting

## Scrapy Settings

The spider uses these custom settings for optimal performance:

```python
custom_settings = {
    'CONCURRENT_REQUESTS': 8,        # Parallel requests
    'DOWNLOAD_DELAY': 2,             # Delay between requests (seconds)
    'ROBOTSTXT_OBEY': True,          # Respect robots.txt
    'USER_AGENT': 'Mozilla/5.0...',  # Custom user agent
}
```

You can modify these in `src/spiders/ambitionbox.py` if needed.
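The location fallback described under Input Parameters (retry against all locations when a specific location yields nothing) can be sketched as follows. This is illustrative only: `fetch_listings` is a hypothetical stand-in for the actor's Playwright-driven listing extraction, and the URL shape is an assumption, not AmbitionBox's real scheme.

```python
def slugify(text):
    """Normalise a role/location string into a URL slug,
    e.g. 'Python Developer' -> 'python-developer'."""
    return "-".join(text.strip().lower().split())


def build_search_url(role, location="all", page=1):
    # Illustrative URL shape only; the real AmbitionBox pattern may differ.
    base = f"https://www.ambitionbox.com/jobs/{slugify(role)}-jobs"
    if location not in ("all", "worldwide"):
        base += f"-in-{slugify(location)}"
    return f"{base}?page={page}"


def scrape_with_fallback(role, location, fetch_listings):
    """Try the specific location first; fall back to all locations if empty.

    `fetch_listings(url)` is a placeholder for the listing-extraction step
    and is expected to return a list of job dicts.
    """
    jobs = fetch_listings(build_search_url(role, location))
    if not jobs and location not in ("all", "worldwide"):
        jobs = fetch_listings(build_search_url(role, "all"))
    return jobs
```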
## Troubleshooting

### No jobs found

- The website structure may have changed
- Check that the search URL is correct
- Try reducing `DOWNLOAD_DELAY` if pages load slowly

### Incomplete data

- Enable `includeNaukriDetails` for comprehensive extraction
- Check whether company pages are accessible
- Review the logs for specific errors

### Rate limiting

- Increase `DOWNLOAD_DELAY` in settings
- Reduce `CONCURRENT_REQUESTS`
- Make sure a proxy configuration is enabled

### Proxy timeouts

- Switch to residential proxies in the proxy configuration (highly recommended); they have much better success rates than datacenter proxies
- Update the input to include: `"apifyProxyGroups": ["RESIDENTIAL"]`
- Note: residential proxies consume more proxy credits but significantly improve reliability

## Architecture

```
src/
├── spiders/
│   ├── __init__.py
│   ├── title.py          # Original title spider
│   └── ambitionbox.py    # AmbitionBox job scraper
├── items.py              # Item definitions
├── pipelines.py          # Data processing pipelines
├── middlewares.py        # Request/response middlewares
├── settings.py           # Scrapy settings
├── main.py               # Actor entry point
└── __main__.py           # Execution wrapper
```

## Resources

- Scrapy Documentation
- Apify Platform Documentation
- Apify SDK for Python
- Web Scraping with Scrapy

## License

Apache 2.0
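Once a run finishes, the dataset items (in the JSON shape shown under Output Format) can be post-processed locally. A small sketch that tallies skill demand across scraped jobs, using the documented `required_skills` field; the sample records below are illustrative, not real scraper output:

```python
from collections import Counter


def top_skills(jobs, n=5):
    """Count how often each entry in `required_skills` appears across jobs."""
    counts = Counter()
    for job in jobs:
        # `required_skills` matches the dataset field documented above.
        counts.update(job.get("required_skills", []))
    return counts.most_common(n)


# Example with items shaped like the documented output:
jobs = [
    {"required_skills": ["Python", "Django", "AWS"]},
    {"required_skills": ["Python", "SQL"]},
    {"required_skills": ["Python", "AWS", "Docker"]},
]
print(top_skills(jobs, 2))  # [('Python', 3), ('AWS', 2)]
```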

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Jobs Scrapper now on Apify. Free tier available with no credit card required.

Start Free Trial

Actor Information

Developer
ai-scraper-labs
Pricing
Paid
Total Runs
47
Active Users
2
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

Learn more about Apify
