Spawn Workers

Name: Spawn Workers
Author: pocesar

406 runs

10 users

About Spawn Workers

This actor lets you spawn tasks or other actors in parallel on the Apify platform. The workers share a common output dataset and split a RequestQueue-like dataset containing request URLs.

What does this actor do?

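Spawn Workers is an orchestration actor for the Apify platform. It starts several runs of a worker actor or task in parallel, hands each run its own slice (an offset and a limit) of an input dataset of request URLs, and has all of them write results to one shared output dataset.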

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# Spawn workers

This actor lets you spawn tasks or other actors in parallel on the Apify platform. The workers share a common output dataset and split a RequestQueue-like dataset containing request URLs.

## Usage

```js
const Apify = require("apify");

Apify.main(async () => {
    const input = await Apify.getInput();

    const {
        limit, // every worker receives a "batch"
        offset, // that changes depending on how many were spawned
        inputDatasetId,
        outputDatasetId,
        parentRunId,
        isWorker,
        emptyDataset, // means the inputDatasetId is empty, and you should use another source, like the Key Value store
        ...rest // any other configuration you passed through workerInput
    } = input;

    // don't mix requestList with requestQueue
    // when in worker mode
    const requestList = new Apify.RequestList({
        persistRequestsKey: 'START-URLS',
        sourcesFunction: async () => {
            if (!isWorker) {
                return [
                    {
                        "url": "https://start-url..."
                    }
                ]
            }

            const requestDataset = await Apify.openDataset(inputDatasetId);

            const { items } = await requestDataset.getData({
                offset,
                limit,
            });

            return items;
        }
    });

    await requestList.initialize();

    const requestQueue = isWorker ? undefined : await Apify.openRequestQueue();
    const outputDataset = isWorker ? await Apify.openDataset(outputDatasetId) : undefined;

    const crawler = new Apify.CheerioCrawler({
        requestList,
        requestQueue,
        handlePageFunction: async ({ $, request }) => {
            if (isWorker) {
                // scrape details here
                await outputDataset.pushData({ ...data });
            } else {
                // instead of requestQueue.addRequest, you push the URLs to the dataset
                await Apify.pushData({
                    url: $("select stuff").attr("href"),
                    userData: {
                        label: $("select other stuff").data("rest"),
                    },
                });
            }
        },
    });

    await crawler.run();

    if (!isWorker) {
        const { output } = await Apify.call("pocesar/spawn-workers", {
            // if you omit this, the default dataset on the spawn-workers actor will hold all items
            outputDatasetId: "some-named-dataset",
            // use this actor default dataset as input for the workers requests, usually should be this own dataset ID
            inputUrlsDatasetId: Apify.getEnv().defaultDatasetId,
            // the name or ID of your worker actor (the one below)
            workerActorId: Apify.getEnv().actorId,
            // you can use a task instead
            workerTaskId: Apify.getEnv().actorTaskId,
            // Optionally pass input to the actors / tasks
            workerInput: {
                maxConcurrency: 20,
                mode: 1,
                some: "config",
            },
            // Optional worker options
            workerOptions: {
                memoryMbytes: 256,
            },
            // Number of workers
            workerCount: 2,
            // Parent run ID, so you can persist things related to this actor call in a centralized manner
            parentRunId: Apify.getEnv().actorRunId,
        });
    }
});
```

## Motivation

RequestQueue is the best way to process requests across actors, but it doesn't offer a way to limit or get offsets from it; you can only iterate over its contents or add new requests. By using a dataset, you get the same functionality (minus the ability to deduplicate URLs) in a form that can be safely shared and partitioned among many actors at once. Each worker deals with its own subset of URLs, with no overlap.

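
To make the partitioning idea concrete, here is a minimal sketch of how a dataset of URLs could be cut into non-overlapping `offset`/`limit` batches, one per worker. The README does not say how spawn-workers computes its batches, so the even-split scheme below is an assumption, and `partitionOffsets` is a hypothetical helper that is not part of the Apify SDK or of this actor.

```javascript
// Hypothetical helper (an assumption, not the actor's documented algorithm):
// split `totalItems` dataset rows into one contiguous batch per worker.
function partitionOffsets(totalItems, workerCount) {
    if (!Number.isInteger(totalItems) || totalItems < 0) {
        throw new RangeError("totalItems must be a non-negative integer");
    }
    if (!Number.isInteger(workerCount) || workerCount < 1) {
        throw new RangeError("workerCount must be a positive integer");
    }

    // Round up so that every item lands in some batch.
    const batchSize = Math.ceil(totalItems / workerCount);
    const batches = [];

    for (let workerId = 0; workerId < workerCount; workerId++) {
        const offset = workerId * batchSize;
        // The last batches may be shorter, or empty when there are more workers than items.
        const limit = Math.max(0, Math.min(batchSize, totalItems - offset));
        batches.push({ workerId, offset, limit });
    }

    return batches;
}

console.log(partitionOffsets(10, 3));
```

For 10 items and 3 workers this produces batches of 4, 4, and 2 items starting at offsets 0, 4, and 8. Each worker would then pass its own `offset` and `limit` to `getData`, as in the usage example, so no two workers read the same rows.
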
## Limitations

Don't use the following keys for `workerInput` as they will be overwritten:

- offset: number
- limit: number
- inputDatasetId: string
- outputDatasetId: string
- workerId: number
- parentRunId: string
- isWorker: boolean
- emptyDataset: boolean

## License

Apache 2.0
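
Because the reserved keys listed under Limitations are overwritten silently, it can help to check a `workerInput` object before passing it to the actor. The sketch below is one way to do that; `RESERVED_WORKER_KEYS` and `findReservedKeys` are hypothetical names introduced for illustration, and only the key list itself comes from the README.

```javascript
// Keys the README says spawn-workers overwrites in workerInput.
const RESERVED_WORKER_KEYS = [
    "offset",
    "limit",
    "inputDatasetId",
    "outputDatasetId",
    "workerId",
    "parentRunId",
    "isWorker",
    "emptyDataset",
];

// Hypothetical helper: return the keys of `workerInput` that would be overwritten.
function findReservedKeys(workerInput) {
    return Object.keys(workerInput).filter((key) => RESERVED_WORKER_KEYS.includes(key));
}

const workerInput = { maxConcurrency: 20, mode: 1, limit: 5 };
const clashes = findReservedKeys(workerInput);

if (clashes.length > 0) {
    console.warn(`These workerInput keys will be overwritten: ${clashes.join(", ")}`);
}
```

Running a check like this before `Apify.call` surfaces a naming clash early, rather than leaving a worker to run with a value it did not expect.
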

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Actor Information

Developer: pocesar
Pricing: Paid
Total Runs: 406
Active Users: 10
