Zip Download and Extraction Scraper
by alphabeta69
Automate the tedious work. This actor downloads zip files from URLs and extracts the contents directly into a usable dataset, saving you manual steps.
About Zip Download and Extraction Scraper
Ever needed to grab a .zip file from a webpage and just get to the data inside, without the manual download-unzip dance? That's exactly what this actor does. You give it a URL pointing to a zip file, and it handles the rest: fetching the file, extracting the contents, and saving everything into a clean dataset for you. It’s one of those simple automations that saves a surprising amount of time, especially when you're dealing with recurring tasks or pulling data from sources that regularly publish archives. I use it for things like processing daily log bundles, grabbing updated asset packs from a client's server, or prepping data dumps for analysis. It runs reliably in the cloud, so you can set it on a schedule and forget about it. The output is structured and ready to pipe into your next step, whether that's another automation, a database, or a simple CSV for review. It’s a straightforward solution for a common, tedious job.
Documentation
Zip Download and Extraction Scraper
A Python utility for downloading and optionally extracting ZIP files from a list of URLs. It's designed for both local use and deployment as an Apify actor.
Overview
This tool reads a list of URLs from a JSON input, downloads the ZIP files in parallel batches, and saves them to an output folder. It includes retry logic for network errors and can automatically extract the contents of downloaded archives.
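The download loop described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the names `fetch_bytes`, `download_one`, and `download_all` and the `backoff` parameter are hypothetical, `retries` is assumed to mean the total number of attempts, and the output file name is assumed to come from the last path segment of the URL.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse


def fetch_bytes(url, timeout=30.0):
    """Default fetcher (hypothetical helper): return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_one(url, out_dir="output", retries=3, overwrite=False,
                 backoff=1.0, fetch=fetch_bytes):
    """Save one URL into out_dir, retrying on network errors."""
    # Assumption: the file name is the last path segment of the URL.
    name = Path(urlparse(url).path).name or "download.zip"
    target = Path(out_dir) / name
    if target.exists() and not overwrite:
        return target  # already downloaded, skip by default
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            data = fetch(url)
        except OSError as exc:  # URLError, timeouts, connection resets
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * attempt)  # simple linear backoff
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return target
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


def download_all(urls, out_dir="output", batch=4, **kwargs):
    """Download URLs with up to `batch` worker threads.

    Returns a dict mapping each URL to a Path or to the error raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=batch) as pool:
        futures = {url: pool.submit(download_one, url, out_dir, **kwargs)
                   for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except RuntimeError as exc:
                results[url] = exc
    return results
```

Passing the fetcher in as a parameter keeps the retry and skip logic testable without network access; the real script may structure this differently.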
Key Features
- Flexible Input: Reads URLs from a local input.json file or the APIFY_INPUT environment variable (for Apify platform runs).
- Parallel Downloads: Uses configurable threading for faster batch downloads.
- Robust Operation: Implements retries on network failures and skips already-downloaded files by default.
- Optional Extraction: Can automatically extract the contents of downloaded ZIP files.
- Apify Integration: Ready to be deployed and run as an Apify actor.
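As an illustration of the flexible input feature, here is a minimal sketch of how the two input sources could be resolved. The function name `load_input` is hypothetical, and the precedence (APIFY_INPUT first, then the local file) is an assumption rather than documented behavior.

```python
import json
import os
from pathlib import Path


def load_input(path="input.json", env=None):
    """Return the input JSON object from APIFY_INPUT or a local file.

    Assumption: the environment variable wins when it is set and non-empty.
    """
    env = os.environ if env is None else env
    raw = env.get("APIFY_INPUT")
    if raw:
        return json.loads(raw)
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `load_input()` with no arguments would read `input.json` from the current directory unless APIFY_INPUT is set.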
How to Use
Local Execution
1. Prepare Input: Create an input.json file with a urls array.
    {
      "urls": ["https://example.com/file1.zip", "https://example.com/file2.zip"],
      "extract": true,
      "batch": 4,
      "retries": 3
    }
2. Install Dependencies:
    pip install -r requirements.txt
3. Run the Script:
    python src/download.py --input input.json
Common Command-Line Options:
* --output <folder>: Specify output directory (default: output).
* --batch N: Set number of parallel downloads (default: 4).
* --retries N: Set retry attempts per URL (default: 3).
* --overwrite: Overwrite existing files in the output folder.
* --extract: Extract ZIP contents after download.
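For reference, the options above map naturally onto an `argparse` parser. The sketch below mirrors the documented flags and defaults. The `build_parser` name and the `input.json` default for `--input` are assumptions, and the real `src/download.py` may define its options differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Download and optionally extract ZIP files.")
    # Assumption: --input defaults to input.json; the docs only show it passed explicitly.
    parser.add_argument("--input", default="input.json",
                        help="path to the JSON input file")
    parser.add_argument("--output", default="output",
                        help="output directory")
    parser.add_argument("--batch", type=int, default=4,
                        help="number of parallel downloads")
    parser.add_argument("--retries", type=int, default=3,
                        help="retry attempts per URL")
    parser.add_argument("--overwrite", action="store_true",
                        help="overwrite existing files in the output folder")
    parser.add_argument("--extract", action="store_true",
                        help="extract ZIP contents after download")
    return parser
```

For example, `build_parser().parse_args(["--batch", "8", "--extract"])` yields a namespace with `batch=8`, `extract=True`, and the remaining defaults.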
Apify Platform
Deploying the Actor:
1. Install the Apify CLI: npm install -g @apify/cli
2. Log in: apify login
3. Push from the project root: apify push
Running the Actor:
Provide input via the Apify console or API using JSON like:
    {
      "urls": ["https://github.com/psf/requests/archive/refs/heads/main.zip"],
      "extract": true
    }
The actor receives this input via the APIFY_INPUT environment variable.
Docker Execution
You can also build and run the tool locally using Docker:
docker build -t zip-downloader .
docker run --rm -e APIFY_INPUT='{"urls":["https://example.com/file.zip"],"extract":true}' zip-downloader
Input / Output
Input Format:
The tool accepts a JSON object, either from a file or an environment variable. Key properties are:
* urls (required): Array of URLs to download.
* extract (optional, boolean): If true, extracts downloaded ZIPs.
* batch (optional, integer): Number of parallel downloads.
* retries (optional, integer): Retry attempts per URL.
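A small validation step can catch malformed input before any download starts. The following is a sketch under stated assumptions: `normalize_input` is a hypothetical helper, the defaults for `batch` and `retries` are taken from the command-line defaults, `extract` is assumed to default to false, and both integers are assumed to be at least 1.

```python
def normalize_input(data):
    """Validate the input object and fill in assumed defaults."""
    if not isinstance(data, dict):
        raise ValueError("input must be a JSON object")

    urls = data.get("urls")
    if (not isinstance(urls, list) or not urls
            or not all(isinstance(u, str) and u for u in urls)):
        raise ValueError("'urls' must be a non-empty array of strings")

    extract = data.get("extract", False)  # assumed default
    if not isinstance(extract, bool):
        raise ValueError("'extract' must be a boolean")

    result = {"urls": urls, "extract": extract}
    for key, default in (("batch", 4), ("retries", 3)):
        value = data.get(key, default)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"'{key}' must be a positive integer")
        result[key] = value
    return result
```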
Output:
* Downloaded .zip files are saved to the specified output folder (default output/).
* If extraction is enabled, contents are extracted into a folder named after the ZIP file (e.g., output/archive-name/).
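The extraction layout described above (one folder per archive, named after the ZIP file) could be produced with the standard `zipfile` module, roughly as below. The `extract_zip` name is hypothetical, and the path check is an added safety assumption, not something the source documents.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path):
    """Extract a ZIP into a sibling folder named after the archive.

    Example: output/archive-name.zip -> output/archive-name/
    """
    zip_path = Path(zip_path)
    dest = zip_path.parent / zip_path.stem
    dest.mkdir(parents=True, exist_ok=True)
    dest_resolved = dest.resolve()
    with zipfile.ZipFile(zip_path) as zf:
        # Assumed safeguard: refuse members that would land outside dest.
        for member in zf.namelist():
            target = (dest / member).resolve()
            if target != dest_resolved and dest_resolved not in target.parents:
                raise ValueError(f"unsafe path in archive: {member}")
        zf.extractall(dest)
    return dest
```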
Actor Information
- Developer: alphabeta69
- Pricing: Paid
- Total Runs: 2
- Active Users: 2