Zip Download and Extraction Scraper

by alphabeta69

Automate the tedious work. This actor downloads zip files from URLs and extracts the contents directly into a usable dataset, saving you manual steps.

2 runs
2 users
About Zip Download and Extraction Scraper

Ever needed to grab a .zip file from a webpage and just get to the data inside, without the manual download-unzip dance? That's exactly what this actor does. You give it a URL pointing to a zip file, and it handles the rest: fetching the file, extracting the contents, and saving everything into a clean dataset for you. It’s one of those simple automations that saves a surprising amount of time, especially when you're dealing with recurring tasks or pulling data from sources that regularly publish archives. I use it for things like processing daily log bundles, grabbing updated asset packs from a client's server, or prepping data dumps for analysis. It runs reliably in the cloud, so you can set it on a schedule and forget about it. The output is structured and ready to pipe into your next step, whether that's another automation, a database, or a simple CSV for review. It’s a straightforward solution for a common, tedious job.

What does this actor do?

Zip Download and Extraction Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

Zip Download and Extraction Scraper

A Python utility for downloading and optionally extracting ZIP files from a list of URLs. It's designed for both local use and deployment as an Apify actor.

Overview

This tool reads a list of URLs from a JSON input, downloads the ZIP files in parallel batches, and saves them to an output folder. It includes retry logic for network errors and can automatically extract the contents of downloaded archives.
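The download loop described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual source: the function names (fetch_bytes, download_one, download_all), the linear backoff, and the injectable fetch parameter are all assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Default fetcher (hypothetical helper): return the response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_one(url, out_dir, retries=3, overwrite=False,
                 fetch=fetch_bytes, delay=1.0):
    """Download one URL into out_dir, retrying on network errors."""
    name = Path(urlparse(url).path).name or "download.zip"
    target = Path(out_dir) / name
    if target.exists() and not overwrite:
        return target  # skip files that are already present
    attempts = max(1, retries)
    for attempt in range(1, attempts + 1):
        try:
            data = fetch(url)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff (assumed)
        else:
            target.write_bytes(data)
            return target


def download_all(urls, out_dir, batch=4, **kwargs):
    """Download all URLs using up to `batch` worker threads."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=batch) as pool:
        return list(pool.map(lambda u: download_one(u, out_dir, **kwargs), urls))
```

Passing the fetcher as a parameter keeps the retry logic testable without network access. In this sketch a URL that still fails after the last attempt raises its error out of download_all; the real tool may handle failures differently.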

Key Features

  • Flexible Input: Reads URLs from a local input.json file or the APIFY_INPUT environment variable (for Apify platform runs).
  • Parallel Downloads: Uses configurable threading for faster batch downloads.
  • Robust Operation: Implements retries on network failures and skips already-downloaded files by default.
  • Optional Extraction: Can automatically extract the contents of downloaded ZIP files.
  • Apify Integration: Ready to be deployed and run as an Apify actor.
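As an illustration of the Flexible Input feature, here is a small sketch of how the two input sources might be combined. The function name load_input and the precedence (APIFY_INPUT first, then the local file) are assumptions; the README does not say which source wins when both are present.

```python
import json
import os
from pathlib import Path


def load_input(path="input.json", env=None):
    """Return the input object, preferring APIFY_INPUT over the local file.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("APIFY_INPUT")
    if raw:
        data = json.loads(raw)
    else:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("input must be a JSON object")
    return data
```

Accepting env as a parameter makes it easy to exercise both branches without touching the real process environment.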

How to Use

Local Execution

  1. Prepare Input: Create an input.json file with a urls array.
    {
      "urls": ["https://example.com/file1.zip", "https://example.com/file2.zip"],
      "extract": true,
      "batch": 4,
      "retries": 3
    }
  2. Install Dependencies:
    pip install -r requirements.txt
  3. Run the Script:
    python src/download.py --input input.json

Common Command-Line Options:
  • --output <folder>: Specify output directory (default: output).
  • --batch N: Set number of parallel downloads (default: 4).
  • --retries N: Set retry attempts per URL (default: 3).
  • --overwrite: Overwrite existing files in the output folder.
  • --extract: Extract ZIP contents after download.
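For readers who want to see how these options fit together, the following argparse sketch mirrors the flags and defaults listed above. It is not taken from src/download.py; the function name build_parser and the default of input.json for --input are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Download and optionally extract ZIP files.")
    parser.add_argument("--input", default="input.json",
                        help="path to the JSON input file (default is assumed)")
    parser.add_argument("--output", default="output",
                        help="output directory")
    parser.add_argument("--batch", type=int, default=4,
                        help="number of parallel downloads")
    parser.add_argument("--retries", type=int, default=3,
                        help="retry attempts per URL")
    parser.add_argument("--overwrite", action="store_true",
                        help="overwrite existing files in the output folder")
    parser.add_argument("--extract", action="store_true",
                        help="extract ZIP contents after download")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```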

Apify Platform

Deploying the Actor:
1. Install the Apify CLI: npm install -g @apify/cli
2. Log in: apify login
3. Push from the project root: apify push

Running the Actor:
Provide input via the Apify console or API using JSON like:

{
    "urls": ["https://github.com/psf/requests/archive/refs/heads/main.zip"],
    "extract": true
}

The actor receives this input via the APIFY_INPUT environment variable.

Docker Execution

You can also build and run the tool locally using Docker:

docker build -t zip-downloader .
docker run --rm -e APIFY_INPUT='{"urls":["https://example.com/file.zip"],"extract":true}' zip-downloader

Input / Output

Input Format:
The tool accepts a JSON object, either from a file or an environment variable. Key properties are:
  • urls (required): Array of URLs to download.
  • extract (optional, boolean): If true, extracts downloaded ZIPs.
  • batch (optional, integer): Number of parallel downloads.
  • retries (optional, integer): Retry attempts per URL.
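The input properties above could be validated and given defaults along these lines. This is a sketch under assumptions: the Settings class and parse_settings function are hypothetical names, the batch and retries defaults are borrowed from the command-line options, and a default of false for extract is assumed rather than documented.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    urls: list
    extract: bool = False  # assumed default; the README does not state one
    batch: int = 4         # matches the documented --batch default
    retries: int = 3       # matches the documented --retries default


def parse_settings(data):
    """Validate a decoded input object and fill in defaults."""
    urls = data.get("urls")
    if (not isinstance(urls, list) or not urls
            or not all(isinstance(u, str) for u in urls)):
        raise ValueError("'urls' must be a non-empty array of strings")
    settings = Settings(
        urls=urls,
        extract=bool(data.get("extract", False)),
        batch=int(data.get("batch", 4)),
        retries=int(data.get("retries", 3)),
    )
    if settings.batch < 1 or settings.retries < 0:
        raise ValueError("'batch' must be >= 1 and 'retries' must be >= 0")
    return settings
```

Failing early on a missing or malformed urls array gives a clearer error than letting the download step fail later.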

Output:
  • Downloaded .zip files are saved to the specified output folder (default output/).
  • If extraction is enabled, contents are extracted into a folder named after the ZIP file (e.g., output/archive-name/).
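The extraction layout described above (a folder named after the archive) can be reproduced with the standard library. The function name extract_zip is an assumption, and this sketch may differ from what the actor does internally.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, out_dir="output"):
    """Extract zip_path into out_dir/<archive name without .zip>."""
    zip_path = Path(zip_path)
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid ZIP file")
    target = Path(out_dir) / zip_path.stem  # e.g. output/archive-name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, extract_zip("output/archive-name.zip") would place the archive's contents under output/archive-name/.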

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Zip Download and Extraction Scraper now on Apify. Free tier available with no credit card required.

Actor Information

Developer
alphabeta69
Pricing
Paid
Total Runs
2
Active Users
2
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
