Fast Scraper

Name: Fast Scraper
Author: danielherman

About Fast Scraper

Fast Scraper is a blazingly fast web scraper powered by Rust on the backend. It lets you scrape static HTML pages extremely quickly while using less than 128 MB of memory, so you get more out of your credits on Apify.

What does this actor do?

Fast Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
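
For the API access listed under Key Features, a run can also be started over HTTP instead of from the web page. The sketch below only prepares the request and does not send it. It assumes the standard Apify API v2 "run actor" endpoint; the actor ID and token are placeholders, not values taken from this page, and `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, actor_input):
    """Prepare (but do not send) a POST request that starts an actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    actor_id="USERNAME~ACTOR-NAME",  # placeholder: replace with the real actor ID
    token="YOUR_APIFY_TOKEN",        # placeholder: your personal API token
    actor_input={
        "requests": [{"url": "https://www.scrapethissite.com/pages/simple/"}]
    },
)
print(request.get_method(), request.full_url)

# To actually start the run (needs network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any credits.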

Documentation

# What is Fast Scraper?

Fast Scraper is a blazingly fast web scraper powered by Rust on the backend. It allows you to scrape static HTML pages extremely quickly while using only 128 MB of memory. With this scraper, you can maximize the efficiency of your credits on Apify.

# Why Choose Fast Scraper Over Cheerio?

Fast Scraper is blazing fast and will save you money. 🚀🚀🚀

Cheerio is powered by Node.js, meaning all the heavy lifting is done by JavaScript. JavaScript was never meant to be used as a scraper in the first place. It's similar to creating a rollercoaster game in an Excel sheet. 📉 📉 📉

## How much will scraping with Fast Scraper cost you?

I ran a benchmark that scraped the full HTML of 1,000 csfd.cz pages (52 MB in total) with max_concurrency=50 and 128 MB of RAM. The run took 60 s and cost 0.026 USD, so it is very cheap. At that rate, $1 buys roughly 38,500 scraped pages (about 2 GB).

## How much cheaper and faster?

Here is a comparison performed on 1,000 csfd.cz pages. The entire static HTML was scraped and stored in storage. With Cheerio, using 128 MB of RAM, the process timed out after 3,600 seconds because the scraper actor required more RAM. Fast Scraper, on the other hand, needed only 33.2 MB of RAM and 0.88% CPU usage on average. It's extremely light and fast. At this moment, the bottleneck is probably Docker itself.
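
The per-dollar estimate in the benchmark above follows from simple division. The sketch below reproduces it from the quoted figures (1,000 pages, 52 MB, 0.026 USD); the variable names are illustrative, and real costs will vary with the target site and run settings.

```python
# Figures quoted in the benchmark above.
pages = 1000        # pages scraped
data_mb = 52        # total HTML downloaded, in MB
cost_usd = 0.026    # reported cost of the run

# Scale the single run up to one dollar of spend.
pages_per_dollar = pages / cost_usd
mb_per_dollar = data_mb / cost_usd

print(f"{pages_per_dollar:,.0f} pages per USD")      # 38,462 pages per USD
print(f"{mb_per_dollar / 1000:.1f} GB per USD")      # 2.0 GB per USD
```

The exact quotient is about 38,462 pages, which the text rounds to 38,500.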
# Input parameters

## Example of an input

    {
      "requests": [
        {
          "url": "https://www.scrapethissite.com/pages/simple/"
        },
        {
          "id": "forms",
          "url": "https://www.scrapethissite.com/pages/simple/",
          "extract": [
            {
              "field_name": "extracted_html",
              "selector": "#countries > div > div:nth-child(4) > div:nth-child(1)",
              "extract_type": "HTML"
            }
          ]
        },
        {
          "id": "hockey",
          "url": "https://www.scrapethissite.com/pages/forms/",
          "extract": [
            {
              "field_name": "year1",
              "selector": "#hockey > div > table > tbody > tr:nth-child(2) > td.year",
              "extract_type": "Text"
            },
            {
              "field_name": "year2",
              "selector": "#hockey > div > table > tbody > tr:nth-child(3) > td.year",
              "extract_type": "Text"
            },
            {
              "field_name": "class_name",
              "selector": "#hockey > div > table > tbody > tr:nth-child(2) > td.year",
              "extract_type": { "Attribute": "class" }
            }
          ]
        }
      ],
      "user_agent": "ApifyFastScraper/1.0",
      "force_cloud": false,
      "push_data_size": 500,
      "max_concurrency": 10,
      "max_request_retries": 3,
      "max_request_retry_timeout_ms": 10000,
      "request_retry_wait_ms": 5000
    }

## Breaking Down the Configuration

1) Requests: The list of web pages to scrape. Each entry (request) takes the following:

* url: The URL of the web page to scrape.
* id (optional): Unique identifier for the request.
* extract (optional): List of fields to extract from the page. Each field has a:
    * field_name: A name to identify the extracted data.
    * selector: CSS selector to pinpoint the HTML element containing the data.
    * extract_type: Type of extraction (Text, HTML, or an attribute like class).

2) Headers (optional): You can set additional HTTP headers globally or for individual requests. Request-specific headers replace the global headers.

* Example Global Header: { "Accept": "application/json" }
* Example Request-specific Header: { "Accept-Language": "en-US" }

3) User-Agent (optional): Specify the user-agent string your scraper will use globally or for individual requests. A request-specific user agent replaces the global one. This helps mimic different web browsers.

4) Advanced Options (optional):

* force_cloud: Whether to force the scraper to run in a cloud environment.
* push_data_size: Max size of data chunks to push. A smaller value lets you offload the data into storage in smaller chunks.
* max_concurrency: Max number of concurrent requests.
* max_request_retries: Number of retries if a request fails.
* max_request_retry_timeout_ms: Max time to wait before retrying a request.
* request_retry_wait_ms: Waiting time between retries.

# Example of Output

    [
      {
        "id": "hockey",
        "url": "https://www.scrapethissite.com/pages/forms/",
        "data": {
          "year2": "\n 1990\n ",
          "class_name": "year",
          "year1": "\n 1990\n "
        }
      },
      {
        "id": "9a2c62e1-79b0-4081-8db8-7d8cf549d4af",
        "url": "https://www.scrapethissite.com/pages/simple/",
        "data": {
          "full_html": "<!doctype html>\n<html lang=\"en\">the rest of html</html>"
        }
      },
      {
        "id": "forms",
        "url": "https://www.scrapethissite.com/pages/simple/",
        "data": {
          "extracted_html": "\n <h3 class=\"country-name\">\n <i class=\"flag-icon flag-icon-ad\"></i>\n Andorra\n </h3>\n <div class=\"country-info\">\n <strong>Capital:</strong> <span class=\"country-capital\">Andorra la Vella</span><br>\n <strong>Population:</strong> <span class=\"country-population\">84000</span><br>\n <strong>Area (km<sup>2</sup>):</strong> <span class=\"country-area\">468.0</span><br>\n </div>\n "
        }
      }
    ]

In this example, the request that has no extract list returns the whole page under full_html and carries a generated id.

# Your feedback

I am always working on improving the performance of my Actors, so if you have any technical feedback for Fast Scraper or have simply found a bug, please create an issue on the Actor's Issues tab in Apify Console.
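
Going back to the input and output formats documented above: the input can be assembled in code rather than written by hand, and the Text fields in the sample output keep their surrounding whitespace, so a cleanup pass is often useful. The following is a minimal sketch under those assumptions. `make_request` and `clean_text_fields` are hypothetical helper names, not part of the actor, and the field names simply mirror the example input.

```python
import json


def make_request(url, request_id=None, fields=None):
    """Build one entry for the "requests" list of the actor input."""
    request = {"url": url}
    if request_id is not None:
        request["id"] = request_id
    if fields:
        request["extract"] = [
            {"field_name": name, "selector": selector, "extract_type": extract_type}
            for name, selector, extract_type in fields
        ]
    return request


def clean_text_fields(item):
    """Return a copy of an output item with whitespace stripped from string values."""
    data = {
        key: value.strip() if isinstance(value, str) else value
        for key, value in item["data"].items()
    }
    return {**item, "data": data}


YEAR_CELL = "#hockey > div > table > tbody > tr:nth-child(2) > td.year"

actor_input = {
    "requests": [
        # No "extract" list: the sample output returns the full page for this case.
        make_request("https://www.scrapethissite.com/pages/simple/"),
        make_request(
            "https://www.scrapethissite.com/pages/forms/",
            request_id="hockey",
            fields=[
                ("year1", YEAR_CELL, "Text"),
                ("class_name", YEAR_CELL, {"Attribute": "class"}),
            ],
        ),
    ],
    "max_concurrency": 10,
    "max_request_retries": 3,
}
print(json.dumps(actor_input, indent=2))

# An item shaped like the "hockey" entry in the sample output.
sample_item = {
    "id": "hockey",
    "url": "https://www.scrapethissite.com/pages/forms/",
    "data": {"year1": "\n 1990\n ", "class_name": "year"},
}
print(clean_text_fields(sample_item)["data"])  # {'year1': '1990', 'class_name': 'year'}
```

Stripping is applied to every string value, so it would also trim HTML fragments; restrict it to specific keys if the exact markup matters.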

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Actor Information

Developer: danielherman
Pricing: Paid
Total Runs: 558
Active Users: 5

Related Actors

Web Scraper

by apify

Cheerio Scraper

by apify

Website Content Crawler

by apify

Legacy PhantomJS Crawler

by apify
