Free eBook Scraper

Name: Free eBook Scraper
Author: epctex



3,990 runs

261 users


About Free eBook Scraper

Explore and Download Free eBooks - Find and download a wide selection of free eBooks from Project Gutenberg. Search by keywords and language preferences. Discover literary gems in multiple formats.

What does this actor do?

Free eBook Scraper is an Apify actor that extracts ebook data from Project Gutenberg (Gutenberg.org), including titles, authors, languages, and download links in several formats. It runs in the cloud on the Apify platform.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation
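
As a rough illustration of the API access mentioned in the list above, the sketch below builds the URL of the Apify API v2 endpoint that returns a dataset's items and downloads them. The helper names `datasetItemsUrl` and `fetchEbooks` are hypothetical, and the dataset ID and token are placeholders you would take from your own run; treat this as a minimal sketch rather than the actor's official client.

```javascript
// Build the URL of the Apify API v2 endpoint that lists a dataset's items.
// `datasetId` and `token` are placeholders that come from your own account and run.
function datasetItemsUrl(datasetId, { format = 'json', limit, token } = {}) {
    const url = new URL(
        `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`,
    );
    url.searchParams.set('format', format);
    if (limit !== undefined) url.searchParams.set('limit', String(limit));
    if (token) url.searchParams.set('token', token);
    return url.toString();
}

// Download the scraped ebooks (assumes Node 18+ for the global fetch).
async function fetchEbooks(datasetId, options) {
    const response = await fetch(datasetItemsUrl(datasetId, options));
    if (!response.ok) {
        throw new Error(`Apify API responded with status ${response.status}`);
    }
    return response.json();
}

// Only builds the URL; call fetchEbooks() once you have a real dataset ID.
console.log(datasetItemsUrl('YOUR_DATASET_ID', { limit: 10 }));
```

Keeping the URL builder separate from the network call makes it easy to reuse the same URL from another tool or language.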

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
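
The "configure the input parameters" step can be sketched as a small helper that assembles the input JSON from the fields listed in the Documentation section (`search`, `language`, `startUrls`, `maxItems`, `proxyConfig`). The helper name `buildInput` and its validation rules are assumptions made for illustration; the actor's own input checks are not shown in this document.

```javascript
// Assemble an actor input object from the documented fields.
// The checks below are illustrative assumptions, not the actor's own validation.
function buildInput({ search, language, startUrls = [], maxItems, useApifyProxy = true } = {}) {
    const input = { proxyConfig: { useApifyProxy } };

    if (search) input.search = search;
    if (language && language.length > 0) input.language = language;

    if (maxItems !== undefined) {
        if (!Number.isInteger(maxItems) || maxItems <= 0) {
            throw new Error('maxItems must be a positive integer');
        }
        input.maxItems = maxItems;
    }

    if (startUrls.length > 0) {
        for (const url of startUrls) {
            const { hostname, pathname } = new URL(url);
            const isGutenberg = hostname === 'www.gutenberg.org' || hostname === 'gutenberg.org';
            // The documentation asks for "search" or "browse" URLs only.
            const isListing = pathname.includes('/search') || pathname.startsWith('/browse/');
            if (!isGutenberg || !isListing) {
                throw new Error(`Not a Gutenberg search or browse URL: ${url}`);
            }
        }
        // The documented input example wraps each URL in an object.
        input.startUrls = startUrls.map((url) => ({ url }));
    }

    return input;
}

const input = buildInput({
    startUrls: ['https://www.gutenberg.org/browse/recent/last7'],
    maxItems: 50,
});
console.log(JSON.stringify(input, null, 2));
```

The printed JSON can be pasted into the actor's input editor on Apify.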

Documentation

# Actor - Gutenberg.org Scraper

Gutenberg.org Scraper is an Apify actor for extracting ebook data from Gutenberg.org. It lets you search by keyword and pick a language. It is built on top of the Apify SDK, and you can run it both on the Apify platform and locally.

- Gutenberg.org Scraper Input Parameters
- Gutenberg.org Scraper Input Example
- Gutenberg.org Scraper Ebook Output
- Extend output function
- Compute Unit Consumption
- During The Run
- Gutenberg.org Export

## Gutenberg.org Scraper Input Parameters

The input of this scraper should be JSON containing the list of pages on Gutenberg that should be visited. Possible fields are:

| Field | Type | Description |
| ----- | ---- | ----------- |
| search | String | (optional) The keyword that you want to search for on Gutenberg |
| language | Array | (optional) List of languages that Gutenberg provides. Use it to fetch all ebooks in a given language |
| startUrls | Array | (optional) List of Gutenberg URLs. You should provide only "search" or "browse" URLs |
| maxItems | Integer | (optional) Maximum number of items that the output will contain |
| extendOutputFunction | String | Function that takes a jQuery handle (`$`) as an argument and returns data that will be merged with the default output. More information in Extend output function |
| proxyConfig | Object | Proxy configuration |

This solution requires the use of proxy servers: either your own proxy servers or Apify Proxy.

### Gutenberg Scraper Input example

```json
{
  "proxyConfig": { "useApifyProxy": true },
  "startUrls": [
    { "url": "https://www.gutenberg.org/browse/recent/last7" },
    { "url": "https://www.gutenberg.org/browse/titles/h" }
  ]
}
```

## Gutenberg Ebook Output

Each item in the Gutenberg ebook output has the following structure:

```json
{
  "author": "United States. National Park Service",
  "title": "Cumberland Island: Junior Ranger Program Activity Guide for Ages 5-7",
  "language": "English",
  "htmlURL": "https://www.gutenberg.org/files/61452/61452-h/61452-h.htm",
  "epubURL": "https://www.gutenberg.org/ebooks/61452.epub.images?session_id=24e44a13d40847bb8d8b13a9216689880a3221cf",
  "kindleURL": "https://www.gutenberg.org/ebooks/61452.kindle.images?session_id=24e44a13d40847bb8d8b13a9216689880a3221cf",
  "plainTextURL": "https://www.gutenberg.org/files/61452/61452-0.txt"
}
```

### Extend output function

You can use this function to update the default output of this actor. The function receives a jQuery handle `$` as an argument, so you can choose what data from the page you want to scrape. The output of this function is merged with the default output.

The return value of this function has to be an object!

You can return fields to achieve 3 different things:

- Add a new field - Return an object with a field that is not in the default output
- Change a field - Return an existing field with a new value
- Remove a field - Return an existing field with the value `undefined`

### Compute Unit Consumption

The actor is optimized to run as fast as possible and to scrape as many items as it can. It therefore prioritizes all detail-page requests. If the actor is not blocked very often, it scrapes about 250 items in 3 minutes using 0.0235 compute units.

## During the Run

During the run, the actor outputs messages that let you know what is going on. Each message contains a short label specifying which page from the provided list is currently being processed. When items are loaded from a page, you should see a message about this event with the loaded item count and the total item count for that page.

If you provide incorrect input to the actor, it stops immediately with a failure state and outputs an explanation of what is wrong.

## Gutenberg Export

During the run, the actor stores results in a dataset. Each ebook is stored as a separate item in the dataset.

You can process the results in any language (Python, PHP, Node.js/npm). See the FAQ or our API reference to learn more about getting results from this Gutenberg actor.

## Contact

Visit epctex.com to see all the products available to you. If you are looking for a custom integration, please reach out through the chat box on epctex.com. Need support? Contact devops@epctex.com.
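
To make the three cases from the Extend output function section concrete, here is a minimal sketch of an `extendOutputFunction` and of the merge behaviour the documentation describes. The `mergeOutput` helper, the `fake$` stand-in, and the `pageTitle` field are hypothetical and exist only so the example runs without a real page; the actor's actual merge code is not shown in this document.

```javascript
// Example extendOutputFunction: receives the jQuery-like handle `$`
// and returns a plain object describing the changes to the default output.
function extendOutputFunction($) {
    return {
        pageTitle: $('title').text().trim(), // add a new field
        language: 'en',                      // change an existing field
        kindleURL: undefined,                // remove an existing field
    };
}

// Hypothetical helper that mirrors the documented merge rules.
function mergeOutput(defaultOutput, extension) {
    if (extension === null || typeof extension !== 'object' || Array.isArray(extension)) {
        throw new TypeError('extendOutputFunction must return an object');
    }
    const merged = { ...defaultOutput };
    for (const [key, value] of Object.entries(extension)) {
        if (value === undefined) {
            delete merged[key]; // `undefined` removes the field
        } else {
            merged[key] = value; // new or changed field
        }
    }
    return merged;
}

// Minimal stand-in for `$` so the sketch runs without loading a page.
const fake$ = () => ({ text: () => ' Cumberland Island | Project Gutenberg ' });

const merged = mergeOutput(
    {
        title: 'Cumberland Island',
        language: 'English',
        kindleURL: 'https://example.org/book.kindle',
    },
    extendOutputFunction(fake$),
);
console.log(merged);
```

When configuring the actor, only the body of `extendOutputFunction` is relevant; the merge itself happens inside the actor.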

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Free eBook Scraper now on Apify. Free tier available with no credit card required.


Actor Information

Developer: epctex
Pricing: Paid
Total Runs: 3,990
Active Users: 261


Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
