Project Gutenberg Research Scraper

by happyfhantum

Scrape every result from Project Gutenberg, not just the first page. Perfect for academic research, finding complete author bibliographies, or building book datasets.

1,052 runs
8 users

About Project Gutenberg Research Scraper

If you've ever tried to search Project Gutenberg for research, you know the frustration. The site's built-in search shows only a small slice of its 70,000+ ebook library at a time, so you end up clicking through pages by hand and wondering what you're missing.

I built this scraper to solve that exact problem. It works through every page of Gutenberg's search results using multi-page pagination, so you get every book that matches your query, not just the first page the website shows you. It's saved me countless hours.

Whether you're an academic compiling a bibliography, a developer building a dataset, or a reader trying to find every novel by a specific author, this tool pulls the complete list. You can filter by author, title, or subject to drill down into specialized topics. The output is clean and structured, ready for analysis or import into your project, so you can skip the manual grind and focus on the actual work.
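
To make the multi-page pagination idea concrete, here is a minimal sketch in Node.js. It is not the actor's actual source. The search URL, the `start_index` parameter, and the page size of 25 are assumptions based on how the public Gutenberg search page appears to behave, and `fetchPage` is a hypothetical callback that returns the book records found on one page.

```javascript
// Assumed values: taken from the public site, not from this actor's code.
const SEARCH_URL = 'https://www.gutenberg.org/ebooks/search/';
const PAGE_SIZE = 25; // assumed number of results per page

// Build the URL for one page of results (pageNumber starts at 1).
function buildSearchPageUrl(query, pageNumber) {
  const url = new URL(SEARCH_URL);
  url.searchParams.set('query', query);
  url.searchParams.set('start_index', String((pageNumber - 1) * PAGE_SIZE + 1));
  return url.toString();
}

// Walk the pages in order until one comes back empty.
// `fetchPage` is a hypothetical callback: it takes a URL and resolves to an
// array of book records. `maxPages` is a safety cap against endless loops.
async function collectAllResults(query, fetchPage, maxPages = 1000) {
  const books = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const items = await fetchPage(buildSearchPageUrl(query, page));
    if (items.length === 0) break; // no more results
    books.push(...items);
  }
  return books;
}
```

Stopping on the first empty page keeps the loop independent of any total-count field, which the search page may or may not expose.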

What does this actor do?

Project Gutenberg Research Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation
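
As a rough illustration of the API access mentioned above, the sketch below starts a run over the Apify REST API and waits for the dataset items. The actor ID is a placeholder you would copy from the Apify Console, the function names are illustrative, and the global `fetch` assumes Node.js 18 or later.

```javascript
const API_BASE = 'https://api.apify.com/v2';

// Build the endpoint that runs an actor and returns its dataset items
// in a single call. `actorId` is a placeholder such as "username~actor-name".
function buildRunSyncUrl(actorId) {
  return `${API_BASE}/acts/${encodeURIComponent(actorId)}/run-sync-get-dataset-items`;
}

// Start a run with the given input and resolve to the scraped items.
async function runActor(actorId, token, input) {
  const response = await fetch(buildRunSyncUrl(actorId), {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(input),
  });
  if (!response.ok) {
    throw new Error(`Apify API returned HTTP ${response.status}`);
  }
  return response.json();
}
```

A call might look like `runActor('<ActorId>', process.env.APIFY_TOKEN, { url: 'https://example.com' })`, with the input object matching whatever the actor's input schema expects.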

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

Project Gutenberg Research Scraper

A Node.js template for scraping structured data from a single web page. You provide a URL via input, and the actor fetches the HTML, parses it, and stores the results in an Apify dataset. It's built with Axios for HTTP requests and Cheerio for HTML parsing.

Key Features

  • Apify SDK: Provides the foundation for building and running the actor.
  • Input Schema: Validates the input configuration, primarily the target URL.
  • Dataset Storage: Outputs structured data (like page headings) into a queryable Apify dataset.
  • Axios & Cheerio: Uses Axios for reliable page fetching and Cheerio for jQuery-style HTML parsing.
  • Customizable Code: The scraping logic is simple to edit for targeting different page elements.

Input/Output

Input: The actor expects an input object containing the url of the page to scrape.

{
  "url": "https://example.com"
}

Output: Scraped data is saved to the actor's default dataset. The default template extracts page headings, producing items like:

{
  "heading": "Example Domain",
  "level": "h1"
}

You can modify the code to extract any other data from the page.
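
The input and output shapes above can be checked and produced with a couple of small helpers. This is only a sketch: `validateInput` and `toHeadingItem` are hypothetical names, not functions from the template, and on the Apify platform the input schema performs its own validation before the code runs.

```javascript
// Check the actor input and return the target URL as a normalized string.
// Throws when `url` is missing or is not an http(s) address.
function validateInput(input) {
  if (!input || typeof input.url !== 'string') {
    throw new Error('Input must be an object with a string "url" field.');
  }
  let parsed;
  try {
    parsed = new URL(input.url);
  } catch {
    throw new Error(`Not a valid URL: ${input.url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.toString();
}

// Shape one dataset item the way the sample output shows it.
function toHeadingItem(tagName, text) {
  return { heading: text.trim(), level: tagName.toLowerCase() };
}
```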

How to Use

Basic Usage (Apify Console)

  1. Build the Actor using this template in the Apify Console.
  2. Configure the Input by providing the target URL.
  3. Run the Actor. The results will be available in the dataset tab.

How It Works

The actor's flow is straightforward:
1. Fetches the input configuration using Actor.getInput().
2. Downloads the HTML from the provided URL using axios.get(url).
3. Loads the HTML into Cheerio with cheerio.load(response.data) for parsing.
4. Executes the parsing logic (e.g., $("h1, h2, h3").each(...)) to extract data.
5. Saves each extracted item using Actor.pushData().
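
The five steps can be sketched as follows. This is not the template's actual source. The core flow takes its I/O functions as parameters (the names `getInput`, `fetchHtml`, `extract`, and `pushData` are illustrative) so it can be exercised without the Apify platform, and `runOnApify` shows one possible wiring against the Apify SDK, Axios, and Cheerio, assuming those packages are installed.

```javascript
// Core flow with the I/O passed in, so it can run outside the platform.
async function runScrape({ getInput, fetchHtml, extract, pushData }) {
  const { url } = await getInput();   // step 1: read the input
  const html = await fetchHtml(url);  // step 2: download the HTML
  const items = extract(html);        // steps 3-4: load and parse
  for (const item of items) {
    await pushData(item);             // step 5: store each item
  }
  return items.length;
}

// One possible wiring against the real libraries. Not called here; it needs
// the apify, axios and cheerio packages to be installed.
async function runOnApify() {
  const { Actor } = await import('apify');
  const { default: axios } = await import('axios');
  const cheerio = await import('cheerio');

  await Actor.init();
  await runScrape({
    getInput: () => Actor.getInput(),
    fetchHtml: async (url) => (await axios.get(url)).data,
    extract: (html) => {
      const $ = cheerio.load(html);
      const items = [];
      $('h1, h2, h3').each((i, el) => {
        items.push({ heading: $(el).text().trim(), level: el.tagName.toLowerCase() });
      });
      return items;
    },
    pushData: (item) => Actor.pushData(item),
  });
  await Actor.exit();
}
```

Keeping the parsing inside `extract` means that targeting different page elements only requires changing the selector and the fields pushed into `items`.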

Local Development

To develop locally, pull the actor using the Apify CLI:
1. Install the CLI:
   npm install -g apify-cli
2. Pull the actor by its unique name or ID:
   apify pull <ActorId>
You can find the Actor ID or unique name in the Apify Console.

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Project Gutenberg Research Scraper now on Apify. Free tier available with no credit card required.


Actor Information

Developer: happyfhantum
Pricing: Paid
Total Runs: 1,052
Active Users: 8

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

