Fast Pdf Processor

by contemporary_fruit

A simple API for PDF text extraction to JSON and page merging. Automate document processing without the overhead of bulky software.

36 runs
2 users

About Fast Pdf Processor

Need to pull text from a PDF or combine pages without opening a clunky desktop app? This actor is your go-to. It's a straightforward API that handles the PDF tasks developers actually need. Just send it a PDF, and it gives you back clean, structured JSON with all the text neatly organized by page. It's perfect for when you're ingesting documents into a database, running content analysis, or just need to automate data extraction without the hassle.

On the flip side, if you've got a massive report but only need pages 5, 12, and 20, you can tell it exactly which pages to merge into a new, streamlined PDF. I've used it to prepare client-ready documents from larger source files and to extract text for search indexing. It does these core jobs well, without the bloat of a full suite of features you'll never use. It's reliable, fast, and gets out of your way so you can focus on building your application.

What does this actor do?

Fast Pdf Processor is a PDF processing and automation tool available on the Apify platform. It extracts text from PDFs, merges selected pages into a new PDF, and converts HTML or web pages to PDF, all in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

Fast Pdf Processor

An Apify actor for processing PDFs and converting web content to PDF. It provides four core operations through a single API.

Overview

This actor handles common PDF tasks: extracting text, merging specific pages, and converting HTML or web pages to PDF. It's deployed as an Apify actor, making it callable via API, CLI, or through automation platforms like n8n.

Key Features

  • Extract Text: Pulls all text content from a PDF.
  • Merge Pages: Creates a new PDF from a selection of pages (using zero-based page indices).
  • HTML to PDF: Renders HTML strings to PDF using Playwright.
  • URL to PDF: Converts a live webpage to PDF using Playwright.

How to Use

You can run the actor with the Apify Python client, through the REST API, or directly in the Apify Console.

Via Apify API client (Python)

from apify_client import ApifyClient
client = ApifyClient('YOUR_API_TOKEN')
actor = client.actor('YOUR_USERNAME/pdf-processor')

# Example: Extract text
run = actor.call(run_input={
    "action": "extract-text",
    "pdfUrl": "https://example.com/document.pdf"
})

# Get results
dataset = client.dataset(run['defaultDatasetId'])
results = list(dataset.iterate_items())

Via REST API

curl -X POST https://api.apify.com/v2/acts/YOUR_USERNAME~pdf-processor/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "action": "extract-text",
    "pdfUrl": "https://example.com/document.pdf"
  }'
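
The same request can also be assembled with Python's standard library. This is a minimal sketch that reuses the placeholder actor ID and token from the curl example; `build_run_request` is a hypothetical helper name, and nothing is sent until the request is handed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    # The curl example writes the actor ID as username~actor-name in the URL.
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_USERNAME/pdf-processor",
    "YOUR_API_TOKEN",
    {"action": "extract-text", "pdfUrl": "https://example.com/document.pdf"},
)
# To actually start the run: urllib.request.urlopen(request)
```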

In Apify Console

  1. Go to the actor's "Input" tab.
  2. Provide a JSON input (see examples below).
  3. Click "Run". Processed files and text are available in the "Dataset" tab.

Input / Output

Input Schema

The actor requires an action parameter and corresponding data.

Extract Text:

{
  "action": "extract-text",
  "pdfUrl": "https://example.com/document.pdf"
}

Merge Pages (pageNumbers holds zero-based indices, so [0, 2, 4] selects the first, third, and fifth pages):

{
  "action": "merge-pages",
  "pdfUrl": "https://example.com/document.pdf",
  "pageNumbers": [0, 2, 4]
}
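
Because pageNumbers is zero-based, page numbers as a reader sees them have to be shifted down by one before they go into the input. A small sketch of that conversion, where `to_zero_based` is a hypothetical helper and not part of the actor:

```python
def to_zero_based(pages: list[int]) -> list[int]:
    """Convert human-facing page numbers (1, 2, 3, ...) to zero-based indices."""
    if any(p < 1 for p in pages):
        raise ValueError("page numbers start at 1")
    return [p - 1 for p in pages]


# Pages 5, 12, and 20 of a report become indices 4, 11, and 19.
merge_input = {
    "action": "merge-pages",
    "pdfUrl": "https://example.com/document.pdf",
    "pageNumbers": to_zero_based([5, 12, 20]),
}
```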

HTML to PDF:

{
  "action": "html-to-pdf",
  "html": "<html><body><h1>Test</h1></body></html>"
}

URL to PDF:

{
  "action": "url-to-pdf",
  "pdfUrl": "https://example.com"
}
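
The four inputs above can be checked on the client side before a run is started. The sketch below assumes the required fields are exactly the ones shown in the examples; that is inferred from this page, not taken from the actor's published input schema, and `build_input` is a hypothetical helper.

```python
# Required fields per action, inferred from the input examples (an assumption).
REQUIRED_FIELDS = {
    "extract-text": ("pdfUrl",),
    "merge-pages": ("pdfUrl", "pageNumbers"),
    "html-to-pdf": ("html",),
    "url-to-pdf": ("pdfUrl",),
}


def build_input(action: str, **fields) -> dict:
    """Return a run input dict, raising ValueError if the action or fields look wrong."""
    if action not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action: {action!r}")
    missing = [name for name in REQUIRED_FIELDS[action] if name not in fields]
    if missing:
        raise ValueError(f"{action} is missing: {', '.join(missing)}")
    return {"action": action, **fields}


html_input = build_input("html-to-pdf", html="<html><body><h1>Test</h1></body></html>")
```

The returned dict can be passed as run_input to the Python client or sent as the JSON body of the REST request.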

Output

Results are saved to the actor's default dataset. For PDF generation actions (merge-pages, html-to-pdf, url-to-pdf), the output is a PDF file. For extract-text, the output is a text file containing the extracted content.

Deployment

The actor can be deployed from a GitHub repository via the Apify Console or using the Apify CLI.

Using Apify CLI:

npm install -g apify-cli
apify login
apify init
apify push

Minimum Recommended Configuration:

  • Memory: 512 MB (increase for large PDFs or complex webpages).
  • Timeout: 300 seconds.
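
These limits can also be set per run. The sketch below assumes the run endpoint's `memory` (in megabytes) and `timeout` (in seconds) query parameters; check the Apify API reference before relying on them. `run_url` is a hypothetical helper, and the actor ID is the placeholder used elsewhere on this page.

```python
from urllib.parse import urlencode


def run_url(actor_id: str, memory_mb: int = 512, timeout_secs: int = 300) -> str:
    """Build the run-start URL with per-run memory and timeout overrides."""
    query = urlencode({"memory": memory_mb, "timeout": timeout_secs})
    return f"https://api.apify.com/v2/acts/{actor_id}/runs?{query}"


default_url = run_url("YOUR_USERNAME~pdf-processor")
# A larger allocation for big PDFs or heavy webpages:
large_url = run_url("YOUR_USERNAME~pdf-processor", memory_mb=2048, timeout_secs=600)
```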

Actor Information

  • Developer: contemporary_fruit
  • Pricing: Paid
  • Total Runs: 36
  • Active Users: 2
