Fast Pdf Processor
by contemporary_fruit
A simple API for PDF text extraction to JSON and page merging. Automate document processing without the overhead of bulky software.
About Fast Pdf Processor
Need to pull text from a PDF or combine pages without opening a clunky desktop app? This actor is your go-to. It's a straightforward API that handles the PDF tasks developers actually need. Just send it a PDF, and it gives you back clean, structured JSON with all the text neatly organized by page. It's perfect for when you're ingesting documents into a database, running content analysis, or just need to automate data extraction without the hassle. On the flip side, if you've got a massive report but only need pages 5, 12, and 20, you can tell it exactly which pages to merge into a new, streamlined PDF. I've used it to prepare client-ready documents from larger source files and to extract text for search indexing. It does these core jobs well, without the bloat of a full suite of features you'll never use. It's reliable, fast, and gets out of your way so you can focus on building your application.
What does this actor do?
Fast Pdf Processor is a PDF processing and automation tool available on the Apify platform. It extracts text from PDFs, merges selected pages into a new PDF, and converts HTML or web pages to PDF in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
Fast Pdf Processor
An Apify actor for processing PDFs and converting web content to PDF. It provides four core operations through a single API.
Overview
This actor handles common PDF tasks: extracting text, merging specific pages, and converting HTML or web pages to PDF. It's deployed as an Apify actor, making it callable via API, CLI, or through automation platforms like n8n.
Key Features
- Extract Text: Pulls all text content from a PDF.
- Merge Pages: Creates a new PDF from a selection of pages (using zero-based page indices).
- HTML to PDF: Renders HTML strings to PDF using Playwright.
- URL to PDF: Converts a live webpage to PDF using Playwright.
How to Use
You can run the actor with the Apify Python client, through the REST API, or directly in the Apify Console.
Via Apify API (Python)
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')
actor = client.actor('YOUR_USERNAME/pdf-processor')

# Example: extract text (call() starts the run and waits for it to finish)
run = actor.call(run_input={
    "action": "extract-text",
    "pdfUrl": "https://example.com/document.pdf"
})

# Read the results from the run's default dataset
dataset = client.dataset(run['defaultDatasetId'])
results = list(dataset.iterate_items())
Via REST API
curl -X POST https://api.apify.com/v2/acts/YOUR_USERNAME~pdf-processor/runs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "action": "extract-text",
    "pdfUrl": "https://example.com/document.pdf"
  }'
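The same request can be assembled with Python's standard library, which may be useful where installing apify-client is not an option. This is only a sketch: the helper name build_run_request is hypothetical, and the token and actor ID are the same placeholders used in the curl command. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, actor_id: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run.

    build_run_request is a hypothetical helper name, not part of any Apify SDK.
    """
    url = f"{API_BASE}/acts/{actor_id}/runs"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_run_request(
    "YOUR_API_TOKEN",               # placeholder token
    "YOUR_USERNAME~pdf-processor",  # placeholder actor ID, as in the curl example
    {"action": "extract-text", "pdfUrl": "https://example.com/document.pdf"},
)

# To actually start the run, send the request with a real token:
#   with urllib.request.urlopen(request) as response:
#       payload = json.load(response)
```

Keeping construction separate from sending makes the request easy to inspect or log before any run is started.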
In Apify Console
- Go to the actor's "Input" tab.
- Provide a JSON input (see examples below).
- Click "Run". Processed files and text are available in the "Dataset" tab.
Input / Output
Input Schema
The actor requires an action parameter and corresponding data.
Extract Text:
{
  "action": "extract-text",
  "pdfUrl": "https://example.com/document.pdf"
}
Merge Pages (pageNumbers holds zero-based indices, so [0, 2, 4] selects the first, third, and fifth pages):
{
  "action": "merge-pages",
  "pdfUrl": "https://example.com/document.pdf",
  "pageNumbers": [0, 2, 4]
}
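Because the indices are zero-based, page numbers as a reader would see them (1 for the first page) have to be shifted by one before they go into pageNumbers. A minimal sketch of that conversion follows; the helper name to_zero_based is hypothetical, and the example pages 5, 12, and 20 are taken from the description above.

```python
def to_zero_based(pages: list[int]) -> list[int]:
    """Convert human-readable page numbers (1 = first page) to zero-based indices.

    to_zero_based is a hypothetical helper, not part of the actor.
    """
    for page in pages:
        if page < 1:
            raise ValueError(f"Page numbers start at 1, got {page}")
    return [page - 1 for page in pages]


# Select pages 5, 12, and 20 of a report for the merge-pages action.
merge_input = {
    "action": "merge-pages",
    "pdfUrl": "https://example.com/document.pdf",
    "pageNumbers": to_zero_based([5, 12, 20]),
}
print(merge_input["pageNumbers"])  # [4, 11, 19]
```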
HTML to PDF:
{
  "action": "html-to-pdf",
  "html": "<html><body><h1>Test</h1></body></html>"
}
URL to PDF:
{
  "action": "url-to-pdf",
  "pdfUrl": "https://example.com"
}
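Since each action expects different fields, it can help to check an input locally before starting a paid run. The sketch below infers the required fields purely from the four examples above; the actor's real input schema may accept or require more, and the names REQUIRED_FIELDS and build_run_input are hypothetical.

```python
# Required fields per action, inferred from the examples in this section.
# This mapping is an assumption, not the actor's published schema.
REQUIRED_FIELDS = {
    "extract-text": ("pdfUrl",),
    "merge-pages": ("pdfUrl", "pageNumbers"),
    "html-to-pdf": ("html",),
    "url-to-pdf": ("pdfUrl",),
}


def build_run_input(action: str, **fields) -> dict:
    """Return a run input dict, raising ValueError if the action or fields look wrong."""
    if action not in REQUIRED_FIELDS:
        raise ValueError(
            f"Unknown action {action!r}; expected one of {sorted(REQUIRED_FIELDS)}"
        )
    missing = [name for name in REQUIRED_FIELDS[action] if name not in fields]
    if missing:
        raise ValueError(f"Action {action!r} is missing fields: {missing}")
    return {"action": action, **fields}


run_input = build_run_input("html-to-pdf", html="<html><body><h1>Test</h1></body></html>")
```

A dict built this way can be passed as run_input to the Python client or serialized as the body of the REST request.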
Output
Results are saved to the actor's default dataset. For PDF generation actions (merge-pages, html-to-pdf, url-to-pdf), the output is a PDF file. For extract-text, the output is a text file containing the extracted content.
Deployment
The actor can be deployed from a GitHub repository via the Apify Console or using the Apify CLI.
Using Apify CLI:
npm install -g apify-cli
apify login
apify init
apify push
Minimum Recommended Configuration:
- Memory: 512 MB (increase for large PDFs or complex webpages).
- Timeout: 300 seconds.
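These limits can also be set per run rather than only in the actor's default configuration. The sketch below assumes the Python client's call() method accepts memory_mbytes and timeout_secs keyword arguments, as current apify-client releases do; the wrapper name run_with_limits is hypothetical.

```python
def run_with_limits(client, actor_id: str, run_input: dict,
                    memory_mbytes: int = 512, timeout_secs: int = 300):
    """Start a run with explicit memory and timeout limits and wait for it to finish.

    `client` is expected to be an apify_client.ApifyClient instance.
    run_with_limits is a hypothetical wrapper; the defaults mirror the
    minimum recommended configuration listed above.
    """
    return client.actor(actor_id).call(
        run_input=run_input,
        memory_mbytes=memory_mbytes,
        timeout_secs=timeout_secs,
    )


# Usage (needs `pip install apify-client` and a real token):
#   from apify_client import ApifyClient
#   client = ApifyClient("YOUR_API_TOKEN")
#   run = run_with_limits(
#       client,
#       "YOUR_USERNAME/pdf-processor",
#       {"action": "url-to-pdf", "pdfUrl": "https://example.com"},
#       memory_mbytes=1024,  # more headroom for a complex webpage
#   )
```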
Actor Information
- Developer: contemporary_fruit
- Pricing: Paid
- Total Runs: 36
- Active Users: 2