Pdf Power Tools

by agenscrape

36 runs
5 users

About Pdf Power Tools

Split, merge, compress, convert & OCR PDFs via API. Extract text from scanned documents in 14 languages. Compress files for email, convert pages to PNG/JPEG/WebP, split by pages or ranges, merge multiple PDFs. Perfect for document automation & data extraction workflows.

What does this actor do?

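Pdf Power Tools is a PDF processing actor available on the Apify platform. It splits, merges, compresses, and enhances PDF files, converts pages to images, and extracts text from scanned documents with OCR, so document workflows can run in the cloud without local software.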

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Open Pdf Power Tools on Apify.com
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
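
For step 3, a hosted file is passed with `pdfUrl`; a file on your own machine can instead be sent inline through the `pdfBase64` field described in the Documentation. Below is a minimal Python sketch, assuming the field takes the standard base64 text of the whole file (the truncated example value in the docs decodes to an ordinary `%PDF-1.7` header, which supports that reading). The helper names are hypothetical and not part of the actor.

```python
import base64
import json
from pathlib import Path


def build_base64_input(pdf_bytes: bytes, operation: str) -> dict:
    """Wrap raw PDF bytes in a run input that uses the pdfBase64 field."""
    # Standard base64, decoded to ASCII so the value is a plain JSON string.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {"operation": operation, "pdfBase64": encoded}


def input_from_file(path: str, operation: str = "compress") -> dict:
    """Read a local PDF from disk and build the run input for it."""
    return build_base64_input(Path(path).read_bytes(), operation)


if __name__ == "__main__":
    # A bare PDF header stands in for a real file's contents here.
    run_input = build_base64_input(b"%PDF-1.7\n", "compress")
    print(json.dumps(run_input))
    # {"operation": "compress", "pdfBase64": "JVBERi0xLjcK"}
```

To start a run from code, the resulting dict can be passed as `run_input` to the official `apify-client` package, e.g. `ApifyClient(token).actor(actor_id).call(run_input=run_input)`; this page does not give the actor ID, so look it up on Apify.com. Base64 text is about a third larger than the file it encodes (4 characters per 3 bytes), so `pdfUrl` is probably the better choice for large PDFs.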

Documentation

# PDF Power Tools

Facing an issue, unexpected error, edge case, or have a feature suggestion? Post it here and we'll address it within 24 hours.

## What is PDF Power Tools?

PDF Power Tools is a comprehensive PDF processing API that handles all your PDF manipulation needs in the cloud. Whether you need to split large documents, merge multiple PDFs, compress files for email, extract text from scanned documents using OCR, or convert PDF pages to images, this actor does it all.

Perfect for:

- Document automation workflows: Process PDFs at scale without local software
- Data extraction pipelines: Extract text from scanned invoices, receipts, contracts
- Content management systems: Generate thumbnails, compress uploads, split documents
- Archival and digitization: OCR historical documents, enhance scanned pages
- Web applications: Server-side PDF processing via API

## Features

### Split PDF

Break down large PDF documents into smaller, manageable files. Split options include:

- Each page separate: Create individual PDFs for every page
- By page ranges: Split into custom ranges (e.g., pages 1-10, 11-20, 21-30)
- Split in half: Divide document into two equal parts
- Extract specific pages: Pull out only the pages you need
- By file size: Automatically split when file exceeds size limit

### Merge PDF

Combine multiple PDF files into a single document:

- Merge unlimited PDFs in sequence
- Custom merge order
- Interleave pages from multiple documents
- Insert pages from one PDF into another at specific positions

### Compress PDF

Reduce PDF file size for email attachments, web uploads, or storage optimization:

- Low compression: Minimal size reduction, highest quality
- Medium compression: Balanced quality and file size (default)
- High compression: Maximum size reduction
- Screen preset: Optimized for on-screen viewing
- Print preset: Optimized for printing quality

### Convert PDF to Images

Transform PDF pages into high-quality images:

- Output formats: PNG, JPEG, WebP, TIFF
- Customizable DPI (72-600)
- Convert all pages or specific page selection
- Combine all pages into single tall image
- Generate thumbnails

### OCR - Text Extraction from Scanned PDFs

Extract text from scanned documents, images, and non-searchable PDFs using Tesseract OCR:

- 14 supported languages: English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Chinese (Simplified & Traditional), Japanese, Korean, Arabic
- Image preprocessing for improved accuracy
- Confidence scores per page
- Word and line count statistics

### Enhance Scanned PDFs

Improve readability of scanned documents:

- Sharpen blurry text and images
- Reduce noise and artifacts
- Adjust contrast and brightness
- Configurable DPI settings

### Page Manipulation

Fine-grained control over PDF pages:

- Reorder pages within a document
- Remove unwanted pages
- Insert pages at specific positions

### PDF Information

Analyze PDF files before processing:

- Page count and dimensions
- File size breakdown
- Detect if PDF is scanned or native text
- Compression estimate

## Input Options

### Basic Input

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/document.pdf"
}
```

### Using Base64 Input

```json
{
  "operation": "compress",
  "pdfBase64": "JVBERi0xLjcKCjEgMCBvYmoK..."
}
```

## Operation Examples

### Get PDF Information

```json
{
  "operation": "info",
  "pdfUrl": "https://example.com/document.pdf"
}
```

### Split Into Individual Pages

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/large-document.pdf",
  "splitMode": "each_page"
}
```

### Split By Page Ranges

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/document.pdf",
  "splitMode": "ranges",
  "ranges": ["1-10", "11-20", "21-30"]
}
```

### Extract Specific Pages

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/document.pdf",
  "splitMode": "extract",
  "pages": [1, 5, 10, 15]
}
```

### Merge Multiple PDFs

```json
{
  "operation": "merge",
  "pdfUrls": [
    "https://example.com/part1.pdf",
    "https://example.com/part2.pdf",
    "https://example.com/part3.pdf"
  ]
}
```

### Merge With Custom Order

```json
{
  "operation": "merge",
  "pdfUrls": ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
  "order": [2, 0, 1]
}
```

### Compress PDF

```json
{
  "operation": "compress",
  "pdfUrl": "https://example.com/large-file.pdf",
  "compressionPreset": "high"
}
```

### Convert PDF to PNG Images

```json
{
  "operation": "convert",
  "pdfUrl": "https://example.com/document.pdf",
  "outputFormat": "png",
  "dpi": 200,
  "quality": 95
}
```

### Convert Specific Pages to JPEG

```json
{
  "operation": "convert",
  "pdfUrl": "https://example.com/document.pdf",
  "outputFormat": "jpeg",
  "pages": [1, 3, 5],
  "dpi": 150
}
```

### OCR - Extract Text from Scanned PDF

```json
{
  "operation": "ocr",
  "pdfUrl": "https://example.com/scanned-document.pdf",
  "language": "eng",
  "preprocess": true
}
```

### OCR in French

```json
{
  "operation": "ocr",
  "pdfUrl": "https://example.com/french-scan.pdf",
  "language": "fra"
}
```

### Enhance Scanned Document

```json
{
  "operation": "enhance",
  "pdfUrl": "https://example.com/old-scan.pdf",
  "sharpen": true,
  "denoise": true,
  "contrast": 1.3,
  "brightness": 1.1
}
```

### Generate Thumbnail

```json
{
  "operation": "thumbnail",
  "pdfUrl": "https://example.com/document.pdf",
  "thumbnailWidth": 300,
  "outputFormat": "png"
}
```

### Remove Pages

```json
{
  "operation": "merge",
  "pdfUrl": "https://example.com/document.pdf",
  "pagesToRemove": [2, 5, 8]
}
```

### Reorder Pages

```json
{
  "operation": "merge",
  "pdfUrl": "https://example.com/document.pdf",
  "newPageOrder": [4, 3, 2, 1, 5, 6]
}
```

## Output

Results are saved to the run's Key-Value Store for easy download:

| Operation | Output Files |
|-----------|-------------|
| Split | page_001.pdf, page_002.pdf, ... or pages_1-10.pdf, etc. |
| Merge | merged.pdf |
| Compress | compressed.pdf |
| Convert | page_001.png, page_002.png, ... |
| OCR | extracted_text.txt + Dataset with per-page results |
| Enhance | enhanced.pdf |
| Thumbnail | thumbnail.png |

### Sample Output

```json
{
  "operation": "compress",
  "preset": "high",
  "pageCount": 25,
  "originalSize": "4.5 MB",
  "compressedSize": "1.2 MB",
  "compressionRatio": "73.3%",
  "outputKey": "compressed.pdf"
}
```

## Supported Languages for OCR

| Code | Language |
|------|----------|
| eng | English |
| fra | French |
| deu | German |
| spa | Spanish |
| ita | Italian |
| por | Portuguese |
| nld | Dutch |
| pol | Polish |
| rus | Russian |
| chi_sim | Chinese (Simplified) |
| chi_tra | Chinese (Traditional) |
| jpn | Japanese |
| kor | Korean |
| ara | Arabic |

## Compression Presets

| Preset | Image Quality | Best For |
|--------|--------------|----------|
| low | 90% | Archives, legal documents |
| medium | 75% | General use, email |
| high | 50% | Web uploads, storage saving |
| screen | 60% | On-screen viewing |
| print | 85% | Print-quality output |

## Pricing

| Event | Price | Description |
|-------|-------|-------------|
| pdf-loaded | $0.005 | Each PDF loaded from URL or base64 |
| page-enhanced | $0.01 | Each page enhanced (sharpen, denoise) |
| page-processed | $0.002 | Each page processed (split, merge, compress) |
| ocr-page | $0.02 | Each page with OCR text extraction |
| pdf-compressed | $0.01 | PDF compression completed |
| page-converted | $0.005 | Each page converted to image |
| pdf-merged | $0.01 | PDF merge operation completed |
| metadata-extracted | $0.005 | PDF info/metadata extraction |
| text-extracted | $0.005 | Text extraction completed |

## Use Cases

- Invoice Processing: Extract data from scanned invoices using OCR
- Document Splitting: Break down large reports into chapters
- PDF Compression: Reduce file size for email attachments
- Image Generation: Create thumbnails for document previews
- Document Merging: Combine multiple contracts into one file
- Archival: Enhance and OCR historical scanned documents
- Web Publishing: Convert PDF pages to web-friendly images
- Data Extraction: Pull text from non-searchable PDFs
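
The Split By Page Ranges and Extract Specific Pages examples above pass page numbers that have to fit inside the document. A small client-side check against the page count reported by the `info` operation can catch mistakes before a run is started. The helpers below are hypothetical and not part of the actor; they assume 1-based, inclusive ranges (as the examples suggest) and that range outputs follow the `pages_1-10.pdf` naming shown in the Output table.

```python
def parse_range(spec: str, page_count: int) -> list[int]:
    """Expand "start-end" (or a single page "n") into 1-based page numbers."""
    start_text, sep, end_text = spec.partition("-")
    start = int(start_text)
    end = int(end_text) if sep else start
    # Reject reversed or out-of-document ranges before sending them.
    if not 1 <= start <= end <= page_count:
        raise ValueError(f"range {spec!r} is outside pages 1-{page_count}")
    return list(range(start, end + 1))


def plan_split(ranges: list[str], page_count: int) -> dict[str, list[int]]:
    """Map each range to the output name it would plausibly produce (assumed pattern)."""
    return {f"pages_{spec}.pdf": parse_range(spec, page_count) for spec in ranges}


def check_pages(pages: list[int], page_count: int) -> list[int]:
    """Validate a "pages" list such as [1, 5, 10, 15]; returns it unchanged."""
    bad = [p for p in pages if not 1 <= p <= page_count]
    if bad:
        raise ValueError(f"pages {bad} are outside 1-{page_count}")
    return pages


if __name__ == "__main__":
    plan = plan_split(["1-10", "11-20", "21-30"], page_count=30)
    print(list(plan))  # ['pages_1-10.pdf', 'pages_11-20.pdf', 'pages_21-30.pdf']
    print(check_pages([1, 5, 10, 15], page_count=30))  # [1, 5, 10, 15]
```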
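
In the Sample Output above, `compressionRatio` is the share of the original size that was removed: 1 - 1.2 / 4.5 ≈ 0.733, i.e. 73.3%. The sketch below reproduces that figure from the two size strings so it can be checked or logged on the client side. Parsing `"<number> <unit>"` strings and using binary unit sizes are assumptions about the output format; the ratio itself does not depend on the unit base as long as both sizes share a unit.

```python
# Byte multipliers; binary units are an assumption about the output format.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(size: str) -> float:
    """Convert a size string such as "4.5 MB" into bytes."""
    value, unit = size.split()
    return float(value) * UNITS[unit.upper()]


def compression_ratio(original: str, compressed: str) -> str:
    """Share of the original size removed, as a one-decimal percentage."""
    saved = 1 - to_bytes(compressed) / to_bytes(original)
    return f"{saved * 100:.1f}%"


if __name__ == "__main__":
    print(compression_ratio("4.5 MB", "1.2 MB"))  # 73.3%
```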
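
For the convert and thumbnail operations, output size follows from the page size and the `dpi` setting: PDF page dimensions are measured in points (1/72 inch), so a page renders to roughly points × dpi / 72 pixels per side. The sketch below applies that formula. Exact rounding depends on the renderer, and preserving the aspect ratio for `thumbnailWidth` is an assumption; the function names are illustrative only.

```python
POINTS_PER_INCH = 72  # PDF page sizes are given in points; 1 pt = 1/72 inch


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi`."""
    if not 72 <= dpi <= 600:  # DPI range listed for the convert operation
        raise ValueError("dpi must be between 72 and 600")
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def thumbnail_size(width_pt: float, height_pt: float, target_width: int) -> tuple[int, int]:
    """Fixed-width thumbnail size, assuming the aspect ratio is preserved."""
    return target_width, round(height_pt * target_width / width_pt)


if __name__ == "__main__":
    # US Letter: 612 x 792 pt, i.e. 8.5 x 11 inches.
    print(rendered_size(612, 792, 200))   # (1700, 2200)
    print(rendered_size(612, 792, 600))   # (5100, 6600)
    print(thumbnail_size(612, 792, 300))  # (300, 388)
```

Because pixel count grows with the square of the DPI, going from 200 to 600 DPI multiplies it by nine for the same page, which tends to enlarge output images accordingly.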
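
The Pricing table above is pay-per-event, so a rough per-run cost is the sum of price × count over the events a run triggers. For example, OCR on a 25-page scan loaded from a URL comes to 0.005 + 25 × 0.02 = $0.505 if the run fires one `pdf-loaded` event and one `ocr-page` event per page. The table does not say exactly which events each operation fires (for instance, whether `text-extracted` is also charged after OCR), so the event counts in this sketch are assumptions and the result is an estimate, not a quote.

```python
from decimal import Decimal

# Per-event prices in USD, copied from the Pricing table.
PRICES = {
    "pdf-loaded": Decimal("0.005"),
    "page-enhanced": Decimal("0.01"),
    "page-processed": Decimal("0.002"),
    "ocr-page": Decimal("0.02"),
    "pdf-compressed": Decimal("0.01"),
    "page-converted": Decimal("0.005"),
    "pdf-merged": Decimal("0.01"),
    "metadata-extracted": Decimal("0.005"),
    "text-extracted": Decimal("0.005"),
}


def estimate_cost(events: dict[str, int]) -> Decimal:
    """Sum price x count per event; Decimal avoids binary float rounding."""
    return sum((PRICES[name] * count for name, count in events.items()), Decimal("0"))


if __name__ == "__main__":
    # Hypothetical event counts: which events a real run fires is an assumption.
    ocr_25_pages = {"pdf-loaded": 1, "ocr-page": 25}
    compress_25_pages = {"pdf-loaded": 1, "page-processed": 25, "pdf-compressed": 1}
    print(estimate_cost(ocr_25_pages))       # 0.505
    print(estimate_cost(compress_25_pages))  # 0.065
```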

Ready to Get Started?

Try Pdf Power Tools now on Apify. Free tier available with no credit card required.

Actor Information

Developer
agenscrape
Pricing
Paid
Total Runs
36
Active Users
5
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
