Pdf Power Tools
by agenscrape
About Pdf Power Tools
Split, merge, compress, convert & OCR PDFs via API. Extract text from scanned documents in 14 languages. Compress files for email, convert pages to PNG/JPEG/WebP, split by pages or ranges, merge multiple PDFs. Perfect for document automation & data extraction workflows.
What does this actor do?
Pdf Power Tools is a PDF processing actor available on the Apify platform. It splits, merges, compresses, converts, and OCRs PDF files in the cloud, so no local PDF software is required.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
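The steps above use the Apify web console, but the same run can be started over HTTP. The following is a minimal sketch, not the developer's own client code: it builds a POST request for the Apify API v2 "run actor" endpoint using only the Python standard library. The actor ID `agenscrape~pdf-power-tools` and the token value are assumptions for illustration; check the actor's page for the real ID and use your own API token. The `operation` and `pdfUrl` fields come from the documentation below.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an actor run."""
    # Actor IDs in the API use the form "username~actor-name".
    url = f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical actor ID and placeholder token, for illustration only.
request = build_run_request(
    "agenscrape~pdf-power-tools",
    "YOUR_APIFY_TOKEN",
    {"operation": "info", "pdfUrl": "https://example.com/document.pdf"},
)
```

Passing `request` to `urllib.request.urlopen` would start the run. Building the request separately makes it easy to inspect the URL and body before anything is sent.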
Documentation
PDF Power Tools

Facing an issue, unexpected error, edge case, or have a feature suggestion? Post it here and we'll address it within 24 hours.

## What is PDF Power Tools?

PDF Power Tools is a comprehensive PDF processing API that handles all your PDF manipulation needs in the cloud. Whether you need to split large documents, merge multiple PDFs, compress files for email, extract text from scanned documents using OCR, or convert PDF pages to images, this actor does it all.

Perfect for:

- Document automation workflows - Process PDFs at scale without local software
- Data extraction pipelines - Extract text from scanned invoices, receipts, contracts
- Content management systems - Generate thumbnails, compress uploads, split documents
- Archival and digitization - OCR historical documents, enhance scanned pages
- Web applications - Server-side PDF processing via API

## Features

### Split PDF

Break down large PDF documents into smaller, manageable files. Split options include:

- Each page separate - Create individual PDFs for every page
- By page ranges - Split into custom ranges (e.g., pages 1-10, 11-20, 21-30)
- Split in half - Divide document into two equal parts
- Extract specific pages - Pull out only the pages you need
- By file size - Automatically split when file exceeds size limit

### Merge PDF

Combine multiple PDF files into a single document:

- Merge unlimited PDFs in sequence
- Custom merge order
- Interleave pages from multiple documents
- Insert pages from one PDF into another at specific positions

### Compress PDF

Reduce PDF file size for email attachments, web uploads, or storage optimization:

- Low compression - Minimal size reduction, highest quality
- Medium compression - Balanced quality and file size (default)
- High compression - Maximum size reduction
- Screen preset - Optimized for on-screen viewing
- Print preset - Optimized for printing quality

### Convert PDF to Images

Transform PDF pages into high-quality images:

- Output formats: PNG, JPEG, WebP, TIFF
- Customizable DPI (72-600)
- Convert all pages or specific page selection
- Combine all pages into single tall image
- Generate thumbnails

### OCR - Text Extraction from Scanned PDFs

Extract text from scanned documents, images, and non-searchable PDFs using Tesseract OCR:

- 14 supported languages: English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Chinese (Simplified & Traditional), Japanese, Korean, Arabic
- Image preprocessing for improved accuracy
- Confidence scores per page
- Word and line count statistics

### Enhance Scanned PDFs

Improve readability of scanned documents:

- Sharpen blurry text and images
- Reduce noise and artifacts
- Adjust contrast and brightness
- Configurable DPI settings

### Page Manipulation

Fine-grained control over PDF pages:

- Reorder pages within a document
- Remove unwanted pages
- Insert pages at specific positions

### PDF Information

Analyze PDF files before processing:

- Page count and dimensions
- File size breakdown
- Detect if PDF is scanned or native text
- Compression estimate

## Input Options

### Basic Input

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/document.pdf"
}
```

### Using Base64 Input

```json
{
  "operation": "compress",
  "pdfBase64": "JVBERi0xLjcKCjEgMCBvYmoK..."
}
```

## Operation Examples

### Get PDF Information

```json
{
  "operation": "info",
  "pdfUrl": "https://example.com/document.pdf"
}
```

### Split Into Individual Pages

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/large-document.pdf",
  "splitMode": "each_page"
}
```

### Split By Page Ranges

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/document.pdf",
  "splitMode": "ranges",
  "ranges": ["1-10", "11-20", "21-30"]
}
```

### Extract Specific Pages

```json
{
  "operation": "split",
  "pdfUrl": "https://example.com/document.pdf",
  "splitMode": "extract",
  "pages": [1, 5, 10, 15]
}
```

### Merge Multiple PDFs

```json
{
  "operation": "merge",
  "pdfUrls": [
    "https://example.com/part1.pdf",
    "https://example.com/part2.pdf",
    "https://example.com/part3.pdf"
  ]
}
```

### Merge With Custom Order

```json
{
  "operation": "merge",
  "pdfUrls": ["doc1.pdf", "doc2.pdf", "doc3.pdf"],
  "order": [2, 0, 1]
}
```

### Compress PDF

```json
{
  "operation": "compress",
  "pdfUrl": "https://example.com/large-file.pdf",
  "compressionPreset": "high"
}
```

### Convert PDF to PNG Images

```json
{
  "operation": "convert",
  "pdfUrl": "https://example.com/document.pdf",
  "outputFormat": "png",
  "dpi": 200,
  "quality": 95
}
```

### Convert Specific Pages to JPEG

```json
{
  "operation": "convert",
  "pdfUrl": "https://example.com/document.pdf",
  "outputFormat": "jpeg",
  "pages": [1, 3, 5],
  "dpi": 150
}
```

### OCR - Extract Text from Scanned PDF

```json
{
  "operation": "ocr",
  "pdfUrl": "https://example.com/scanned-document.pdf",
  "language": "eng",
  "preprocess": true
}
```

### OCR in French

```json
{
  "operation": "ocr",
  "pdfUrl": "https://example.com/french-scan.pdf",
  "language": "fra"
}
```

### Enhance Scanned Document

```json
{
  "operation": "enhance",
  "pdfUrl": "https://example.com/old-scan.pdf",
  "sharpen": true,
  "denoise": true,
  "contrast": 1.3,
  "brightness": 1.1
}
```

### Generate Thumbnail

```json
{
  "operation": "thumbnail",
  "pdfUrl": "https://example.com/document.pdf",
  "thumbnailWidth": 300,
  "outputFormat": "png"
}
```

### Remove Pages

```json
{
  "operation": "merge",
  "pdfUrl": "https://example.com/document.pdf",
  "pagesToRemove": [2, 5, 8]
}
```

### Reorder Pages

```json
{
  "operation": "merge",
  "pdfUrl": "https://example.com/document.pdf",
  "newPageOrder": [4, 3, 2, 1, 5, 6]
}
```

## Output

Results are saved to the run's Key-Value Store for easy download:

| Operation | Output Files |
|-----------|-------------|
| Split | page_001.pdf, page_002.pdf, ... or pages_1-10.pdf, etc. |
| Merge | merged.pdf |
| Compress | compressed.pdf |
| Convert | page_001.png, page_002.png, ... |
| OCR | extracted_text.txt + Dataset with per-page results |
| Enhance | enhanced.pdf |
| Thumbnail | thumbnail.png |

### Sample Output

```json
{
  "operation": "compress",
  "preset": "high",
  "pageCount": 25,
  "originalSize": "4.5 MB",
  "compressedSize": "1.2 MB",
  "compressionRatio": "73.3%",
  "outputKey": "compressed.pdf"
}
```

## Supported Languages for OCR

| Code | Language |
|------|----------|
| eng | English |
| fra | French |
| deu | German |
| spa | Spanish |
| ita | Italian |
| por | Portuguese |
| nld | Dutch |
| pol | Polish |
| rus | Russian |
| chi_sim | Chinese (Simplified) |
| chi_tra | Chinese (Traditional) |
| jpn | Japanese |
| kor | Korean |
| ara | Arabic |

## Compression Presets

| Preset | Image Quality | Best For |
|--------|--------------|----------|
| low | 90% | Archives, legal documents |
| medium | 75% | General use, email |
| high | 50% | Web uploads, storage saving |
| screen | 60% | On-screen viewing |
| print | 85% | Print-quality output |

## Pricing

| Event | Price | Description |
|-------|-------|-------------|
| pdf-loaded | $0.005 | Each PDF loaded from URL or base64 |
| page-enhanced | $0.01 | Each page enhanced (sharpen, denoise) |
| page-processed | $0.002 | Each page processed (split, merge, compress) |
| ocr-page | $0.02 | Each page with OCR text extraction |
| pdf-compressed | $0.01 | PDF compression completed |
| page-converted | $0.005 | Each page converted to image |
| pdf-merged | $0.01 | PDF merge operation completed |
| metadata-extracted | $0.005 | PDF info/metadata extraction |
| text-extracted | $0.005 | Text extraction completed |

## Use Cases

- Invoice Processing - Extract data from scanned invoices using OCR
- Document Splitting - Break down large reports into chapters
- PDF Compression - Reduce file size for email attachments
- Image Generation - Create thumbnails for document previews
- Document Merging - Combine multiple contracts into one file
- Archival - Enhance and OCR historical scanned documents
- Web Publishing - Convert PDF pages to web-friendly images
- Data Extraction - Pull text from non-searchable PDFs
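
The Base64 input option described under Input Options can be produced from raw file bytes as follows. This is a sketch only: `build_base64_input` is a hypothetical helper, not part of the actor, and the header check is an extra safeguard added for illustration. The `operation` and `pdfBase64` field names are taken from the documentation.

```python
import base64


def build_base64_input(pdf_bytes: bytes, operation: str, **options) -> dict:
    """Return an actor input dict that embeds the PDF as Base64 text."""
    # Every PDF file begins with the "%PDF-" marker; reject anything else early.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("data does not start with the %PDF- header")
    payload = {
        "operation": operation,
        "pdfBase64": base64.b64encode(pdf_bytes).decode("ascii"),
    }
    # Extra options (for example compressionPreset) are passed through unchanged.
    payload.update(options)
    return payload


# The first bytes of a PDF 1.7 file encode to the prefix shown in the documentation.
example = build_base64_input(b"%PDF-1.7\n\n1 0 obj\n", "compress", compressionPreset="high")
```

In practice `pdf_bytes` would come from reading a local file in binary mode.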
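
The `compressionRatio` in the Sample Output is consistent with the fraction of the original size that was saved: (4.5 - 1.2) / 4.5 ≈ 73.3%. The sketch below reproduces that arithmetic. The formula is inferred from the sample numbers, not confirmed by the developer, and `compression_ratio` is a hypothetical helper name.

```python
def compression_ratio(original_mb: float, compressed_mb: float) -> str:
    """Return the percentage of the original size removed by compression."""
    if original_mb <= 0:
        raise ValueError("original size must be positive")
    saved_fraction = (original_mb - compressed_mb) / original_mb
    return f"{saved_fraction * 100:.1f}%"


# Values from the Sample Output: 4.5 MB compressed to 1.2 MB.
ratio = compression_ratio(4.5, 1.2)  # "73.3%"
```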
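
The Pricing table can be turned into a rough cost estimate by multiplying each event's price by the number of times it fires. The sketch below assumes the caller supplies the event counts. Which events a given operation actually triggers is not stated in the documentation, so the example counts are illustrative only.

```python
from decimal import Decimal

# Per-event prices in USD, copied from the Pricing table.
EVENT_PRICES = {
    "pdf-loaded": Decimal("0.005"),
    "page-enhanced": Decimal("0.01"),
    "page-processed": Decimal("0.002"),
    "ocr-page": Decimal("0.02"),
    "pdf-compressed": Decimal("0.01"),
    "page-converted": Decimal("0.005"),
    "pdf-merged": Decimal("0.01"),
    "metadata-extracted": Decimal("0.005"),
    "text-extracted": Decimal("0.005"),
}


def estimate_cost(event_counts: dict) -> Decimal:
    """Sum price * count over the given events; unknown events raise KeyError."""
    total = Decimal("0")
    for event, count in event_counts.items():
        if event not in EVENT_PRICES:
            raise KeyError(f"unknown pricing event: {event}")
        if count < 0:
            raise ValueError(f"negative count for {event}")
        total += EVENT_PRICES[event] * count
    return total


# Illustrative assumption: one PDF loaded and ten pages run through OCR.
cost = estimate_cost({"pdf-loaded": 1, "ocr-page": 10})  # Decimal("0.205")
```

`Decimal` is used instead of `float` so that small prices such as $0.002 add up without binary rounding error.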
Actor Information
- Developer: agenscrape
- Pricing: Paid
- Total Runs: 36
- Active Users: 5