PDF Extractor 2.0
by jupri
💫 Extract PDF Document Contents including Metadata, Images, Pages, Tables, Attachments, etc.
What does this actor do?
PDF Extractor 2.0 is an actor on the Apify platform that extracts the contents of PDF documents, including metadata, pages, images, tables, and attachments. It runs in the cloud, so no local setup is needed.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
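Besides the web console, the same run can be started through the Apify API. The sketch below only prepares the HTTP request and does not send it. It assumes the Apify API v2 `run-sync-get-dataset-items` endpoint and Bearer-token authentication. The actor ID `jupri~pdf-extractor` and the token are placeholders, not values taken from this page; copy the real ones from the actor page and your account settings.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_request(
        actor_id="jupri~pdf-extractor",  # hypothetical ID, check the actor page
        token="YOUR_APIFY_TOKEN",        # placeholder, never commit a real token
        actor_input={
            "url": ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
            "content": "text",
        },
    )
    print(request.get_method(), request.full_url)
    # To execute the run, pass `request` to urllib.request.urlopen()
    # and parse the JSON response body.
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any platform credits.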
Documentation
# Welcome to PDF Extractor
## 🍂 About PDF Format

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.

## 🍂 About This Actor

💫 Extract contents from PDF documents

### Features

- ⭐ Extract PDF pages as Text or Image (SVG, PNG, JPEG).
- ⭐ Extract PDF Metadata.
- ⭐ Extract PDF Table of Contents.
- ⭐ Extract PDF Tables.
- ⭐ Extract Encrypted PDF (password protected).
- ⭐ Extract Embedded images.
- ⭐ Extract Attachments.
- ⭐ Extract multiple URL files.

## 🍂 Tutorial

### Input Parameters

| Name | Type | Description |
|-|-|-|
| url | Array [String] | List of PDF document URLs |
| content | String | Output pages format (text, svg, png, jpg) |
| images | Boolean (true/false) | Extract embedded images |
| attachments | Boolean (true/false) | Extract embedded files |
| tables | Boolean (true/false) | Extract tables |

> Notes:
> All extracted resources other than TEXT will be saved to default Key-Value storage.

### Dataset Output Format

```yaml
[
  # URL-1: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-1: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
  # URL-2: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-2: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
]
```

## 🍂 Output Samples

### PDF Sample #1

URL: https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf

```yaml
{ }
```

### PDF Sample #2

URL: https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf

```yaml
{ }
```

## ✏️ Support

⚡️ Feel free to reach out to the developer for any issues or suggestions for improvement.
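To make the Input Parameters and the Dataset Output Format from the Tutorial more concrete, here is a minimal sketch in Python. The helper names `build_input` and `group_by_document` are hypothetical and not part of the actor. The `False` defaults for the boolean flags are an assumption, since the documentation does not state defaults. The grouping logic relies only on what the output format shows: each document starts with a `metadata` item, followed by its page items.

```python
from typing import Any, Dict, Iterable, List

# Page output formats listed in the Input Parameters table.
ALLOWED_CONTENT = {"text", "svg", "png", "jpg"}


def build_input(
    urls: Iterable[str],
    content: str = "text",
    images: bool = False,       # assumed default, not documented
    attachments: bool = False,  # assumed default, not documented
    tables: bool = False,       # assumed default, not documented
) -> Dict[str, Any]:
    """Build an actor input object using the documented parameter names."""
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT)}")
    return {
        "url": list(urls),
        "content": content,
        "images": images,
        "attachments": attachments,
        "tables": tables,
    }


def group_by_document(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group the flat dataset into one record per PDF.

    A `metadata` item opens a new document; every page item that
    follows is attached to the most recently opened document.
    """
    documents: List[Dict[str, Any]] = []
    for item in items:
        if "metadata" in item:
            documents.append({"metadata": item["metadata"], "pages": []})
        elif documents:
            documents[-1]["pages"].append(item)
        else:
            raise ValueError("page item appeared before any metadata item")
    return documents
```

With the items grouped this way, `documents[0]["pages"]` holds the pages of the first URL in input order, which is usually easier to work with than scanning the flat list.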
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try PDF Extractor 2.0 now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: jupri
- Pricing: Paid
- Total Runs: 1,536
- Active Users: 148
Related Actors
Video Transcript Scraper: Youtube, X, Facebook, Tiktok, etc.
by invideoiq
Linkedin Profile Details Scraper + EMAIL (No Cookies Required)
by apimaestro
Twitter (X.com) Scraper Unlimited: No Limits
by apidojo
Content Checker
by jakubbalada
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.