PDF AI Extractor MCP
by devaditya
Extract text, tables, and structured data from any PDF using AI. Supports bulk processing, clean JSON exports, and MCP mode for AI agents.
About PDF AI Extractor MCP
Tired of manually pulling data from PDFs or wrestling with unreliable parsers? I built this MCP server to solve that. The PDF AI Extractor MCP feeds your PDFs directly to top AI models—OpenAI, Google Gemini, or Claude—to get the actual *understanding* you need. It pulls out clean text, accurately reconstructs tables, and can even generate concise summaries or structured JSON on demand. I use it constantly for processing research papers, financial reports, and contracts where simple text extraction just isn't enough. You can run it on single files or set up bulk processing for entire folders, which saves hours. The structured JSON output is ready to pipe into your databases or applications without cleanup. And because it's built as an MCP server, it integrates directly into AI agent and assistant workflows, making the data instantly available for analysis or decision-making. It turns static documents into a dynamic, queryable data source.
What does this actor do?
PDF AI Extractor MCP is a PDF extraction and analysis tool available on the Apify platform. It downloads PDFs, extracts their text, and sends that text to an AI model for analysis, all in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
A dual-mode Apify Actor that downloads PDFs, extracts clean text, and analyzes the content using major AI models. It can run as a one-time job or as a persistent MCP (Model Context Protocol) server for AI agents.
Overview
This actor solves the common problem of extracting and analyzing unstructured data from PDFs. It reliably downloads a PDF, parses and cleans the text, and then sends that text to your chosen AI provider (OpenAI, Google Gemini, or Anthropic) for analysis based on your prompt. It operates in two distinct modes: a standard "normal" mode for single-run automation, and an "MCP" mode where it functions as a WebSocket server, allowing AI agents like those in ChatGPT or Claude to call its functions as tools.
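The download, parse, clean, and analyze flow described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: `cleanText`, `runPipeline`, and the injected `download`, `parse`, and `analyze` steps are hypothetical names, and the cleanup rules are assumptions, since the README does not document how the text is cleaned.

```javascript
// Hypothetical cleanup rules; the actor's real cleaning logic is not documented.
function cleanText(raw) {
  return raw
    .replace(/\r\n?/g, "\n")       // normalize line endings
    .replace(/-\n(?=[a-z])/g, "")  // rejoin words hyphenated across a line break
    .replace(/[ \t]+/g, " ")       // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n")      // drop spaces around line breaks
    .replace(/\n{3,}/g, "\n\n")    // keep at most one blank line in a row
    .trim();
}

// The steps are injected so the flow can be exercised without network access.
async function runPipeline({ pdfUrl, prompt }, { download, parse, analyze }) {
  const buffer = await download(pdfUrl);        // fetch the PDF bytes
  const text = cleanText(await parse(buffer));  // extract and clean the text
  const aiResult = await analyze(prompt, text); // hand the text to the AI provider
  return { pdfUrl, charactersExtracted: text.length, aiResult };
}

// Usage with stubbed steps (no real download, parser, or AI call).
runPipeline(
  { pdfUrl: "https://example.com/document.pdf", prompt: "Summarize." },
  {
    download: async () => Buffer.from("stub"),
    parse: async () => "Line one  \nLine two",
    analyze: async (prompt, text) => `${prompt} (${text.length} chars)`,
  }
).then((result) => console.log(result));
```

Injecting the steps keeps the orchestration separate from the PDF parser and the provider SDKs, which is one plausible way to support several AI providers behind the same flow.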
Key Features
- Robust PDF Extraction: Downloads PDFs from a public URL and uses pdf-parse to extract and clean the text.
- Multi-AI Support: Connects to OpenAI (GPT-4, GPT-4o, o3-mini), Google Gemini (1.5 Flash/Pro), or Anthropic Claude (Haiku, Sonnet, Opus) APIs.
- Dual Operation Modes:
  - Normal Mode: Runs a single extraction and analysis job, returning structured JSON. Ideal for automated backend workflows.
  - MCP Mode: Launches a WebSocket server that exposes tools (e.g., extractPdf(), analyze()) to AI agents via the MCP standard.
- Structured Output: Returns the AI's analysis in a consistent JSON format.
How to Use
Prerequisites
Set the required environment variables for your chosen AI provider(s) in a .env file:
OPENAI_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
ANTHROPIC_API_KEY=your_key_here
MCP_PORT=8080 # Optional, for MCP mode
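As a rough sketch of how these variables might be checked at startup: the mapping from the `aiProvider` input values to the environment variable names is inferred from the two lists in this README, and `resolveApiKey` and `resolveMcpPort` are hypothetical helpers, not functions the actor is known to expose.

```javascript
// Assumed mapping from aiProvider values to the variables listed above.
const ENV_VAR_BY_PROVIDER = {
  openai: "OPENAI_API_KEY",
  google: "GEMINI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

// Returns the API key for the chosen provider, or throws a descriptive error.
function resolveApiKey(aiProvider, env = process.env) {
  if (!Object.hasOwn(ENV_VAR_BY_PROVIDER, aiProvider)) {
    throw new Error(`Unknown aiProvider: ${aiProvider}`);
  }
  const name = ENV_VAR_BY_PROVIDER[aiProvider];
  if (!env[name]) {
    throw new Error(`Missing environment variable ${name}`);
  }
  return env[name];
}

// MCP_PORT is optional; fall back to 8080 when it is not set.
function resolveMcpPort(env = process.env) {
  const port = Number.parseInt(env.MCP_PORT ?? "8080", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${env.MCP_PORT}`);
  }
  return port;
}
```

Failing fast on a missing key gives a clearer error than letting the provider call fail later in the run.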
Running the Actor
Normal Mode (One-time job):
Provide an input object specifying "mode": "normal", the PDF URL, AI provider, and your analysis prompt.
apify run --purge --input-file=tests/input.normal.json
MCP Mode (Agent Server):
Provide an input object specifying "mode": "mcp". The actor will start a WebSocket server (default: ws://localhost:8080) that AI agents can connect to.
apify run --purge --input-file=tests/input.mcp.json
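For orientation, here is a sketch of what a client might send once the server is up. It assumes the server accepts standard MCP JSON-RPC 2.0 messages (`tools/list`, `tools/call`) as WebSocket text frames, which this README does not confirm. The tool name `extractPdf` comes from the feature list, but the `pdfUrl` argument name and the `sendWhenOpen` helper are hypothetical. A real MCP session also starts with an `initialize` handshake, which is omitted here.

```javascript
let nextId = 1;

// Build a JSON-RPC 2.0 request envelope with a fresh id.
function jsonRpcRequest(method, params) {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// Ask the server which tools it exposes.
function listToolsRequest() {
  return jsonRpcRequest("tools/list", {});
}

// Invoke a tool by name; the argument names are an assumption.
function callToolRequest(name, args) {
  return jsonRpcRequest("tools/call", { name, arguments: args });
}

// Needs a runtime with a global WebSocket (browsers, recent Node.js releases).
function sendWhenOpen(url, request, onMessage) {
  const socket = new WebSocket(url);
  socket.addEventListener("open", () => socket.send(JSON.stringify(request)));
  socket.addEventListener("message", (event) => onMessage(JSON.parse(event.data)));
  return socket;
}
```

With the actor running locally, `sendWhenOpen("ws://localhost:8080", listToolsRequest(), console.log)` would print whatever the server replies, provided the assumptions above hold.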
Input / Output
Input Schema (Normal Mode)
| Field | Description | Required |
|---|---|---|
| mode | Set to "normal" or "mcp". | Yes |
| pdfUrl | Publicly accessible URL of the PDF to process. | Yes (for normal mode) |
| aiProvider | AI service: "openai", "google", or "anthropic". | Yes (for normal mode) |
| prompt | Instructions for the AI model to analyze the extracted text. | Yes (for normal mode) |
Example Input (Normal Mode):
{
  "mode": "normal",
  "pdfUrl": "https://example.com/document.pdf",
  "aiProvider": "openai",
  "prompt": "Summarize the key points from this contract."
}
Example Input (MCP Mode):
{
  "mode": "mcp"
}
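The input rules in the table above can be checked before a run with a small validator. This is a sketch under stated assumptions: `validateInput` and `isHttpUrl` are hypothetical helpers, and requiring an http(s) URL is a guess at what "publicly accessible" means in practice.

```javascript
const PROVIDERS = ["openai", "google", "anthropic"];

// True when the value parses as an http or https URL.
function isHttpUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}

// Returns a list of problems; an empty list means the input looks valid.
function validateInput(input) {
  if (typeof input !== "object" || input === null) {
    return ["input must be an object"];
  }
  if (input.mode !== "normal" && input.mode !== "mcp") {
    return ['mode must be "normal" or "mcp"'];
  }
  if (input.mode === "mcp") return []; // the other fields apply to normal mode only

  const errors = [];
  if (!isHttpUrl(input.pdfUrl)) {
    errors.push("pdfUrl must be an http(s) URL");
  }
  if (!PROVIDERS.includes(input.aiProvider)) {
    errors.push(`aiProvider must be one of: ${PROVIDERS.join(", ")}`);
  }
  if (typeof input.prompt !== "string" || input.prompt.trim() === "") {
    errors.push("prompt must be a non-empty string");
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, makes it easier to fix an input file in one pass.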
Output Format (Normal Mode)
The actor returns a JSON object containing the execution details and the AI's result.
{
  "mode": "normal",
  "aiProvider": "openai",
  "pdfUrl": "https://example.com/document.pdf",
  "charactersExtracted": 12500,
  "aiResult": "The AI-generated summary and analysis appears here..."
}
In MCP mode, results are streamed directly to the connected AI agent via the WebSocket connection.
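When consuming the normal-mode result downstream, it may help to check its shape before storing it. The sketch below assumes the output always has exactly the field types shown in the example above; `parseRunOutput` is a hypothetical helper, not part of the actor.

```javascript
const STRING_FIELDS = ["mode", "aiProvider", "pdfUrl", "aiResult"];

// Parse a normal-mode result and verify the fields shown in the example.
function parseRunOutput(json) {
  const output = JSON.parse(json);
  if (typeof output !== "object" || output === null) {
    throw new TypeError("Expected a JSON object");
  }
  for (const field of STRING_FIELDS) {
    if (typeof output[field] !== "string") {
      throw new TypeError(`Expected string field "${field}"`);
    }
  }
  if (!Number.isInteger(output.charactersExtracted) || output.charactersExtracted < 0) {
    throw new TypeError('Expected non-negative integer field "charactersExtracted"');
  }
  return output;
}
```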
Project Structure
The main logic is in main.js. Key modules within /src include:
* orchestrator/: Manages the execution flow.
* connectors/: Contains adapters for each AI provider (OpenAI, Google, Anthropic).
* mcp/: Houses the WebSocket server and tool handlers for MCP mode.
* utils/: Provides utilities for PDF processing, AI calls, and file management.
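One way the connectors/ adapters could fit together is a registry keyed by the `aiProvider` value. This is only an illustration of the adapter pattern: the `analyze(prompt, text)` interface, `createConnectorRegistry`, and the stub adapters are all hypothetical, and the real modules presumably call the provider SDKs instead.

```javascript
// Look up adapters by provider name; each adapter exposes analyze(prompt, text).
function createConnectorRegistry(adapters) {
  const byProvider = new Map(Object.entries(adapters));
  return {
    providers: () => [...byProvider.keys()],
    get(aiProvider) {
      const adapter = byProvider.get(aiProvider);
      if (!adapter) {
        throw new Error(`No connector registered for "${aiProvider}"`);
      }
      return adapter;
    },
  };
}

// Stub adapter factory standing in for real provider calls.
const stubAdapter = (label) => ({
  analyze: async (prompt, text) => `[${label} stub] ${prompt} (${text.length} chars)`,
});

const registry = createConnectorRegistry({
  openai: stubAdapter("openai"),
  google: stubAdapter("google"),
  anthropic: stubAdapter("anthropic"),
});

// The orchestrator would pick an adapter from the input's aiProvider field.
registry
  .get("anthropic")
  .analyze("Summarize the key points.", "extracted text")
  .then((result) => console.log(result));
```

Keeping a common interface across adapters means the orchestrator does not need provider-specific branches.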
Actor Information
- Developer: devaditya
- Pricing: Paid
- Total Runs: 13
- Active Users: 2