Markitdown Mcp Server
by rector_labs
About Markitdown Mcp Server
Cloud-hosted MCP server converting 29+ document formats (PDF, DOCX, PPTX, images, audio) to AI-ready Markdown. Zero Python setup. Perfect for RAG pipelines and AI agents. Pay-per-use: $0.02/conversion. Built on Microsoft's Markitdown (82k+ ⭐).
What does this actor do?
Markitdown MCP Server is a document conversion tool available on the Apify platform. It converts files such as PDF, DOCX, PPTX, images, and audio to Markdown in the cloud and exposes the conversion through the Model Context Protocol (MCP).
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
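The "configure the input parameters" step can also be done in code. The following is a minimal sketch, assuming the two input fields named in the Actor's documentation (`fileUrl` and `fileBase64`, exactly one of which should be set); the helper name `build_input` and its arguments are hypothetical and not part of the Actor or the Apify client.

```python
import base64
from pathlib import Path
from typing import Optional


def build_input(file_url: Optional[str] = None, file_path: Optional[str] = None) -> dict:
    """Build an Actor input dict (hypothetical helper, not part of any SDK).

    Exactly one of file_url or file_path must be given, mirroring the
    documented rule "provide either fileUrl or fileBase64, not both".
    """
    if (file_url is None) == (file_path is None):
        raise ValueError("provide exactly one of file_url or file_path")
    if file_url is not None:
        return {"fileUrl": file_url}
    # Local file: send its raw bytes as standard base64 text.
    raw = Path(file_path).read_bytes()
    return {"fileBase64": base64.b64encode(raw).decode("ascii")}


print(build_input(file_url="https://example.com/document.pdf"))
```

The resulting dict is what would be passed as the run input; a URL is usually the simpler choice, since base64 inflates the payload by about a third.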
Documentation
# Markitdown MCP Server ⚡

> Convert any document to AI-ready Markdown in seconds
>
> Cloud-hosted Model Context Protocol server powered by Microsoft's Markitdown

---

## 🎯 What is This?

Markitdown MCP Server is a cloud-hosted service that converts documents into clean, AI-optimized Markdown. Built on Microsoft's Markitdown library (82k+ ⭐), it eliminates the need for local Python installations and provides instant, scalable document conversion through the Model Context Protocol.

Perfect for RAG pipelines, knowledge bases, AI agents, and document processing workflows.

---

## ✨ Key Features

### 🚀 Universal Format Support

Convert 29+ file formats to clean Markdown:

- Documents: PDF, DOCX, PPTX, XLSX
- Images: PNG, JPG, GIF (with OCR)
- Web: HTML, XML
- Audio: MP3, WAV (with transcription)
- Archives: ZIP (extract and convert contents)
- And many more!

### ☁️ Zero Setup Required

- No Python installation needed
- No dependency management
- No local configuration
- Just call the API and get Markdown

### 🎭 MCP Native

- First-class Model Context Protocol support
- Works seamlessly with Claude Desktop, Cursor, Aider
- AI agents can discover and use it automatically

### ⚡ Lightning Fast

- Direct Python library integration (no subprocess overhead)
- Typical conversion: < 3 seconds
- Cloud-scale infrastructure via Apify

### 💰 Pay-Per-Use

- $0.01 per Actor start
- $0.02 per document conversion
- No subscriptions, no minimums

---

## 🎬 Quick Start

> 📖 Full Installation Guide - Complete setup for Claude Code CLI, Claude Desktop, Cursor, VS Code, and more

### Claude Code CLI (Recommended)

```bash
# Add the server with one command
claude mcp add --transport http markitdown \
  https://api.apify.com/v2/acts/rector_labs~markitdown-mcp-server/mcp/latest

# Authenticate (opens browser for OAuth)
/mcp
```

Then in Claude Code:

`Convert this PDF to markdown: https://example.com/document.pdf`

### Claude Desktop

macOS: Edit `~/Library/Application Support/Claude/claude_desktop_config.json`

Windows: Edit `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "markitdown": {
      "url": "https://api.apify.com/v2/acts/rector_labs~markitdown-mcp-server/mcp/latest",
      "transport": {
        "type": "http",
        "headers": {
          "Authorization": "Bearer YOUR_APIFY_TOKEN"
        }
      }
    }
  }
}
```

Restart Claude Desktop and start converting!

### Cursor IDE

1. Open Settings → MCP Servers
2. Click Add new MCP server
3. Paste configuration (see INSTALLATION.md)
4. Enable and look for green dot ✅

### Get Your Apify Token

1. Sign up at apify.com (free tier available)
2. Go to Settings → Integrations
3. Copy your API Token

📖 View detailed installation guides for all clients →

---

### For Developers (API)

#### Direct HTTP Request

```bash
curl -X POST https://api.apify.com/v2/acts/rector_labs~markitdown-mcp-server/runs \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "fileUrl": "https://example.com/document.pdf" }'
```

#### Python Example

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('rector_labs/markitdown-mcp-server').call(
    run_input={
        'fileUrl': 'https://example.com/document.pdf'
    }
)

# Get markdown output
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item['markdown'])
```

#### JavaScript/TypeScript Example

```typescript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('rector_labs/markitdown-mcp-server').call({
  fileUrl: 'https://example.com/document.pdf'
});

// Get markdown output
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].markdown);
```

---

## 📚 Supported Formats

### Documents & Spreadsheets

| Format | Extension | Notes |
|--------|-----------|-------|
| PDF | .pdf | Text extraction, OCR support |
| Word | .docx, .doc | Preserves formatting |
| PowerPoint | .pptx, .ppt | Slide text extraction |
| Excel | .xlsx, .xls | Table to Markdown |
| CSV | .csv | Table formatting |
| TSV | .tsv | Table formatting |

### Images

| Format | Extension | Notes |
|--------|-----------|-------|
| PNG | .png | OCR text extraction |
| JPEG | .jpg, .jpeg | OCR text extraction |
| GIF | .gif | OCR text extraction |
| BMP | .bmp | OCR text extraction |

### Web & Markup

| Format | Extension | Notes |
|--------|-----------|-------|
| HTML | .html, .htm | Clean conversion |
| XML | .xml | Structured data |
| Markdown | .md | Pass-through |

### Audio & Video

| Format | Extension | Notes |
|--------|-----------|-------|
| MP3 | .mp3 | Speech-to-text transcription |
| WAV | .wav | Speech-to-text transcription |
| YouTube | URLs | Transcript extraction |

### Archives

| Format | Extension | Notes |
|--------|-----------|-------|
| ZIP | .zip | Extract and convert contents |

---

## 💡 Use Cases

### 🤖 RAG Pipelines

PDF Documents → Markitdown → Clean Markdown → Vector DB → LLM

Perfect for preparing documents for semantic search and retrieval.

### 📖 Knowledge Base Migration

Convert legacy documentation (PDFs, Word docs) to modern Markdown format for wikis, documentation sites, or content management systems.

### 🎓 Research & Academia

Extract text from research papers, presentations, and datasets for analysis and processing.

### 📊 Data Extraction

Convert invoices, reports, and spreadsheets into structured Markdown for further processing.

### 🔄 Batch Processing

Process hundreds of documents in parallel using Apify's infrastructure.

---

## 🔌 Integrations

### MCP Clients

Supported clients:

- ✅ Claude Code CLI - Native HTTP transport with OAuth
- ✅ Claude Desktop - JSON configuration
- ✅ Cursor IDE - UI-based installation
- ✅ VS Code - Via MCP extensions
- ✅ Other MCP clients - Windsurf, Zed, etc.

📖 View detailed setup guides →

### Workflow Automation

#### n8n Workflow

1. Add Apify node
2. Select Markitdown MCP Server actor
3. Configure file URL input
4. Connect to downstream nodes

#### Make.com (Integromat)

1. Add Apify module
2. Select actor: rector_labs/markitdown-mcp-server
3. Map file URL from trigger
4. Use output in next steps

#### Zapier

1. Choose Apify app
2. Action: Run Actor
3. Actor: markitdown-mcp-server
4. Map data from previous steps

---

## ⚙️ Configuration

### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `fileUrl` | string | ✅ (or base64) | URL of the document to convert |
| `fileBase64` | string | ✅ (or URL) | Base64-encoded file content |

Note: Provide either `fileUrl` or `fileBase64`, not both.

### Example Inputs

URL-based:

```json
{ "fileUrl": "https://example.com/document.pdf" }
```

Base64-based:

```json
{ "fileBase64": "JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC..." }
```

---

## 📊 Output Format

The actor outputs clean Markdown text with metadata:

```json
{
  "event": "conversion_success",
  "file_size": 153600,
  "markdown_length": 5234,
  "file_type": ".pdf"
}
```

The Markdown content is returned as the tool response.

---

## 💲 Pricing

### Pay-Per-Event Model

| Event | Price | Description |
|-------|-------|-------------|
| Actor Start | $0.01 | One-time fee per Actor run |
| Document Conversion | $0.02 | Per successful conversion |

### Example Costs

- Single document: $0.03 total ($0.01 start + $0.02 conversion)
- 100 documents in one run: $2.01 ($0.01 start + $2.00 conversions)
- 1,000 documents in one run: $20.01 ($0.01 start + $20.00 conversions)

No subscriptions. No minimums. Pay only for what you use.
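The pay-per-event arithmetic can be checked with a few lines of Python. This is a rough sketch that uses only the two prices listed in the pricing table; the function name `estimate_cost` and the `runs` parameter are illustrative assumptions, and actual billing is whatever Apify reports for your account.

```python
from decimal import Decimal

ACTOR_START_FEE = Decimal("0.01")  # per Actor run, from the pricing table
CONVERSION_FEE = Decimal("0.02")   # per successful conversion, from the pricing table


def estimate_cost(conversions: int, runs: int = 1) -> Decimal:
    """Estimate the charge for `conversions` successful conversions spread over `runs` Actor runs."""
    if conversions < 0 or runs < 0:
        raise ValueError("conversions and runs must be non-negative")
    return runs * ACTOR_START_FEE + conversions * CONVERSION_FEE


for n in (1, 100, 1000):
    print(f"{n} documents in one run: ${estimate_cost(n)}")

# If every document is converted in its own run, the start fee applies each time.
print(f"100 documents, one run each: ${estimate_cost(100, runs=100)}")
```

With a single Actor start, 100 conversions come to $0.01 + $2.00 = $2.01; starting a separate run per document raises that to $3.00, so the number of starts matters for large batches.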
---

## 🚀 Performance

| Metric | Value |
|--------|-------|
| Average conversion time | < 3 seconds |
| Small files (< 1MB) | < 2 seconds |
| Large files (10MB+) | < 10 seconds |
| Concurrent processing | Unlimited (cloud-scaled) |
| Uptime | 99.95% (Apify SLA) |

---

## 🛠️ Advanced Features

### Error Handling

The actor gracefully handles:

- Invalid file URLs (404, network errors)
- Unsupported file formats (clear error messages)
- Corrupted files (validation before processing)
- Large files (automatic timeout handling)

### Logging & Debugging

All conversions are logged with:

- File type and size
- Conversion duration
- Success/failure status
- Error details (if any)

### Custom Options

Coming soon:

- Azure Document Intelligence integration
- OpenAI image description
- Custom OCR settings
- Batch processing mode

---

## 🔒 Security & Privacy

- No data retention: Files are processed and immediately deleted
- Encrypted transport: All transfers use HTTPS
- Isolated execution: Each conversion runs in a sandboxed container
- No logging of content: Only metadata is logged
- GDPR compliant: Hosted on Apify's secure infrastructure

---

## ❓ FAQ

### Q: What's the difference between this and running Markitdown locally?

A: This is a cloud-hosted service with:

- ✅ No Python installation required
- ✅ No dependency management
- ✅ Automatic scaling for batch processing
- ✅ MCP integration for AI agents
- ✅ 99.95% uptime guarantee
- ✅ Pay-per-use (no server costs)

### Q: Can I convert password-protected PDFs?

A: Not currently. Password-protected documents will return an error. Remove protection before conversion.

### Q: What's the maximum file size?

A: 100 MB hard limit. Files over 50 MB may take longer to process. For larger files, consider splitting them first.

### Q: Does it work with scanned PDFs (images)?

A: Yes! OCR (Optical Character Recognition) is supported for image-based PDFs and image files.

### Q: Can I use this in production?

A: Absolutely! The actor runs on Apify's production infrastructure with 99.95% uptime SLA.

### Q: How accurate is the Markdown output?

A: Markitdown preserves:

- ✅ Headings and structure
- ✅ Bold and italic formatting
- ✅ Lists (ordered and unordered)
- ✅ Tables
- ✅ Links
- ✅ Code blocks

Complex layouts may need manual review.

### Q: Can I convert multiple files at once?

A: Yes! Run multiple Actor instances in parallel, or use batch mode (contact for enterprise pricing).

---

## 🐛 Troubleshooting

### "File download failed: HTTP 404"

Cause: The URL is invalid or the file doesn't exist.

Solution:

- Verify the URL is correct and publicly accessible
- Ensure the file hasn't been deleted or moved
- Check for authentication requirements

### "Unsupported file format"

Cause: The file extension is not in the supported formats list.

Solution:

- Check the Supported Formats section
- Convert the file to a supported format first
- Contact support if you need a specific format added

### "Conversion timeout"

Cause: The file is too large or complex.

Solution:

- Split large files into smaller chunks
- Simplify complex documents
- Increase timeout (contact support for enterprise plans)

### "Invalid base64 content"

Cause: The base64 string is malformed or incomplete.

Solution:

- Verify base64 encoding is correct
- Ensure no truncation occurred during transfer
- Use `fileUrl` instead if possible

---

## 📖 Documentation

- MCP Protocol: modelcontextprotocol.io
- Microsoft Markitdown: github.com/microsoft/markitdown
- Apify Platform: docs.apify.com
- Python SDK: docs.apify.com/sdk/python

---

## 🤝 Support

### Need Help?

- 📧 Email: support@apify.com
- 💬 Discord: apify.com/discord
- 📚 Documentation: docs.apify.com
- 🐛 Bug Reports: GitHub Issues

### Community

- ⭐ Star on GitHub: RECTOR-LABS/markitdown-mcp-server
- 🐦 Follow Updates: @apify
- 💡 Feature Requests: Open a GitHub issue

---

## 🚀 Get Started Now

### Deploy to Apify

1. Log in to Apify

   ```bash
   apify login
   ```

2. Deploy the Actor

   ```bash
   apify push
   ```

3. Enable Standby Mode

   Go to Actor settings and enable standby mode.

4. Get Your Actor URL

   Your MCP endpoint will be: `https://rector-labs--markitdown-mcp-server.apify.actor/mcp`

5. Connect AI Agents

   Add the endpoint to Claude Desktop, Cursor, or your favorite MCP client!

---

## 📜 License

This project is built on:

- Microsoft Markitdown: MIT License
- Apify SDK: Apache 2.0 License
- MCP SDK: MIT License

Actor code: MIT License

---

## 🙏 Credits

Built with:

- Microsoft Markitdown - Document conversion library (82k+ ⭐)
- Apify Platform - Serverless cloud infrastructure
- MCP Protocol - AI agent integration standard

---
Made with ❤️ for the AI developer community
Actor Information
- Developer: rector_labs
- Pricing: Paid
- Total Runs: 167
- Active Users: 3