Markitdown MCP Actor
by amaranth_nylon
About Markitdown MCP Actor
Markitdown MCP Actor is an Apify Actor designed to convert various file formats (like PDFs, DOCX, PPTX, HTML, or images) into clean Markdown (.md) text.
What does this actor do?
Markitdown MCP Actor runs on the Apify platform and exposes Markitdown's file-to-Markdown conversion as a Model Context Protocol (MCP) server, so MCP clients can convert documents in the cloud without a local install.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
Markitdown MCP Server Actor
An Apify Actor that provides a Model Context Protocol (MCP) server for Markitdown, Microsoft's lightweight Python utility for converting various file formats to Markdown.

## Overview

This Actor wraps the Markitdown MCP server as an Apify Actor, making it easy to deploy and use as a pay-per-event service. Markitdown converts various file formats to Markdown, making them suitable for use with Large Language Models (LLMs) and text analysis pipelines.

### Supported File Formats

- Documents: PDF, PowerPoint, Word, Excel
- Media: Images (with EXIF metadata and OCR), Audio (with transcription)
- Web Content: HTML, YouTube URLs
- Data Formats: CSV, JSON, XML
- Archives: ZIP files, EPubs

## Features

- MCP Protocol: Implements the Model Context Protocol for seamless integration with MCP clients
- Streamable HTTP Transport: Uses modern Streamable HTTP transport for efficient communication
- Pay-per-Event: Charges only for actual tool usage via Apify's pay-per-event system
- Session Management: Automatic session timeout and cleanup after inactivity
- Standby Mode: Runs in Apify's standby mode for instant availability

## How It Works

The Actor runs a proxy server that:

1. Connects to the Markitdown MCP server via STDIO transport
2. Exposes a Streamable HTTP endpoint at `/mcp`
3. Forwards MCP requests/responses between clients and the Markitdown server
4. Charges for tool usage via Apify's pay-per-event system

## Available Tools

### convert_to_markdown

Converts various file formats to Markdown.

Parameters:

- `uri` (string): The URI of the file to convert. Supports:
  - `http://` and `https://` - Remote files
  - `file://` - Local files
  - `data:` - Data URIs

Example:

```json
{
  "uri": "https://example.com/document.pdf"
}
```

## Usage

### 1. Deploy the Actor

Deploy this Actor to Apify or run it locally in standby mode.

### 2. Configure Your MCP Client

Add the following configuration to your MCP client (e.g., VS Code, Claude Desktop):

```json
{
  "mcpServers": {
    "markitdown-mcp-server": {
      "type": "http",
      "url": "https://YOUR_ACTOR_URL/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```

### 3. Use the Tool

Once configured, you can use the `convert_to_markdown` tool in your MCP client:

```
Convert this PDF to markdown: https://example.com/document.pdf
```

## Environment Variables

- `SESSION_TIMEOUT_SECS` (default: 300): Session timeout in seconds before terminating idle sessions

## Pricing

The Actor uses Apify's pay-per-event system with the following rates (configurable in `.actor/pay_per_event.json`):

- CONVERT_TO_MARKDOWN: $0.01 per conversion
- TOOL_LIST: $0.0001 per listing
- RESOURCE_LIST: $0.0001 per listing
- RESOURCE_READ: $0.001 per read
- PROMPT_LIST: $0.0001 per listing
- PROMPT_GET: $0.001 per get

## Local Development

### Prerequisites

- Python 3.10 or higher
- Poetry for dependency management
- Docker (optional, for containerized development)

### Installation

1. Clone the repository:

```bash
git clone https://github.com/Yash-Kavaiya/Markitdown-MCP-actor.git
cd Markitdown-MCP-actor
```

2. Install dependencies:

```bash
poetry install
```

3. Run the Actor locally:

```bash
poetry run python -m src
```

### Running in Standby Mode

To run the Actor in standby mode (required for MCP server operation):

```bash
export APIFY_META_ORIGIN=STANDBY
export ACTOR_STANDBY_URL=http://localhost:5001
poetry run python -m src
```

The MCP endpoint will be available at: http://localhost:5001/mcp

## Project Structure

```
Markitdown-MCP-actor/
├── .actor/                  # Apify Actor configuration
│   ├── actor.json           # Actor metadata and settings
│   ├── pay_per_event.json   # Pricing configuration
│   ├── Dockerfile           # Docker image definition
│   └── .actorignore         # Files to exclude from build
├── src/                     # Source code
│   ├── __init__.py          # Package initialization
│   ├── __main__.py          # Main entry point
│   ├── const.py             # Constants and configuration
│   ├── models.py            # Data models
│   ├── server.py            # ProxyServer implementation
│   ├── mcp_gateway.py       # MCP gateway logic
│   └── event_store.py       # Event store for session management
├── pyproject.toml           # Python dependencies and settings
├── .gitignore               # Git ignore rules
└── README.md                # This file
```

## Customization

### Charging Strategy

You can customize the charging strategy by editing `.actor/pay_per_event.json`. The default configuration charges:

- $0.01 per conversion (main operation)
- Minimal charges for metadata operations (listing tools, resources, prompts)

### Session Timeout

Adjust the `SESSION_TIMEOUT_SECS` environment variable to control how long sessions remain active during inactivity. The default is 300 seconds (5 minutes).

### Tool Whitelist

The Actor uses a tool whitelist defined in `src/const.py`:

```python
TOOL_WHITELIST = {
    'convert_to_markdown': ('CONVERT_TO_MARKDOWN', 1),
}
```

You can add more tools if the underlying Markitdown MCP server exposes them.

## Architecture

The Actor implements a proxy architecture:

```
MCP Client (VS Code, Claude, etc.)
    ↓ (Streamable HTTP)
Proxy Server (This Actor)
    ↓ (STDIO)
Markitdown MCP Server
    ↓
Markitdown Library
```

Key components:

- ProxyServer: Manages HTTP server and session lifecycle
- MCP Gateway: Proxies MCP requests and handles charging
- Event Store: Maintains session history for resumability
- Session Manager: Handles Streamable HTTP transport

## Related Links

- Markitdown GitHub
- Markitdown MCP PyPI
- Model Context Protocol
- Apify Platform
- Apify MCP Documentation

## License

Apache-2.0

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Support

For issues and questions:

- Open an issue on GitHub
- Contact via Apify Console

## Acknowledgments

- Microsoft Markitdown - The underlying conversion tool
- Apify - Actor platform and infrastructure
- MCP Proxy - Inspiration for the proxy implementation
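
## Example Request and Cost Sketch

The `convert_to_markdown` call and the pay-per-event rates described in this README can be sketched from the client side. The snippet below is a minimal illustration, not code from this repository: it builds the JSON-RPC 2.0 `tools/call` message that an MCP client would send to the `/mcp` endpoint, the HTTP headers such a request would likely carry, and a cost estimate based on the rates in the Pricing section. The helper names (`build_convert_request`, `build_headers`, `estimate_cost`) are hypothetical. The sketch sends nothing over the network; a real MCP client also performs the protocol's `initialize` handshake before calling tools, which an MCP client library normally handles for you.

```python
import json
from decimal import Decimal

# Rates copied from the Pricing section above (USD per event).
PRICES = {
    "CONVERT_TO_MARKDOWN": Decimal("0.01"),
    "TOOL_LIST": Decimal("0.0001"),
    "RESOURCE_LIST": Decimal("0.0001"),
    "RESOURCE_READ": Decimal("0.001"),
    "PROMPT_LIST": Decimal("0.0001"),
    "PROMPT_GET": Decimal("0.001"),
}


def build_convert_request(uri: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` message for convert_to_markdown."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "convert_to_markdown",
            "arguments": {"uri": uri},
        },
    }


def build_headers(token: str) -> dict:
    """Headers for a Streamable HTTP POST (assumed shape, hypothetical helper).

    The Authorization header mirrors the client configuration shown in the
    Usage section; the Accept header lets the server reply with JSON or SSE.
    """
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }


def estimate_cost(event_counts: dict) -> Decimal:
    """Sum rate * count over the charged events; Decimal avoids float drift."""
    return sum(
        (PRICES[name] * count for name, count in event_counts.items()),
        Decimal("0"),
    )


if __name__ == "__main__":
    request = build_convert_request("https://example.com/document.pdf")
    print(json.dumps(request, indent=2))
    print(estimate_cost({"CONVERT_TO_MARKDOWN": 100, "TOOL_LIST": 10}))
```

With the listed rates, 100 conversions plus 10 tool listings come to $1.001. If you edit `.actor/pay_per_event.json`, the `PRICES` table in this sketch would need to be updated to match.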
Ready to Get Started?
Try Markitdown MCP Actor now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: amaranth_nylon
- Pricing: Paid
- Total Runs: 28
- Active Users: 3
Related Actors
Web Scraper
by apify
Cheerio Scraper
by apify
Website Content Crawler
by apify
Legacy PhantomJS Crawler
by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.