PaddleOCR VL

by yeekal

67 runs
5 users
Try This Actor

About PaddleOCR VL

What does this actor do?

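PaddleOCR VL is a document-parsing actor on the Apify platform. It sends an image or PDF, given by a public URL, to the Paddle OCR Layout Parsing API and returns each page as structured Markdown, together with an image that visualizes the detected layout. It runs in the Apify cloud, so no local setup is required.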

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation
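The following sketch shows what the API access point above could look like. It builds a request to Apify's `run-sync-get-dataset-items` endpoint, which starts a run, waits for it to finish, and returns the items of the run's default dataset. The actor ID `yeekal~paddleocr-vl` is a hypothetical placeholder (copy the real one from the actor's page), the token is your own Apify API token, and only the required `fileUrl` input is sent, on the assumption that `fileType` defaults to Autodetect.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical actor ID in Apify's "username~actor-name" form; replace with the real one.
ACTOR_ID = "yeekal~paddleocr-vl"


def build_run_request(file_url: str, token: str, actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Build (without sending) a POST that runs the actor synchronously and returns its dataset items."""
    endpoint = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    # Only the required input is sent; fileType is assumed to default to Autodetect.
    body = json.dumps({"fileUrl": file_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(file_url: str, token: str, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of dataset items (one item per page)."""
    with urllib.request.urlopen(build_run_request(file_url, token), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be `run_actor("https://example.com/report.pdf", os.environ["APIFY_TOKEN"])`. Synchronous endpoints wait only for a limited time. For large PDFs, it may be more robust to start an asynchronous run and read its dataset afterwards.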

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input: a publicly accessible URL of the image or PDF (5 MB maximum) and, if needed, the file type
  4. Run the actor and download your results (one dataset item per page)
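The actor autodetects the file type and enforces the size limit on its own. Before step 4, though, a client-side pre-check can catch obviously bad input before a paid run starts. The sketch below is hypothetical and does not reproduce the actor's internal logic. The extension list and the reading of "5 MB" as 5,000,000 bytes are assumptions. `size_bytes` could come, for example, from the `Content-Length` header of a HEAD request to the file URL.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: "5 MB" is read as 5,000,000 bytes (the stricter of the decimal and binary readings).
MAX_BYTES = 5_000_000
# Assumption: an illustrative set of image extensions, not the actor's documented list.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp"}


def guess_file_type(file_url: str) -> str:
    """Guess 'pdf', 'image', or 'unknown' from the extension of the URL path (query string ignored)."""
    ext = posixpath.splitext(urlparse(file_url).path)[1].lower()
    if ext == ".pdf":
        return "pdf"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    return "unknown"


def precheck(file_url: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    parsed = urlparse(file_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("fileUrl must be a public http(s) URL")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    if guess_file_type(file_url) == "unknown":
        problems.append("cannot infer the type from the URL; set File Type manually")
    return problems
```

If the URL has no recognizable extension (for example, a download link with a query-string ID), the sketch suggests setting File Type by hand rather than relying on Autodetect.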

Documentation

# Paddle OCR Layout Parser

This Apify Actor provides an interface to the Paddle OCR Layout Parsing API. You submit an image or a PDF file via a URL and receive structured Markdown content in which all embedded images are linked by their absolute URLs. It also provides a visual representation of the parsed layout.

## Features

- Supports Images and PDFs: Processes common image formats (PNG, JPG, etc.) and multi-page PDF documents.
- Smart File Type Detection: Automatically determines the file type from the URL, or you can specify it manually.
- Markdown Content Extraction: Extracts the full textual content and structure of the document into clean Markdown.
- Layout Visualization: Provides a URL to an image that highlights the detected layout structure (titles, paragraphs, figures, tables).
- File Size Limit: Rejects oversized files by enforcing a 5 MB limit.

## Input

The Actor takes the following inputs, which are defined in the Input tab.

| Field | Type | Description |
| --- | --- | --- |
| File URL (`fileUrl`) | String | Required. A publicly accessible URL to the image or PDF file you want to process. The file must not exceed 5 MB. |
| File Type (`fileType`) | String | The type of the file. Leaving it as Autodetect is recommended. Options: Autodetect, Image, PDF. |

## Output

The Actor stores its results in the default Apify dataset. Each dataset item corresponds to one page of the input file.

### Output Structure (JSON)

```json
[
  {
    "pageNumber": 1,
    "processedMarkdown": "## This is the Title\n\nAnd this is a paragraph of text. Here is an image:\n\n<div style=\"text-align: center;\"><img src=\"https://example.com/path/to/image.jpg\" alt=\"Image\" width=\"50%\" /></div>",
    "layoutImageUrl": "https://example.com/path/to/layout_visualization.jpg"
  }
]
```

- `pageNumber`: The page of the input file that this item corresponds to.
- `processedMarkdown`: The primary output. Ready-to-render Markdown with absolute image URLs.
- `layoutImageUrl`: A URL to an image visualizing the detected document layout.
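Each dataset item holds one page, so a multi-page PDF comes back as several items. Below is a minimal sketch for turning them back into a single Markdown document, assuming the item shape shown in the JSON example above. The helper names `merge_pages` and `layout_images` are hypothetical.

```python
from typing import Iterable, Mapping


def merge_pages(items: Iterable[Mapping]) -> str:
    """Join the per-page processedMarkdown strings into one document, ordered by pageNumber."""
    ordered = sorted(items, key=lambda item: item["pageNumber"])
    # A blank line between pages keeps Markdown blocks from running together.
    return "\n\n".join(item["processedMarkdown"] for item in ordered)


def layout_images(items: Iterable[Mapping]) -> dict:
    """Map each page number to its layout visualization URL."""
    return {item["pageNumber"]: item["layoutImageUrl"] for item in items}
```

Sorting by `pageNumber` rather than relying on dataset order keeps the result correct even if items are fetched out of order. Because `processedMarkdown` already uses absolute image URLs, the merged text can be rendered directly.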

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try PaddleOCR VL now on Apify. Free tier available with no credit card required.

Actor Information

Developer
yeekal
Pricing
Paid
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.