PaddleOCR VL

Name: PaddleOCR VL
Author: yeekal


67 runs

5 users


About PaddleOCR VL

What does this actor do?

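PaddleOCR VL is an Apify Actor that wraps the Paddle OCR Layout Parsing API. Given the URL of an image or PDF, it returns the document's content as structured Markdown, along with an image that visualizes the detected layout. It runs in the cloud on the Apify platform, so no local setup is required.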

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
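
For programmatic use, the steps above could be replaced by a single HTTP call. The following is a minimal sketch, assuming the Actor is reachable through Apify's synchronous run endpoint (`run-sync-get-dataset-items`) and that its ID is `yeekal~paddleocr-vl`; that ID is a placeholder and should be copied from the Actor's own page. The input field names (`fileUrl`, `fileType`) are taken from the Documentation section, while the exact accepted values for `fileType` are an assumption based on the option labels listed there.

```python
import json
import urllib.request

# Assumption: placeholder Actor ID; replace it with the ID shown on the Actor's page.
ACTOR_ID = "yeekal~paddleocr-vl"
API_BASE = "https://api.apify.com/v2"


def build_run_request(file_url: str, token: str, file_type: str = "Autodetect") -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    # Field names follow the Input table; the fileType value format is assumed.
    payload = {"fileUrl": file_url, "fileType": file_type}
    endpoint = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(file_url: str, token: str) -> list:
    """Send the request and return the dataset items (one per page)."""
    with urllib.request.urlopen(build_run_request(file_url, token)) as response:
        return json.load(response)
```

Calling `run_actor("https://example.com/sample.pdf", "<YOUR_APIFY_TOKEN>")` would block until the run finishes, so for large PDFs an asynchronous run followed by a dataset download may be the better fit.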

Documentation

# Paddle OCR Layout Parser

This Apify Actor provides an interface to the Paddle OCR Layout Parsing API. You submit an image or a PDF file via a URL and receive structured Markdown content in which all embedded images are linked by their absolute URLs. The Actor also returns a visual representation of the parsed layout.

## Features

- Supports Images and PDFs: Process various image formats (PNG, JPG, etc.) and multi-page PDF documents.
- Smart File Type Detection: Automatically determines the file type from the URL, or you can specify it manually.
- Markdown Content Extraction: Extracts the full textual content and structure of the document into clean Markdown.
- Layout Visualization: Provides a URL to an image that visually highlights the detected layout structure (titles, paragraphs, figures, tables).
- File Size Limit: Protects against oversized files by enforcing a 5MB limit.

## Input

The Actor requires the following inputs, which are defined in the `Input` tab.

| Field | Type | Description |
| --- | --- | --- |
| File URL (`fileUrl`) | String | Required. A publicly accessible URL to the image or PDF file you want to process. The file size must not exceed 5MB. |
| File Type (`fileType`) | String | The type of the file. It's recommended to leave this as `Autodetect`. Options: `Autodetect`, `Image`, `PDF`. |

## Output

The Actor stores its results in the Apify default dataset. Each item in the dataset corresponds to a page from the input file.

### Output Structure (JSON)

```json
[
  {
    "pageNumber": 1,
    "processedMarkdown": "## This is the Title\n\nAnd this is a paragraph of text. Here is an image:\n\n<div style=\"text-align: center;\"><img src=\"https://example.com/path/to/image.jpg\" alt=\"Image\" width=\"50%\" /></div>",
    "layoutImageUrl": "https://example.com/path/to/layout_visualization.jpg"
  }
]
```

- `processedMarkdown`: The primary output. Ready-to-render Markdown with absolute image URLs.
- `layoutImageUrl`: A URL to an image visualizing the detected document layout.
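
Because each dataset item holds a single page, a multi-page PDF usually needs to be stitched back together before rendering. The sketch below shows one way to do that, assuming the items carry the three fields from the output structure above; `merge_pages` and the sample data are hypothetical and not part of the Actor.

```python
def merge_pages(items: list[dict]) -> str:
    """Join per-page Markdown into one document, ordered by pageNumber."""
    # Sort defensively; the order of items in the dataset is not assumed here.
    ordered = sorted(items, key=lambda item: item["pageNumber"])
    return "\n\n".join(item["processedMarkdown"].strip() for item in ordered)


# Hypothetical sample shaped like the Actor's documented output.
sample_items = [
    {
        "pageNumber": 2,
        "processedMarkdown": "Second page text.",
        "layoutImageUrl": "https://example.com/layout_2.jpg",
    },
    {
        "pageNumber": 1,
        "processedMarkdown": "## Title\n\nFirst page text.",
        "layoutImageUrl": "https://example.com/layout_1.jpg",
    },
]

document = merge_pages(sample_items)
print(document)
```

Pages are separated by a blank line so that a heading or paragraph at the start of one page does not run into the end of the previous one.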

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources


Actor Information

Pricing: Paid


