Image Text Extractor
by m3web
About Image Text Extractor
Extract text from images using OCR (Optical Character Recognition) via direct URLs or uploaded JSON/CSV files. Works with multiple languages and automatically enriches your structured file with the text found inside images.
What does this actor do?
Image Text Extractor is an OCR tool that runs as an Actor on the Apify platform, so the text extraction happens in the cloud rather than on your machine.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
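For integration from your own code rather than the web UI, the same "configure and run" steps can be sketched against Apify's public REST endpoint for starting an Actor run. This is a minimal illustration, not the developer's documented usage: the Actor ID `m3web~image-text-extractor` is a guess at the identifier, and the shape of `startUrls` (a list of `{"url": ...}` objects) is an assumption, so check the Actor's input schema before relying on either. The function only prepares the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, image_urls: list[str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that starts an Actor run."""
    # Assumption: startUrls takes a list of {"url": ...} objects.
    # The real input schema may differ.
    run_input = {"startUrls": [{"url": url} for url in image_urls]}
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical Actor ID and placeholder token, for illustration only.
request = build_run_request(
    "m3web~image-text-extractor",
    "<YOUR_APIFY_TOKEN>",
    ["https://example.com/image1.jpg"],
)
print(request.full_url)
# To actually start the run: urllib.request.urlopen(request)
```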
Documentation
## Features

- Accepts image URLs either:
  - Directly through `startUrls`, or
  - From uploaded `.json` or `.csv` files
- Applies OCR (Optical Character Recognition) to each image and extracts:
  - `extractedText`: Full raw text detected
  - `paragraphs`: Text split into readable blocks
  - `urls`: Any links found within the image text
- Supports Tesseract OCR with multiple languages (e.g. English, German, Spanish)
- Saves results in Apify Key-Value Store with a shareable download link
- Logs are clean and easy to follow

---

## Input

This Actor accepts these input fields:

| Field | Type | Description |
|--------------------------|--------|------------------------------------------------------------------------|
| Image URLs | array | (Optional) One or more direct image URLs to process |
| Upload a structured file | file | (Optional) Upload a .json or .csv file that contains image URLs |
| Field name for image URL | string | The name of the column or field in your file that holds the image URLs |
| language | string | Choose the OCR language from the dropdown (default is English) |

### Explaining Field name for image URL in simple terms

If you're uploading a `.json` or `.csv` file, you need to tell the Actor which part of each item contains the image URL. This is what the **Field name for image URL** is for:

- In a CSV file, each column has a name (like "image_url" or "photo"). You should type in the exact column name where the image URL is located.
  - Example:

    ```csv
    title,image_url
    Product 1,https://example.com/image1.jpg
    Product 2,https://example.com/image2.jpg
    ```

    In this case, you'd set **Field name for image URL** to `image_url`.
- In a JSON file, each object has a label for its fields. You need to write the name of the field that stores the image link.
  - Example:

    ```json
    [
      { "name": "Item A", "photo": "https://example.com/photo.jpg" }
    ]
    ```

    Here, you'd set **Field name for image URL** to `photo`.
- You can also use dot notation to reach inside nested fields. For example, if your JSON file looks like this:
  - Example:

    ```json
    [
      { "assets": { "image": "https://example.com/image.jpg" } }
    ]
    ```

    Then set **Field name for image URL** to `assets.image`.
- Multiple Images in One Row

  If your `.json` or `.csv` file contains more than one image URL per item, you can still process them all! Simply point to the field that holds an array of URLs.
  - Example `.json` input:

    ```json
    {
      "title": "Product Set",
      "images": [
        "https://example.com/photo1.jpg",
        "https://example.com/photo2.jpg"
      ]
    }
    ```

    Set **Field name for image URL** to `images`; the Actor will automatically process all image URLs inside that array.

  This also works with dot notation for nested arrays:

    ```json
    {
      "media": {
        "photos": [
          "https://example.com/one.jpg",
          "https://example.com/two.jpg"
        ]
      }
    }
    ```

    In this case, set **Field name for image URL** to `media.photos`.

---

### OCR Language

The Actor supports many languages beyond English. At the input step, you'll see a dropdown menu labeled `language`. Select the appropriate language for your images (e.g. German, French, Spanish); the default language is English. This helps the OCR engine correctly detect and read the text in your image.

---

## Output

After processing, you'll receive:

1. A structured CSV or JSON file with enriched data:
   - `extractedText`: All text found in each image
   - `paragraphs`: Text broken into readable chunks
   - `urls`: Any links found inside the image text
2. A downloadable link to your processed file saved in Apify's Key-Value Store
3. OCR results also pushed to Apify Dataset (optional)

---

## Example Use Cases

- Extracting text from screenshot-based Google Ads
- Enriching scraped product data with visible text
- Identifying links or CTAs from image banners

---

## Behind the Scenes

This Actor uses:

- Tesseract.js for OCR
- Sharp for image preprocessing (grayscale, normalize)
- Support for both in-memory JSON and CSV parsing/stringifying
- Output is clean and downloadable, with clear logs and no clutter

## Tip

Want to extract thousands of image ads from Google's Ad Transparency Center? Combine this with a crawler that scrapes adstransparency.google.com, then feed that structured JSON into this Actor. Boom: text from image ads, at scale.
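The field lookup and the three output fields described in the documentation can be sketched in Python. This is an illustration of the described behavior, not the Actor's actual code (which, per the documentation, is built on Tesseract.js). The helper names `resolve_image_urls` and `summarize_text` are hypothetical, and splitting paragraphs on blank lines and matching links with a simple regular expression are assumptions about how `paragraphs` and `urls` might be derived.

```python
import re
from typing import Any

# Assumption: a simple pattern for http(s) links in OCR output.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def resolve_image_urls(record: dict[str, Any], field_path: str) -> list[str]:
    """Follow a dot-notation path and return every image URL found there."""
    value: Any = record
    for key in field_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return []  # the path does not exist in this record
        value = value[key]
    if isinstance(value, str):
        return [value]  # a single URL
    if isinstance(value, list):
        return [item for item in value if isinstance(item, str)]  # an array of URLs
    return []


def summarize_text(extracted_text: str) -> dict[str, Any]:
    """Build the three output fields from raw OCR text."""
    # Assumption: a blank line separates paragraphs.
    paragraphs = [
        " ".join(block.split())
        for block in re.split(r"\n\s*\n", extracted_text)
        if block.strip()
    ]
    return {
        "extractedText": extracted_text,
        "paragraphs": paragraphs,
        "urls": URL_PATTERN.findall(extracted_text),
    }


item = {"media": {"photos": ["https://example.com/one.jpg", "https://example.com/two.jpg"]}}
print(resolve_image_urls(item, "media.photos"))
print(summarize_text("Big sale\ntoday only\n\nVisit https://example.com/deal now"))
```

A path that is missing from a record yields an empty list rather than an error, so a file with incomplete rows can still be processed row by row.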
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Actor Information
- Developer: m3web
- Pricing: Paid
- Total Runs: 407
- Active Users: 34
Related Actors
Google Search Results Scraper
by apify
Website Content Crawler
by apify
Leads Generator - $3/1k 50k leads like Apollo
by microworlds
Video Transcript Scraper: Youtube, X, Facebook, Tiktok, etc.
by invideoiq
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.