Broken Image Checker

Name: Broken Image Checker
Author: colorful_soup

16 runs

2 users

About Broken Image Checker

Detect broken or missing images on any public webpage and get a clean, actionable report. Perfect for SEO professionals, webmasters, QA testers, and UX teams.

What does this actor do?

Broken Image Checker is an automation tool that runs on the Apify platform. It scans the webpages you list, checks each image they reference, and reports the ones that fail to load.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# Broken Image Checker - Find Missing & Broken Images Across Multiple Pages

Detect broken or missing images across multiple webpages in a single run and get clean, actionable reports. Perfect for SEO professionals, webmasters, QA testers, and UX teams running site-wide audits.

Key Benefits:

- Batch processing - Check hundreds of pages in one run
- Fast detection using HEAD requests (10x faster than GET)
- Sitemap integration - Works seamlessly with Sitemap Fetcher output
- Accurate status codes and error messages
- Aggregated reports with per-page breakdowns
- Proxy support for geo-restricted content

## Why use this actor?

Broken images hurt your SEO rankings, user experience, and brand credibility. Manual checking is time-consuming and error-prone, especially on large websites with hundreds of pages and thousands of images.

Problems this solves:

- Site-wide SEO audits - Scan your entire website to identify broken images that harm search rankings and waste crawl budget
- Batch QA testing - Check hundreds of pages before production deployment to catch missing images from CMS migrations
- Site health monitoring - Monitor multiple pages simultaneously to detect CDN failures and broken external image links
- UX optimization at scale - Find loading errors across your entire site that frustrate users and hurt conversions

## Features

- Batch processing - Check images across multiple pages in a single run (up to 10,000 pages)
- Sitemap integration - Connect directly to Sitemap Fetcher output via dataset
- Fast parallel checking - All images on each page checked simultaneously
- HEAD requests first - 10x faster than GET, with automatic GET fallback
- Sequential page processing - Prevents memory overload on large batches
- Accurate error detection - HTTP status codes and detailed error messages
- Aggregated reporting - Per-page breakdowns plus summary statistics
- Graceful error handling - Continues processing even if individual pages fail
- Handles relative and absolute URLs - Automatically converts to absolute
- Proxy support - Works with Apify Proxy or custom proxies
- Progress tracking - Real-time logging of page processing status

## How it works

This actor processes your batch of URLs in the following steps:

1. Load URLs: Reads your list of URLs (manual input, file upload, or dataset from Sitemap Fetcher)
2. Process sequentially: Checks each page one at a time to avoid memory issues (respects maxPages limit)
3. Fetch HTML: Downloads the webpage HTML from each URL
4. Parse images: Extracts all `<img>` tags and converts relative URLs to absolute
5. Check availability: Tests each image with HEAD request (faster), falls back to GET if needed
6. Detect errors: Identifies broken images by HTTP status codes (404, 500, etc.) or request failures
7. Aggregate results: Combines per-page results with summary statistics (total broken images, pages affected)
8. Output report: Returns a comprehensive JSON report with per-page breakdowns and totals

## Input

```json
{
  "startUrls": [
    { "url": "https://example.com" },
    { "url": "https://example.com/products" },
    { "url": "https://example.com/about" }
  ],
  "maxPages": 100,
  "timeoutMs": 8000,
  "debugLog": false
}
```

### Input parameters

| Field | Type | Description | Required | Default |
|-------|------|-------------|----------|---------|
| `startUrls` | array | List of webpage URLs to scan for broken images. Supports manual entry, file upload, or dataset integration | Yes | - |
| `maxPages` | integer | Maximum number of pages to check (1-10,000). Controls cost and runtime | No | 100 |
| `timeoutMs` | integer | Timeout for HTTP requests in milliseconds (1000-30000) | No | 8000 |
| `proxyConfiguration` | object | Proxy settings for requests (Apify Proxy or custom) | No | - |
| `debugLog` | boolean | Enable detailed logging for troubleshooting | No | false |

### Connecting to Sitemap Fetcher

You can pipe URLs directly from the Sitemap Fetcher actor:

1. Run the Sitemap Fetcher actor to extract URLs from your sitemap
2. In this actor's input, use the requestListSources editor
3. Connect the Sitemap Fetcher's dataset as the source
4. The actor will automatically extract URLs from the dataset

## Output

The actor stores aggregated results in the default dataset:

```json
{
  "pagesChecked": 3,
  "totalImages": 87,
  "brokenImagesByPage": [
    {
      "pageUrl": "https://example.com",
      "imageCount": 25,
      "brokenImages": [],
      "checkedAt": "2025-12-13T10:30:00.000Z",
      "error": null
    },
    {
      "pageUrl": "https://example.com/products",
      "imageCount": 42,
      "brokenImages": [
        {
          "src": "https://example.com/missing.jpg",
          "status": 404,
          "error": null
        },
        {
          "src": "https://cdn.example.com/timeout.png",
          "status": null,
          "error": "Request timeout"
        }
      ],
      "checkedAt": "2025-12-13T10:30:15.000Z",
      "error": null
    },
    {
      "pageUrl": "https://example.com/about",
      "imageCount": 20,
      "brokenImages": [],
      "checkedAt": "2025-12-13T10:30:22.000Z",
      "error": null
    }
  ],
  "summary": {
    "totalBrokenImages": 2,
    "pagesWithBrokenImages": 1
  },
  "checkedAt": "2025-12-13T10:30:22.000Z"
}
```

### Output fields

| Field | Type | Description |
|-------|------|-------------|
| `pagesChecked` | integer | Total number of pages successfully checked |
| `totalImages` | integer | Total images found across all pages |
| `brokenImagesByPage` | array | Per-page results with broken images |
| `brokenImagesByPage[].pageUrl` | string | The webpage URL that was scanned |
| `brokenImagesByPage[].imageCount` | integer | Number of images found on this page |
| `brokenImagesByPage[].brokenImages` | array | List of broken images on this page |
| `brokenImagesByPage[].brokenImages[].src` | string | URL of the broken image |
| `brokenImagesByPage[].brokenImages[].status` | integer/null | HTTP status code (404, 500, etc.) or null if request failed |
| `brokenImagesByPage[].brokenImages[].error` | string/null | Error message if the request failed |
| `brokenImagesByPage[].checkedAt` | string | ISO 8601 timestamp when this page was checked |
| `brokenImagesByPage[].error` | string/null | Error message if the page failed to load |
| `summary` | object | Aggregated statistics across all pages |
| `summary.totalBrokenImages` | integer | Total number of broken images found |
| `summary.pagesWithBrokenImages` | integer | Number of pages that have at least one broken image |
| `checkedAt` | string | ISO 8601 timestamp when the run completed |

## Use cases

This actor is perfect for:

- Site-wide SEO audits: Combine with Sitemap Fetcher to scan your entire website (hundreds or thousands of pages) to find broken images that hurt search rankings and waste crawl budget. Get a complete report showing which pages have issues.
- Pre-launch QA testing: Batch check all staging environment pages before deployment to catch missing images from CMS migrations, broken CDN links, or incorrect image paths across your entire site.
- Site health monitoring: Set up scheduled runs with your sitemap to continuously monitor hundreds of pages simultaneously, detecting CDN failures, expired external image links, or accidental deletions in real-time.
- CRO/UX optimization at scale: Identify image loading errors across your entire site that frustrate users, increase bounce rates, and hurt conversion rates. Get summary statistics to prioritize fixes.
- Client reporting: Generate comprehensive reports showing broken images across multiple client websites or sections, with aggregated statistics perfect for client presentations.

## Proxy configuration

This actor supports both Apify Proxy and custom HTTP/HTTPS/SOCKS proxies.
### Using Apify Proxy (Recommended)

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}
```

### Using custom proxies

```json
{
  "proxyConfiguration": {
    "proxyUrls": [
      "http://proxy.example.com:8000"
    ]
  }
}
```

Proxies are useful when checking geo-restricted content or avoiding rate limits on high-traffic sites.

## ⚙️ Performance

- Typical runtime: 5–10 seconds per page for pages with ~50 images
- Batch processing: Checks pages sequentially to avoid memory issues
- Runs efficiently across hundreds or thousands of pages
- Actual performance varies based on:
  - Number of pages in your batch (controlled by `maxPages`)
  - Number of images per page
  - Image server/CDN response times
  - Network latency to target servers
  - Timeout settings
  - Proxy configuration used

Tip: Start with `maxPages: 10` to test your URLs, then scale to 100, 1000, or more as needed.

## Error handling

This actor includes robust error handling:

- Page-level resilience: If one page fails to load, the actor continues processing remaining pages
- Automatic retries: Failed requests are retried with exponential backoff
- HEAD/GET fallback: If HEAD requests fail, the actor automatically tries GET requests
- Detailed logging: All errors are logged with context, including page progress ("Processing page 5 of 100")
- Error reporting: Failed pages are included in output with error details for troubleshooting
- Graceful failure: Successfully processed pages are reported even if some pages fail
- Timeout handling: Configurable timeouts prevent hanging on slow servers
- URL validation: Invalid URLs are logged and skipped rather than crashing the run

## Limitations

- Only checks publicly accessible webpages (no authentication support)
- Maximum 10,000 pages per run (controlled by `maxPages`)
- Maximum timeout of 30 seconds per image request
- Sequential page processing (not parallel) to avoid memory issues
- JavaScript-rendered images require the page to already have rendered HTML (consider using Playwright for dynamic sites)
- Does not follow pagination or crawl dynamically - provide all URLs via `startUrls` or connect a dataset

## Tips for best results

1. Start small, then scale: Test with `maxPages: 10` first to verify your URLs work, then increase to 100, 1000, etc.
2. Combine with Sitemap Fetcher: Use the Sitemap Fetcher actor first to get all your site URLs, then connect its dataset to this actor for comprehensive coverage
3. Use appropriate timeouts: Increase `timeoutMs` to 15000+ for slow CDNs or international servers
4. Enable debug logging: Set `debugLog: true` when troubleshooting to see detailed per-page progress
5. Use proxies for geo-content: Some CDNs serve different images based on location - use residential proxies to test from specific countries
6. Monitor summary statistics: Check `summary.totalBrokenImages` and `summary.pagesWithBrokenImages` for quick insights before diving into per-page details
7. Schedule regular runs: Set up scheduled runs to monitor site health continuously across all your important pages

## Related actors

Check out these related actors for comprehensive site auditing:

- URL Canonicalizer + Redirect Resolver: Check for redirect chains and canonical URL issues
- Sitemap Fetcher + Page Title Extractor: Analyze your sitemap and page metadata
- URL Metadata Extractor: Extract Open Graph images and metadata from multiple pages

## Support & feedback

Need help or have suggestions?

- Issues: Create an issue in the GitHub repository
- Email: Contact through Apify platform messaging

## Changelog

### Version 1.0.9 (2025-12-13)

- Batch processing support - Check multiple pages in a single run (up to 10,000 pages)
- Sitemap integration - requestListSources editor supports dataset connections
- Aggregated reporting - Per-page results with summary statistics
- Graceful error handling - Continues processing even if individual pages fail
- Progress logging - Real-time page processing status
- Added `startUrls` array input (replaces single `url`)
- Added `maxPages` limit for cost control

### Version 1.0.0 (2025-12-12)

- Initial release
- Fast parallel image checking with HEAD/GET fallback
- Support for relative and absolute URLs
- Proxy configuration support
- Comprehensive error reporting

---

Made with care for the web development community

Part of the Apify Actor Portfolio collection
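
To make steps 4 to 6 of "How it works" concrete, here is a minimal Python sketch of the per-page check: collect `<img>` sources, resolve them to absolute URLs, then try HEAD and fall back to GET. It is an illustration under stated assumptions, not the actor's actual source code. The names `ImgSrcParser`, `extract_image_urls`, `request_status`, `check_image`, and `check_page` are hypothetical, and the rule for when GET is attempted (after any HEAD failure or 4xx/5xx response) is an assumption, since the documentation only says "falls back to GET if needed". Retries, proxies, and parallel checking are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.error
import urllib.request


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag (step 4)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, page_url):
    """Return absolute image URLs, resolving relative ones against page_url."""
    parser = ImgSrcParser()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.sources]


def request_status(url, method, timeout_s):
    """Send one request and return its HTTP status code (4xx/5xx included)."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # urllib raises on 4xx/5xx; only the code is needed


def check_image(url, timeout_s=8.0, request=request_status):
    """HEAD first, then GET (step 5). Returns None if OK, else a record (step 6)."""
    status, error = None, None
    for method in ("HEAD", "GET"):
        try:
            status, error = request(url, method, timeout_s), None
        except Exception as exc:  # timeouts, DNS failures, refused connections
            status, error = None, str(exc)
            continue
        if status < 400:
            return None  # image is reachable
    return {"src": url, "status": status, "error": error}


def check_page(html, page_url, request=request_status):
    """Build a record with the core fields of one brokenImagesByPage entry."""
    urls = extract_image_urls(html, page_url)
    broken = [r for r in (check_image(u, request=request) for u in urls) if r]
    return {"pageUrl": page_url, "imageCount": len(urls), "brokenImages": broken}
```

The `request` parameter exists only so the logic can be exercised without network access: pass a stub that returns status codes or raises, and `check_page` produces records with the same `src`, `status`, and `error` fields shown under "Output". With the default `request_status`, the same function performs real HTTP requests using an 8-second timeout, mirroring the `timeoutMs` default of 8000.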

Actor Information

Developer: colorful_soup
Pricing: Paid
Total Runs: 16
Active Users: 2
