LD+JSON Schema scraper

by pocesar

Extract all JSON-LD structured data from any webpage. Perfect for SEO audits, competitor analysis, and automating data collection from schema markup.

91,151 runs
400 users

About LD+JSON Schema scraper

Ever need to quickly pull all the structured JSON-LD data from a website? I built this scraper for exactly that. It’s a straightforward automation that visits the URLs you provide and extracts every LD+JSON script tag it finds. Whether you're auditing a site's SEO markup, comparing schema implementations across competitors, or collecting rich data for analysis, this tool saves you the manual hassle of digging through page source code. It outputs clean, organized JSON, making it easy to see exactly what structured data a site is using—think product info, reviews, business details, or event listings. I use it regularly to check my own projects and reverse-engineer how other sites implement their schema. It’s open-source, so you can tweak it if you need to, and it fits right into an automation workflow. If your work involves SEO, data aggregation, or web development, having a dedicated tool for this specific task just makes life simpler.

What does this actor do?

LD+JSON Schema scraper is an actor on the Apify platform. It visits the URLs you provide, extracts the JSON-LD (application/ld+json) structured data embedded in each page, and runs entirely in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Open the actor's page on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

LD+JSON Schema Scraper

Extracts structured data (LD+JSON) from web pages for automation, SEO analysis, or data integration.

Overview

This actor crawls the web pages you specify and parses every JSON-LD script tag (script elements with type application/ld+json). It outputs the raw schema data as structured JSON, ready for analysis, monitoring, or feeding into other systems.
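To make the extraction step concrete, here is a minimal sketch of pulling JSON-LD blocks out of an HTML string with Python's standard library. It is not the actor's code: the names JsonLdParser and extract_json_ld are hypothetical, and the sketch only sees static HTML, whereas the actor is described as rendering pages in a headless browser first.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the parsed body of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False  # True while inside a JSON-LD script tag
        self._chunks = []        # text fragments of the current script body
        self.schemas = []        # successfully parsed JSON-LD documents

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            self._in_jsonld = script_type == "application/ld+json"
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.schemas.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # assumption: malformed blocks are skipped, not reported


def extract_json_ld(html):
    """Return a list with one parsed object per valid JSON-LD script tag."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.schemas
```

Calling extract_json_ld(page_source) on a page's HTML returns a list of parsed objects. Ordinary script tags are ignored, and blocks that fail to parse are silently dropped in this sketch.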

Key Features

  • Extracts All Schema Types: Captures any LD+JSON data, including but not limited to Article, Product, Organization, BreadcrumbList, and FAQPage.
  • Handles Dynamic Content: Uses a headless browser (Puppeteer) to execute JavaScript, ensuring schemas rendered client-side are captured.
  • Configurable Crawling: Set maximum crawl depth and pages to control the scope of your extraction.
  • Proxy Support: Built-in proxy rotation to help avoid blocks during larger scraping jobs.
  • Open Source: The code is publicly available for inspection and modification.

How to Use

Run the actor on the Apify platform. You can start it from the Apify Console, call it through the Apify API, or chain it into workflows with other Apify actors.

Basic Input Configuration:
Configure the actor run by providing a JSON object with the following key parameters:

{
  "startUrls": [
    { "url": "https://example.com/page-with-schema" }
  ],
  "maxDepth": 1,
  "maxPages": 10
}

Input/Output

Input (Run Configuration):
* startUrls (Required): An array of one or more URLs to start scraping from.
* maxDepth: How many links deep to follow from the start URLs (0 = only start URLs). Default is 1.
* maxPages: Maximum number of pages to scrape. Default is 1000.
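If you generate run configurations from a script, a small helper can apply the documented defaults and catch a missing startUrls before the run starts. The sketch below is only an illustration: build_run_input is a hypothetical helper, not part of the actor or of any Apify library, and the validation rules are assumptions inferred from the parameter descriptions above.

```python
import json


def build_run_input(start_urls, max_depth=1, max_pages=1000):
    """Build the actor's input object, using the defaults documented above."""
    if not start_urls:
        raise ValueError("startUrls is required and needs at least one URL")
    if max_depth < 0:
        raise ValueError("maxDepth must be 0 (start URLs only) or greater")
    if max_pages < 1:
        raise ValueError("maxPages must be at least 1")
    return {
        # Each start URL is wrapped in an object, as in the example input.
        "startUrls": [{"url": url} for url in start_urls],
        "maxDepth": max_depth,
        "maxPages": max_pages,
    }


if __name__ == "__main__":
    # Scrape only the start URL itself (maxDepth 0) and print the JSON to submit.
    run_input = build_run_input(["https://example.com/page-with-schema"], max_depth=0)
    print(json.dumps(run_input, indent=2))
```

The resulting dictionary serializes to the same JSON shape as the basic input configuration and can be pasted into the Console or sent as the run input through the API.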

Output (Dataset Items):
Each item in the output dataset represents a scraped page and contains:
* url: The source page URL.
* schemas: An array of objects, each containing the parsed JSON-LD data found on that page.
* metadata: Information like the HTTP status code and request/response details.

Example Output Item (metadata omitted):

{
  "url": "https://example.com/product",
  "schemas": [
    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Example Widget",
      "description": "A great widget."
    }
  ]
}
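Once the dataset is downloaded, a common next step is grouping the schemas by type, for example to list every Product found across a crawl. The following sketch assumes items shaped like the example above (a url plus a schemas array). index_by_type is a hypothetical helper name, and the handling of list-valued @type and of non-object entries is an assumption about what real pages may contain.

```python
from collections import defaultdict


def index_by_type(items):
    """Group (url, schema) pairs from dataset items by the schema's @type."""
    by_type = defaultdict(list)
    for item in items:
        for schema in item.get("schemas", []):
            if not isinstance(schema, dict):
                continue  # assumption: skip entries that are not JSON objects
            types = schema.get("@type", "Unknown")
            if isinstance(types, str):
                types = [types]  # JSON-LD permits a single type or a list
            for type_name in types:
                by_type[type_name].append((item.get("url"), schema))
    return dict(by_type)


if __name__ == "__main__":
    items = [
        {
            "url": "https://example.com/product",
            "schemas": [
                {
                    "@context": "https://schema.org",
                    "@type": "Product",
                    "name": "Example Widget",
                    "description": "A great widget.",
                }
            ],
        }
    ]
    for type_name, found in index_by_type(items).items():
        print(type_name, [url for url, _ in found])
```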

Common Use Cases

  • Market Research: Gather competitive intelligence and market data
  • Lead Generation: Extract contact information for sales outreach
  • Price Monitoring: Track competitor pricing and product changes
  • Content Aggregation: Collect and organize content from multiple sources

Ready to Get Started?

Try LD+JSON Schema scraper now on Apify. Free tier available with no credit card required.

Actor Information

  • Developer: pocesar
  • Pricing: Paid
  • Total Runs: 91,151
  • Active Users: 400

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
