LD+JSON Schema scraper
by pocesar
Extract all JSON-LD structured data from any webpage. Perfect for SEO audits, competitor analysis, and automating data collection from schema markup.
About LD+JSON Schema scraper
Ever need to quickly pull all the structured JSON-LD data from a website? I built this scraper for exactly that. It’s a straightforward automation that visits the URLs you provide and extracts every LD+JSON script tag it finds. Whether you're auditing a site's SEO markup, comparing schema implementations across competitors, or collecting rich data for analysis, this tool saves you the manual hassle of digging through page source code. It outputs clean, organized JSON, making it easy to see exactly what structured data a site is using—think product info, reviews, business details, or event listings. I use it regularly to check my own projects and reverse-engineer how other sites implement their schema. It’s open-source, so you can tweak it if you need to, and it fits right into an automation workflow. If your work involves SEO, data aggregation, or web development, having a dedicated tool for this specific task just makes life simpler.
What does this actor do?
LD+JSON Schema scraper is a web scraping and automation tool available on the Apify platform. It runs in the cloud, visits the URLs you provide, and extracts the JSON-LD structured data found on each page.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
LD+JSON Schema Scraper
Extracts structured data (LD+JSON) from web pages for automation, SEO analysis, or data integration.
Overview
This actor crawls specified web pages and parses all LD+JSON script tags (application/ld+json). It outputs the raw schema data in a structured JSON format, making it usable for analysis, monitoring, or feeding into other systems.
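The extraction step described above can be illustrated with a minimal sketch. This is not the actor's own code: it uses Python's standard library on static HTML only, whereas the actor renders pages in a headless browser first. The names `JsonLdParser`, `extract_json_ld`, and `SAMPLE_HTML` are hypothetical and exist only for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_json_ld(html):
    """Return the parsed JSON-LD blocks found in an HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    schemas = []
    for raw in parser.blocks:
        try:
            schemas.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # assumption: malformed blocks are skipped, not reported
    return schemas


# Hypothetical sample page with two JSON-LD blocks and one ordinary script.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
</script>
<script type="text/javascript">var x = 1;</script>
<script type="application/ld+json">{"@type": "Organization", "name": "Example Co"}</script>
</head><body></body></html>
"""

for schema in extract_json_ld(SAMPLE_HTML):
    print(schema["@type"], "-", schema["name"])
```

Skipping blocks that fail to parse is a choice made for this sketch; how the actor treats invalid JSON-LD is not stated here.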
Key Features
- Extracts All Schema Types: Captures any LD+JSON data, including but not limited to Article, Product, Organization, BreadcrumbList, and FAQPage.
- Handles Dynamic Content: Uses a headless browser (Puppeteer) to execute JavaScript, ensuring schemas rendered client-side are captured.
- Configurable Crawling: Set maximum crawl depth and pages to control the scope of your extraction.
- Proxy Support: Built-in proxy rotation to help avoid blocks during larger scraping jobs.
- Open Source: The code is publicly available for inspection and modification.
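To make the depth and page limits from the Configurable Crawling feature concrete, here is a rough sketch of how such limits could bound a crawl. It assumes a breadth-first traversal, which is a guess about the actor's behavior rather than something the documentation confirms. The function `plan_crawl`, the `get_links` callback, and the `LINKS` graph are all hypothetical, and no network requests are made.

```python
from collections import deque


def plan_crawl(start_urls, get_links, max_depth=1, max_pages=1000):
    """Return the URLs a breadth-first crawl would visit, in visit order.

    max_depth=0 means only the start URLs are visited.
    """
    unique_starts = list(dict.fromkeys(start_urls))  # drop duplicates, keep order
    queue = deque((url, 0) for url in unique_starts)
    seen = set(unique_starts)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory link graph standing in for real pages.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
}


def lookup_links(url):
    return LINKS.get(url, [])


print(plan_crawl(["https://example.com/"], lookup_links, max_depth=1))
```

Raising `max_depth` widens the crawl one link level at a time, while `max_pages` caps the total regardless of depth.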
How to Use
Run the actor on the Apify platform. You can start it via the Apify Console, using the Apify API, or integrate it into workflows with other Apify actors.
Basic Input Configuration:
Configure the actor run by providing a JSON object with the following key parameters:
{
  "startUrls": [
    { "url": "https://example.com/page-with-schema" }
  ],
  "maxDepth": 1,
  "maxPages": 10
}
Input/Output
Input (Run Configuration):
* startUrls (Required): An array of one or more URLs to start scraping from.
* maxDepth: How many links deep to follow from the start URLs (0 = only start URLs). Default is 1.
* maxPages: Maximum number of pages to scrape. Default is 1000.
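As an illustration of the input parameters above, the following sketch builds a run configuration with the documented defaults and a few sanity checks. The helper `build_run_input` is hypothetical, and the validation rules (non-negative depth, at least one page) are assumptions rather than documented constraints.

```python
import json


def build_run_input(urls, max_depth=1, max_pages=1000):
    """Build the actor input object from a list of URL strings."""
    if not urls:
        raise ValueError("startUrls is required and needs at least one URL")
    if max_depth < 0:
        raise ValueError("maxDepth must be 0 or greater")
    if max_pages < 1:
        raise ValueError("maxPages must be at least 1")
    return {
        "startUrls": [{"url": url} for url in urls],
        "maxDepth": max_depth,
        "maxPages": max_pages,
    }


run_input = build_run_input(["https://example.com/page-with-schema"], max_pages=10)
print(json.dumps(run_input, indent=2))
```

The printed object has the same shape as the basic input configuration and can be pasted into the Apify Console or sent through the API.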
Output (Dataset Items):
Each item in the output dataset represents a scraped page and contains:
* url: The source page URL.
* schemas: An array of objects, each containing the parsed JSON-LD data found on that page.
* metadata: Information like the HTTP status code and request/response details.
Example Output Item (metadata omitted):
{
  "url": "https://example.com/product",
  "schemas": [
    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Example Widget",
      "description": "A great widget."
    }
  ]
}
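Once a run finishes, dataset items shaped like the example above can be summarized offline. The sketch below counts how often each schema type appears across pages, which is one way to approach an SEO audit. The functions `iter_types` and `count_schema_types` and the `ITEMS` list are hypothetical. Handling of list-valued `@type` and nested `@graph` follows general JSON-LD conventions, not anything specific to this actor.

```python
from collections import Counter


def iter_types(node):
    """Yield every string @type in a JSON-LD node, list, or @graph."""
    if isinstance(node, list):
        for item in node:
            yield from iter_types(item)
    elif isinstance(node, dict):
        types = node.get("@type")
        if isinstance(types, str):
            yield types
        elif isinstance(types, list):
            yield from (t for t in types if isinstance(t, str))
        yield from iter_types(node.get("@graph", []))


def count_schema_types(items):
    """Count schema types across a list of dataset items."""
    counts = Counter()
    for item in items:
        counts.update(iter_types(item.get("schemas", [])))
    return counts


# Hypothetical dataset items following the documented url/schemas shape.
ITEMS = [
    {
        "url": "https://example.com/product",
        "schemas": [
            {"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
        ],
    },
    {
        "url": "https://example.com/about",
        "schemas": [
            {"@graph": [{"@type": "Organization"}, {"@type": ["Product", "Thing"]}]}
        ],
    },
    {"url": "https://example.com/empty", "schemas": []},
]

for schema_type, count in sorted(count_schema_types(ITEMS).items()):
    print(schema_type, count)
```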
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: pocesar
- Pricing: Paid
- Total Runs: 91,151
- Active Users: 400