Dataset(s) To Schema

by zuzka

Takes one or more dataset IDs and writes a JSON Schema describing the contents of those datasets to the key-value store.

140 runs
2 users

What does this actor do?

Dataset(s) To Schema is an actor available on the Apify platform. It reads the items of one or more datasets, detects the type of each field, and generates a single JSON Schema describing them.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Scheduled runs and webhooks for automation

How to Use

  1. Open the actor on Apify
  2. Create a free Apify account if you don't have one
  3. Enter one or more dataset IDs in the datasetIds input field
  4. Run the actor and download the schema from the key-value store (key SCHEMA)

Documentation

# Dataset to Schema

Generates a JSON Schema from one or more datasets on Apify. The actor scans dataset items, detects data types for each field (including merging multiple types), and outputs the resulting schema:

* Saves it to the Key-Value Store under the key `SCHEMA` (as `application/json`),
* Also pushes the same schema as an item to the run’s output dataset for convenient viewing or sharing.

> Use case: validating scraper outputs, generating OpenAPI/validators, or quickly checking data consistency across multiple datasets.

---

## Input (input schema)

```json
{
  "title": "Generate schema from datasets",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "datasetIds": {
      "title": "Dataset IDs",
      "type": "array",
      "description": "IDs of the datasets for which to generate a schema",
      "editor": "stringList"
    }
  },
  "required": ["datasetIds"]
}
```

### Fields

* `datasetIds` (array, required): list of Apify dataset IDs to include in schema generation. You can provide one or multiple IDs; the actor iterates through them and merges schemas together.

---

## Output

The actor produces the same schema in two places:

1. Key-Value Store: key `SCHEMA`, the complete JSON Schema file (e.g., `schema.json`).
2. Output dataset: a single item containing the full schema (for quick preview in the console).

### Example output schema (truncated)

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "title": { "type": ["string", "null"] },
    "price": { "type": ["number", "string"] },
    "inStock": { "type": "boolean" },
    "images": { "type": "array", "items": { "type": "string" } }
  },
  "additionalProperties": true
}
```

> Note: The actor merges multiple observed types into union types (e.g., `"type": ["number", "string"]`) when data varies.

---

## How It Works

* Reads `datasetIds` from the input.
* Iterates through each dataset and detects field types: number, string, boolean, object, array (unifying differing values into union types if needed).
* Merges all detected fields into a single schema covering all datasets.
* Saves the final schema to the KV Store (`SCHEMA`) and pushes it to the output dataset.
* If a dataset exceeds internal iteration limits (≈1 M items), logs a warning that the schema may be incomplete but still completes the run.

---

## Quick Start on Apify

1. Create a run of the actor in the Apify Console.
2. Provide input:

   ```json
   { "datasetIds": ["abc123", "def456"] }
   ```

3. Run it. After completion, open Storage → Key-Value Store and download `SCHEMA`. Alternatively, open the output dataset to view the schema item.

## Limitations & Edge Cases

* Large datasets (> ~1 M items): the actor logs a warning (“Schema might not be perfect.”) and continues. For higher accuracy, generate a schema from a smaller sample or pre-aggregate data.
* Heterogeneous data: if fields vary widely, expect broader union types. This is intentional so the schema reflects observed variability.
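
### Illustrative sketch: type detection and merging

The type detection and union merging described under How It Works can be illustrated with a small, self-contained Python sketch. This is not the actor's source code, which is not shown here. The function names `json_type` and `infer_schema` are hypothetical, the sample items are made up for illustration, and the sketch only inspects top-level fields (it does not descend into nested objects or array items, which the actor's example output suggests it does).

```python
from itertools import chain
from typing import Any, Iterable


def json_type(value: Any) -> str:
    """Map a decoded JSON value to a JSON Schema type name (hypothetical helper)."""
    if value is None:
        return "null"
    # bool must be checked before numbers because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"Not a JSON value: {value!r}")


def infer_schema(items: Iterable[dict]) -> dict:
    """Build a top-level schema, collecting every type observed per field."""
    observed = {}  # field name -> set of observed type names
    for item in items:
        for key, value in item.items():
            observed.setdefault(key, set()).add(json_type(value))

    properties = {}
    for key, types in observed.items():
        ordered = sorted(types)
        # A single observed type stays a string; several become a union list.
        properties[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "additionalProperties": True,
    }


# Made-up items standing in for two datasets.
dataset_a = [{"title": "Lamp", "price": 19.9, "inStock": True}]
dataset_b = [{"title": None, "price": "19.90", "inStock": False}]

# Merging datasets amounts to inferring over the items of all of them.
schema = infer_schema(chain(dataset_a, dataset_b))
print(schema["properties"])
```

In this sketch, `price` ends up with the union `["number", "string"]` because the two sample datasets disagree on its type, while `inStock` stays a plain `"boolean"`. Union members are sorted alphabetically here only to make the output deterministic; the actor may order them differently.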

Actor Information

Developer: zuzka
Pricing: Paid
Total Runs: 140
Active Users: 2