Dataset to HuggingFace

by flamboyant_leaf

Effortlessly move your Apify datasets to Hugging Face. Automate the bridge from web scraping to machine learning, enabling direct access to models, collaboration tools, and data versioning.

486 runs
6 users

About Dataset to HuggingFace

Stuck moving your scraped data from Apify into a proper machine learning environment? This actor solves that exact headache. It's a straightforward bridge that takes the datasets you've built in Apify and pushes them directly into Hugging Face. Once your data is there, the real work begins: you can pair it with Hugging Face's large library of pre-trained models, use the platform's compute resources, and collaborate with your team through its versioning tools. You control the transfer by capping how many items move in one run, which helps when managing large projects.

I use this to skip the manual export-and-upload shuffle, which saves time and keeps my datasets organized. If you're a data scientist or researcher who scrapes data for models, this actor automates the boring part so you can get to training and experimentation faster. It turns your web-scraped datasets into ready-to-use assets on a major ML platform.

What does this actor do?

Dataset to HuggingFace is an automation tool available on the Apify platform. It transfers the contents of an Apify dataset to a Hugging Face dataset, and it runs entirely in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor, then find your data in the target Hugging Face dataset

Documentation

Dataset to HuggingFace

Overview

This actor transfers data from an Apify dataset to a Hugging Face dataset. It bridges Apify's web scraping ecosystem with Hugging Face's machine learning platform, enabling you to move scraped data directly into an environment with access to thousands of pre-trained models, collaborative tools, and streamlined ML pipelines.

The actor preserves the dataset ID, so your Hugging Face dataset identifier matches your original Apify dataset ID, making it easy to track data across platforms.
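As a rough illustration of the transfer described above, the following Python sketch shows one way such a step could be structured. It is not the actor's actual implementation: the helper names limit_items and transfer are hypothetical, and the item source and upload step are stand-ins so the example runs without credentials.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def limit_items(items: Iterable[Dict], max_items: int = 0) -> Iterator[Dict]:
    """Return an iterator over at most max_items items; 0 means no limit."""
    if max_items < 0:
        raise ValueError("max_items must be 0 (all items) or a positive integer")
    return iter(items) if max_items == 0 else islice(items, max_items)


def transfer(items: Iterable[Dict], push: Callable[[List[Dict]], None], max_items: int = 0) -> int:
    """Collect items (respecting the limit), hand them to push, return the count."""
    batch = list(limit_items(items, max_items))
    if batch:
        push(batch)
    return len(batch)


if __name__ == "__main__":
    # Stand-ins so the sketch runs offline. In a real run, the items could come
    # from apify-client's `client.dataset(dataset_id).iterate_items()`, and push
    # could wrap `datasets.Dataset.from_list(batch).push_to_hub(name, token=token)`.
    scraped = [{"url": f"https://example.com/{i}"} for i in range(5)]
    uploaded: List[Dict] = []
    count = transfer(scraped, uploaded.extend, max_items=3)
    print(f"Transferred {count} items")
```

Passing the upload step in as a callable keeps the limit logic testable without network access.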

Key Features

  • Direct Data Transfer: Moves data from any Apify dataset to a new or existing Hugging Face dataset.
  • Transfer Limits: Control the volume of data transferred using a configurable maxItems parameter.
  • Seamless Integration: Designed to chain directly after web scraping actors, using the default dataset ID as input to automate the workflow from data collection to ML-ready dataset.
  • Detailed Logging: Provides transparency and debugging information throughout the transfer process.

How to Use

  1. Configure Input: Set the required parameters in the actor's input.
  2. Run the Actor: Start the actor on the Apify platform.
  3. Access Data: Once finished, your data will be available in the specified Hugging Face dataset.

Input

Configure the actor using the following input fields.

{
  "apifyDatasetId": "your-apify-dataset-id",
  "huggingFaceDatasetName": "your-huggingface-dataset-name",
  "huggingFaceToken": "your-huggingface-api-token",
  "maxItems": 1000
}

  • apifyDatasetId (required): The ID of the Apify dataset to transfer data from.
  • huggingFaceDatasetName (required): The name for the target dataset on Hugging Face.
  • huggingFaceToken (required): Your Hugging Face API token for authentication.
  • maxItems (optional): The maximum number of items to transfer. Use 0 to transfer all items.
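The input above can also be assembled and checked in code before starting a run. The sketch below is a minimal example using the apify-client Python package; build_run_input and start_transfer are hypothetical helper names, and the actor ID in the commented-out call is a placeholder to replace with the ID shown on the actor's page.

```python
import os


def build_run_input(apify_dataset_id, hf_dataset_name, hf_token, max_items=0):
    """Return the actor input as a dict, rejecting empty required fields."""
    required = {
        "apifyDatasetId": apify_dataset_id,
        "huggingFaceDatasetName": hf_dataset_name,
        "huggingFaceToken": hf_token,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"Missing required input fields: {', '.join(missing)}")
    if not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be 0 (transfer all) or a positive integer")
    return {**required, "maxItems": max_items}


def start_transfer(apify_token, actor_id, run_input):
    """Start the actor and wait for it to finish (needs `pip install apify-client`)."""
    # Imported lazily so build_run_input has no third-party dependencies.
    from apify_client import ApifyClient

    return ApifyClient(apify_token).actor(actor_id).call(run_input=run_input)


if __name__ == "__main__":
    run_input = build_run_input(
        apify_dataset_id="your-apify-dataset-id",
        hf_dataset_name="your-huggingface-dataset-name",
        hf_token=os.environ.get("HF_TOKEN", "your-huggingface-api-token"),
        max_items=1000,
    )
    print(sorted(run_input))
    # Placeholder actor ID (an assumption); copy the real one from Apify.
    # start_transfer(os.environ["APIFY_TOKEN"], "username/actor-name", run_input)
```

Reading tokens from environment variables rather than hard-coding them keeps credentials out of source control.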

Output

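The actor's primary output is the dataset created on Hugging Face. You can find it at https://huggingface.co/datasets/{huggingFaceDatasetName}. Hugging Face dataset repositories are normally namespaced by user or organization, so the name may need to take the form username/dataset-name. The run log confirms whether the transfer succeeded and how many items were transferred.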
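Once a run has finished, the dataset can be opened in the browser or pulled into Python. The sketch below builds the page URL from the dataset name and, optionally, loads the data with the Hugging Face datasets library. The helper names are hypothetical, and the example assumes the actor uploads the data in a format that load_dataset can detect automatically, which the description above does not confirm.

```python
def dataset_url(hf_dataset_name):
    """Build the Hugging Face page URL for a dataset name such as 'user/my-data'."""
    return f"https://huggingface.co/datasets/{hf_dataset_name.strip('/')}"


def load_transferred(hf_dataset_name, token=None):
    """Load the dataset from the Hub (needs `pip install datasets` and network access)."""
    # Imported lazily so dataset_url works without the datasets package.
    # The `token` keyword is available in recent datasets releases; it is only
    # needed when the dataset repository is private.
    from datasets import load_dataset

    return load_dataset(hf_dataset_name, token=token)


if __name__ == "__main__":
    print(dataset_url("your-username/your-huggingface-dataset-name"))
    # ds = load_transferred("your-username/your-huggingface-dataset-name", token="hf_...")
```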

Integration & Workflow

This actor is typically used in a pipeline following a web scraper. For example:

  1. A web scraper actor runs and stores its results in an Apify dataset.
  2. This actor takes that dataset's ID as input (apifyDatasetId).
  3. It transfers the scraped data to Hugging Face, making it immediately available for machine learning tasks.
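The three steps above can be sketched in Python as follows. The function run_pipeline is a hypothetical helper, and both actor IDs are placeholders. It expects an object shaped like apify-client's ApifyClient (anything offering actor(id).call(run_input=...)), which also makes the chaining logic easy to try with a stub.

```python
def run_pipeline(client, scraper_actor_id, scraper_input,
                 transfer_actor_id, hf_dataset_name, hf_token, max_items=0):
    """Run a scraper, then pass its default dataset to the transfer actor."""
    # Step 1: run the scraper and wait for it to finish.
    scraper_run = client.actor(scraper_actor_id).call(run_input=scraper_input)
    if not scraper_run:
        raise RuntimeError("Scraper run did not return run details")

    # Step 2: the scraper's default dataset ID becomes apifyDatasetId.
    transfer_input = {
        "apifyDatasetId": scraper_run["defaultDatasetId"],
        "huggingFaceDatasetName": hf_dataset_name,
        "huggingFaceToken": hf_token,
        "maxItems": max_items,
    }

    # Step 3: run the transfer actor and return its run details.
    return client.actor(transfer_actor_id).call(run_input=transfer_input)


# Example wiring with the real client (needs `pip install apify-client`);
# the actor IDs and start URL below are placeholders, not values from this page:
#
#   from apify_client import ApifyClient
#   client = ApifyClient("your-apify-token")
#   run_pipeline(client, "username/some-scraper", {"startUrls": []},
#                "username/actor-name", "your-username/your-dataset", "hf_...")
```

On the Apify platform itself, the same chaining can be set up without code through the integrations mentioned below.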

The actor can also be integrated with other services via the Apify platform, including Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, and Google Drive.

Feedback

For suggestions or issues, please create an issue on the actor's GitHub repository or contact Apify support.

Ready to Get Started?

Try Dataset to HuggingFace now on Apify. Free tier available with no credit card required.


Actor Information

Developer: flamboyant_leaf
Pricing: Paid
Total Runs: 486
Active Users: 6

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
