Hugging Face Model Scraper

by parseforge

62 runs
5 users

About Hugging Face Model Scraper

Collect models from Hugging Face Hub via public API endpoints. Get metadata including author, downloads, likes, lastModified, task, library, license, tags and filenames.

What does this actor do?

Hugging Face Model Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

🤖 Hugging Face Model Scraper

Collect models from Hugging Face Hub via public API endpoints. Get metadata including author, downloads, likes, lastModified, task, library, license, tags and filenames. Built for analysts, researchers, and developers who need fast insights with no browser automation.

## 🎯 What does it collect?

- ✅ Model id, name, URL
- ✅ Author
- ✅ Downloads, likes
- ✅ Last modified, createdAt
- ✅ Task (pipeline tag), library
- ✅ License, tags

## How to use

Example run: query "bert", 20 items, sorted by downloads.

## Input

Fields supported:

- `query` (string): free text search
- `task` (string): e.g., text-classification, image-classification, text-generation
- `library` (string): e.g., transformers, diffusers, timm
- `license` (string): e.g., apache-2.0, mit, cc-by-4.0
- `language` (string): e.g., en, zh, multi
- `sort` (enum): downloads | likes | lastModified | trending
- `direction` (enum): asc | desc
- `maxItems` (integer): max models to return (up to 1,000,000). Free users: limited to 100. Paid users: optional, max 1,000,000. Prefill value: 10.

Here is an example input written in JSON:

```json
{
  "query": "bert",
  "sort": "downloads",
  "direction": "desc",
  "maxItems": 100
}
```

Pro Tip: Combine multiple filters to narrow down results. For example, search for "bert" models with task "text-classification" and library "transformers" for highly targeted results.

## Output

After the Actor finishes its run, you'll get a dataset with the output. The length of the dataset depends on the number of results you've set. You can download those results as CSV, Excel, or JSON.
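Once the results are downloaded as JSON, a few lines of Python are enough to rank or group them. The sketch below is illustrative only: it assumes the export is a JSON array of records carrying the `id`, `downloads` and `license` fields listed above, the function names are hypothetical, and the sample records are invented for the example.

```python
import json
from collections import Counter


def load_records(json_text):
    """Parse a JSON export; assumes the export is a JSON array of model records."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of model records")
    return records


def top_by_downloads(records, n=5):
    """Return the n records with the highest download counts (missing counts as 0)."""
    return sorted(records, key=lambda r: r.get("downloads") or 0, reverse=True)[:n]


def count_licenses(records):
    """Count records per license, grouping missing or null values under 'unknown'."""
    return Counter(r.get("license") or "unknown" for r in records)


if __name__ == "__main__":
    # Invented sample data, not real scraper output.
    sample = json.dumps([
        {"id": "example-org/model-a", "downloads": 1200, "license": "mit"},
        {"id": "example-org/model-b", "downloads": 5400, "license": "apache-2.0"},
        {"id": "example-org/model-c", "downloads": 300, "license": None},
    ])
    records = load_records(sample)
    for record in top_by_downloads(records, n=2):
        print(record["id"], record["downloads"])
    print(dict(count_licenses(records)))
```

Sorting and counting locally keeps the analysis independent of the export format chosen in the Apify console, as long as the JSON option is used.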
Here's an example of scraped Hugging Face model data:

```json
{
  "imageUrl": "https://huggingface.co/google-bert/avatar",
  "id": "google-bert/bert-base-uncased",
  "name": "google-bert/bert-base-uncased",
  "url": "https://huggingface.co/google-bert/bert-base-uncased",
  "author": "google-bert",
  "downloads": 54018364,
  "likes": 2423,
  "private": false,
  "gated": false,
  "disabled": false,
  "sha": "86b5e0934494bd15c9632b12f734a8a67f723594",
  "lastModified": "2024-02-19T11:06:12.000Z",
  "createdAt": "2022-03-02T23:29:04.000Z",
  "task": "fill-mask",
  "library": "transformers",
  "license": "apache-2.0",
  "language": ["en"],
  "datasets": ["bookcorpus", "wikipedia"],
  "tags": ["exbert"],
  "files": [
    ".gitattributes",
    "LICENSE",
    "README.md",
    "config.json",
    "model.safetensors",
    "pytorch_model.bin",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt"
  ]
}
```

What You Get: Complete model metadata including popularity metrics (downloads, likes), technical details (task, library, license), training information (datasets, language), and available model files.

Download Options: CSV, Excel, or JSON formats for easy analysis in your business tools.

## ⚡ Why choose this scraper?

- ✅ API-first, fast: Uses Hugging Face public API endpoints (no browser)
- ✅ Flexible filtering: query, task, library, license, language, sorting
- ✅ Comprehensive data: Get downloads, likes, tasks, licenses, files, and more
- ✅ User-Friendly: No coding needed, just set filters and go
- ⏰ Time Savings: Save hours compared to manual model research and tracking
- 💰 Cost Efficiency: Fraction of the cost of maintaining custom tracking infrastructure

## 🔧 How to use

1. 📝 Sign Up: Create a free account with $5 credit (takes 2 minutes)
2. 🔍 Find the Scraper: Visit the Hugging Face Model Scraper page
3. ⚙️ Set Input: Add your filters and max items
4. 🚀 Run It: Click "Start" and let it collect your data
5. 📥 Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON

⏱️ Total Time: 5 minutes setup, 10-30 minutes for data collection

🎯 No Technical Skills Required: Everything is point-and-click

## Business Use Cases

AI/ML Researchers:

- Track trending models in your research area
- Monitor model popularity metrics (downloads, likes)
- Identify popular architectures and libraries
- Discover datasets used for training

ML Engineers:

- Find production-ready models for specific tasks
- Compare models by popularity and recency
- Identify licensing requirements before deployment
- Track model updates and new releases

Data Scientists:

- Build comprehensive model catalogs
- Analyze AI/ML trends and adoption patterns
- Identify suitable pre-trained models for projects
- Monitor emerging techniques and libraries

Product Managers:

- Track competitive AI/ML landscape
- Monitor adoption of different model types
- Identify popular solutions for product features
- Support AI strategy with market intelligence

## Integrate with any app and automate your workflow

Hugging Face Model Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform. These include:

- Make
- Zapier
- Slack
- Airbyte
- GitHub
- Google Drive
- and much more.

Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever a run successfully finishes.

## Using with the Apify API

For advanced users who want to automate this process, you can control the scraper programmatically with the Apify API. This allows you to schedule regular data collection and integrate with your existing business tools.

- Node.js: Install the apify-client NPM package
- Python: Use the apify-client PyPI package
- See the Apify API reference for full details

## 💰 Pricing

- Start price: $0.005 per run
- Price per 1,000 results: $5.00 (i.e., $0.005 per result)

Free users are automatically limited to 100 items.
Paid users can process up to 1,000,000 items. If maxItems is not defined, no item limit is applied.

## Frequently Asked Questions

Q: How accurate is the data?

A: We collect data directly from Hugging Face's public API in real time, so results reflect what the API returns at the moment of the run.

Q: Can I schedule regular runs?

A: Yes! Use the Apify scheduler or API to schedule daily, weekly, or monthly runs automatically. Perfect for tracking model trends over time.

Q: What's the rate limit?

A: We respect Hugging Face's API limits. The scraper handles rate limiting automatically.

Q: Can I get model descriptions and READMEs?

A: Currently, the scraper focuses on metadata. For full READMEs, you can use the model URLs provided in the output.

Q: What if I need help?

A: Our support team is available. Contact us through the Apify platform.

Q: Is my data secure?

A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.

## 🔗 Recommended Actors

Looking for more data collection tools? Check out these related actors:

| Actor | Description | Link |
|-------|-------------|------|
| Hubspot Marketplace Scraper | Extracts business app data from HubSpot marketplace | https://apify.com/parseforge/hubspot-marketplace-scraper |
| PR Newswire Scraper | Extracts press release and news content from PR Newswire | https://apify.com/parseforge/pr-newswire-scraper |
| Smart Apify Actor Scraper (+70 Fields + Actor Quality Metrics) | Collects comprehensive actor data from Apify store | https://apify.com/parseforge/smart-apify-actor-scraper |
| AWS Marketplace Scraper | Extracts business app data from AWS marketplace | https://apify.com/parseforge/aws-marketplace-scraper |
| Stripe App Marketplace Scraper | Collects app data from Stripe marketplace | https://apify.com/parseforge/stripe-marketplace-scraper |

Pro Tip: 💡 Browse our complete collection of data collection actors to find the perfect tool for your business needs.

Need Help? Our support team is here to help you get the most out of this tool.

---

> ⚠️ Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.
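As a rough illustration of the Input and Pricing sections, the following Python sketch assembles an input object and estimates what a run might cost. It is not the Actor's own code. The function names are hypothetical, and two things are assumptions: that an oversized maxItems is clamped to the documented cap rather than rejected, and that the start price and the per-result price simply add up.

```python
from decimal import Decimal

# Values copied from the Input and Pricing sections of this page.
SORT_VALUES = {"downloads", "likes", "lastModified", "trending"}
DIRECTION_VALUES = {"asc", "desc"}
FREE_TIER_MAX_ITEMS = 100
PAID_MAX_ITEMS = 1_000_000
START_PRICE = Decimal("0.005")       # USD per run
PRICE_PER_RESULT = Decimal("0.005")  # USD per result ($5.00 per 1,000)


def build_input(query=None, task=None, library=None, license_name=None,
                language=None, sort="downloads", direction="desc",
                max_items=10, paid=False):
    """Build an input dict, validating enums and clamping maxItems (assumed behavior)."""
    if sort not in SORT_VALUES:
        raise ValueError(f"sort must be one of {sorted(SORT_VALUES)}")
    if direction not in DIRECTION_VALUES:
        raise ValueError(f"direction must be one of {sorted(DIRECTION_VALUES)}")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("max_items must be a positive integer")

    cap = PAID_MAX_ITEMS if paid else FREE_TIER_MAX_ITEMS
    payload = {"sort": sort, "direction": direction, "maxItems": min(max_items, cap)}

    # Only include the optional filters that were actually provided.
    optional = {"query": query, "task": task, "library": library,
                "license": license_name, "language": language}
    payload.update({key: value for key, value in optional.items() if value})
    return payload


def estimate_cost(result_count):
    """Estimate run cost in USD, assuming start price plus per-result price."""
    if result_count < 0:
        raise ValueError("result_count must not be negative")
    return START_PRICE + PRICE_PER_RESULT * result_count


if __name__ == "__main__":
    run_input = build_input(query="bert", task="text-classification",
                            library="transformers", max_items=100)
    print(run_input)
    print("estimated cost (USD):", estimate_cost(run_input["maxItems"]))
```

`Decimal` is used instead of floats so that small per-result amounts accumulate without rounding noise. Treat the estimate as indicative and check the Apify console for actual billing.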

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Hugging Face Model Scraper now on Apify. Free tier available with no credit card required.

Actor Information

Developer: parseforge
Pricing: Paid
Total Runs: 62
Active Users: 5

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
