HuggingFaceTP

by aligned_tripod

About HuggingFaceTP

Scrapes trending research papers from HuggingFace, capturing each paper’s title, description, and URL. The scraper collects data from the listing page and visits individual paper pages for full abstracts, providing a structured dataset of the latest AI research.

What does this actor do?

HuggingFaceTP is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

# HuggingFace Trending Papers Scraper

A lightweight and fast web scraper built on Apify that extracts trending AI research papers from the HuggingFace Papers Trending page. It scrapes the listing page first, then visits each individual paper page to complete the record.

## 🚀 Features

- ✅ Scrapes trending AI/ML research papers from HuggingFace
- ✅ Extracts paper titles, authors, abstracts, and publication dates
- ✅ Collects paper URLs and direct links to research papers
- ✅ Fast and efficient scraping with Playwright
- ✅ Easy to use via Apify Console
- ✅ Exports data in JSON, CSV, or Excel format
- ✅ Configurable number of papers to scrape

## 📊 Data Extracted

The scraper collects the following information for each paper:

| Field | Description |
|-------|-------------|
| Paper Title | Full title of the research paper |
| Authors | List of paper authors |
| Abstract | Paper abstract/summary |
| Publication Date | When the paper was published |
| Paper URL | Link to the HuggingFace paper page |
| ArXiv URL | Direct link to the paper on ArXiv (if available) |
| Upvotes | Number of upvotes on HuggingFace |
| Comments | Number of comments/discussions |
| Scraped At | Timestamp when data was collected |

## 🛠️ How to Use

### Option 1: Using Apify Console (No Coding Required)

1. Create an Apify Account
   - Go to apify.com and sign up for free
2. Import This Actor
   - Click on Actors → Create new
   - Choose this actor from the store or import via GitHub
3. Configure Input
   - Set Max Papers (default: 50)
   - Optionally adjust other settings
4. Run the Actor
   - Click the Start button
   - Wait for the scraper to complete (usually 1-3 minutes)
5. Download Results
   - Go to the Dataset tab
   - Click Export and choose your format (CSV, JSON, Excel)

### Option 2: Using Apify API

```javascript
// apify-client exposes ApifyClient as a named export.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

const input = {
    maxPapers: 30,
};

// CommonJS has no top-level await, so run the calls inside an async function.
(async () => {
    // Start the actor and wait for the run to finish.
    const run = await client.actor('YOUR_ACTOR_ID').call(input);
    // Read the scraped papers from the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
})();
```

### Option 3: Scheduled Runs

Set up automatic daily/weekly scraping:

1. Go to Schedules in Apify Console
2. Click Create new
3. Select this actor
4. Choose frequency (daily, weekly, etc.)
5. Save and activate

## ⚙️ Configuration Options

### Input Parameters

```json
{
  "maxPapers": 50,
  "startUrls": [
    { "url": "https://huggingface.co/papers" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| maxPapers | Number | 50 | Maximum number of papers to scrape |
| startUrls | Array | HuggingFace Papers | URLs to start scraping from |
| proxyConfiguration | Object | Apify Proxy | Proxy settings to avoid blocking |

## 📦 Output Format

### JSON Example

```json
[
  {
    "Paper Title": "Attention Is All You Need",
    "Authors": "Vaswani et al.",
    "Abstract": "The dominant sequence transduction models...",
    "Publication Date": "2017-06-12",
    "Paper URL": "https://huggingface.co/papers/1706.03762",
    "ArXiv URL": "https://arxiv.org/abs/1706.03762",
    "Upvotes": 1250,
    "Comments": 45,
    "Scraped At": "2025-12-06T09:45:00.000Z"
  }
]
```

### CSV Example

```csv
Paper Title,Authors,Abstract,Publication Date,Paper URL,ArXiv URL,Upvotes,Comments,Scraped At
"Attention Is All You Need","Vaswani et al.","The dominant sequence...","2017-06-12","https://huggingface.co/papers/1706.03762","https://arxiv.org/abs/1706.03762",1250,45,"2025-12-06T09:45:00.000Z"
```

## 🔧 Technical Details

### Built With

- Apify SDK - Actor framework
- Crawlee - Web crawling and scraping library
- Playwright - Headless browser automation
- Cheerio - HTML parsing

### Requirements

- Node.js 18+
- Apify account (free tier available)

## 📈 Use Cases

- Research Tracking: Stay updated with trending AI research
- Content Curation: Aggregate papers for newsletters or blogs
- Academic Monitoring: Track specific research areas
- Data Analysis: Analyze trends in AI/ML research
- Literature Review: Collect papers for research projects

## 🚨 Rate Limiting & Best Practices

- The scraper uses Apify proxy by default to avoid blocking
- Respects HuggingFace's robots.txt
- Implements reasonable delays between requests
- Recommended: Run no more than once per hour

## 🐛 Troubleshooting

### No Data Scraped

- Check if HuggingFace changed their page structure
- Verify proxy settings are enabled
- Increase wait time in settings

### Partial Data

- Some papers may not have all fields available
- The scraper handles missing data gracefully

### Actor Fails

- Check the logs in the Run tab
- Ensure you have sufficient Apify credits
- Try reducing the maxPapers value

## 📝 Example Use Case: Daily AI Research Digest

1. Schedule the actor to run daily at 9 AM
2. Connect to Zapier/Make to send results to:
   - Notion database
   - Google Sheets
   - Slack channel
   - Email digest
3. Filter papers by keywords in your own processing pipeline

## 🤝 Contributing

Found a bug or want to suggest improvements?

- Open an issue in the repository
- Submit a pull request
- Contact support via Apify Console

## 📄 License

This actor is provided as-is under the MIT License.

## 🔗 Links

- HuggingFace Papers
- Apify Platform
- Actor Documentation
- Support

## 💡 Tips

- Combine with other scrapers: Use alongside arXiv or Google Scholar scrapers for comprehensive coverage
- Set up alerts: Use Apify webhooks to get notified when new papers are found
- Custom filtering: Process the output with your own scripts to filter by topics/authors
- Data enrichment: Combine with citation APIs to get paper impact metrics

---

Note: This scraper is for educational and research purposes. Always respect website terms of service and rate limits. Use responsibly! 🎓

Last Updated: December 2025
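
The "Custom filtering" tip and the digest step "Filter papers by keywords in your own processing pipeline" could be approached roughly as follows. This is only a sketch, not part of the actor: `filterPapersByKeywords` is a hypothetical helper, the sample items are made-up placeholders, and the field names (`Paper Title`, `Abstract`) are assumed to match the output example in this README.

```javascript
// Hypothetical helper (not provided by the actor): keep only the items whose
// title or abstract contains at least one keyword, ignoring case.
function filterPapersByKeywords(items, keywords) {
  const needles = keywords.map((keyword) => keyword.toLowerCase());
  return items.filter((item) => {
    // Some papers may lack fields (see "Partial Data"), so default to ''.
    const title = item['Paper Title'] ?? '';
    const abstract = item['Abstract'] ?? '';
    const haystack = `${title} ${abstract}`.toLowerCase();
    return needles.some((needle) => haystack.includes(needle));
  });
}

// Placeholder items shaped like the actor's output; not real scraped data.
const papers = [
  {
    'Paper Title': 'Attention Is All You Need',
    'Abstract': 'The dominant sequence transduction models...',
  },
  { 'Paper Title': 'Placeholder title about diffusion models' }, // no Abstract
  { 'Abstract': 'Placeholder abstract about retrieval.' }, // no Paper Title
];

const matches = filterPapersByKeywords(papers, ['transduction', 'diffusion']);
console.log(matches.map((paper) => paper['Paper Title']));
```

In practice the `items` array returned by `listItems()` in the API example would take the place of `papers`. Matching is a plain substring test, so a short keyword such as "ai" would also match inside longer words, and an empty keyword list matches nothing; a word-boundary check may be preferable for short terms.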

Actor Information

  • Developer: aligned_tripod
  • Pricing: Paid
  • Total Runs: 28
  • Active Users: 2

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
