HuggingFaceTP
by aligned_tripod
About HuggingFaceTP
Scrapes trending research papers from HuggingFace, capturing each paper’s title, description, and URL. The scraper collects data from the listing page and visits individual paper pages for full abstracts, providing a structured dataset of the latest AI research.
What does this actor do?
HuggingFaceTP is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
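The download step can also be scripted instead of done by hand in the Console. The following is a minimal sketch, assuming the Apify API v2 dataset items endpoint and a dataset ID copied from a finished run. `buildDatasetExportUrl` is a hypothetical helper name, and `YOUR_DATASET_ID` is a placeholder, not a value from this page.

```javascript
// Hypothetical helper: builds a download URL for a dataset in a chosen format.
// Assumes the Apify API v2 "dataset items" endpoint; the dataset ID is a placeholder.
function buildDatasetExportUrl(datasetId, format = 'json', token) {
    const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
    url.searchParams.set('format', format); // e.g. 'json', 'csv', 'xlsx'
    // The token is optional here; only append it when the caller supplies one.
    if (token) url.searchParams.set('token', token);
    return url.toString();
}

const exportUrl = buildDatasetExportUrl('YOUR_DATASET_ID', 'csv');
console.log(exportUrl);
```

The resulting URL can be fetched with any HTTP client. Keeping the token out of the URL and sending it in a request header instead is usually the safer choice when the link might be logged or shared.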
Documentation
# HuggingFace Trending Papers Scraper

A lightweight and fast web scraper built on Apify that extracts trending AI research papers from the HuggingFace Papers Trending page. It scrapes both the listing page and the individual paper pages so that each record is complete.

## 🚀 Features

- ✅ Scrapes trending AI/ML research papers from HuggingFace
- ✅ Extracts paper titles, authors, abstracts, and publication dates
- ✅ Collects paper URLs and direct links to research papers
- ✅ Fast and efficient scraping with Playwright
- ✅ Easy to use via Apify Console
- ✅ Exports data in JSON, CSV, or Excel format
- ✅ Configurable number of papers to scrape

## 📊 Data Extracted

The scraper collects the following information for each paper:

| Field | Description |
|-------|-------------|
| Paper Title | Full title of the research paper |
| Authors | List of paper authors |
| Abstract | Paper abstract/summary |
| Publication Date | When the paper was published |
| Paper URL | Link to the HuggingFace paper page |
| ArXiv URL | Direct link to the paper on ArXiv (if available) |
| Upvotes | Number of upvotes on HuggingFace |
| Comments | Number of comments/discussions |
| Scraped At | Timestamp when data was collected |

## 🛠️ How to Use

### Option 1: Using Apify Console (No Coding Required)

1. Create an Apify Account
   - Go to apify.com and sign up for free
2. Import This Actor
   - Click on Actors → Create new
   - Choose this actor from the store or import via GitHub
3. Configure Input
   - Set Max Papers (default: 50)
   - Optionally adjust other settings
4. Run the Actor
   - Click the Start button
   - Wait for the scraper to complete (usually 1-3 minutes)
5. Download Results
   - Go to the Dataset tab
   - Click Export and choose your format (CSV, JSON, Excel)

### Option 2: Using Apify API

```javascript
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

const input = {
    maxPapers: 30,
};

async function main() {
    // Start the actor and wait for the run to finish
    const run = await client.actor('YOUR_ACTOR_ID').call(input);
    // Fetch the scraped papers from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
}

main().catch(console.error);
```

### Option 3: Scheduled Runs

Set up automatic daily/weekly scraping:

1. Go to Schedules in Apify Console
2. Click Create new
3. Select this actor
4. Choose frequency (daily, weekly, etc.)
5. Save and activate

## ⚙️ Configuration Options

### Input Parameters

```json
{
  "maxPapers": 50,
  "startUrls": [
    { "url": "https://huggingface.co/papers" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| maxPapers | Number | 50 | Maximum number of papers to scrape |
| startUrls | Array | HuggingFace Papers | URLs to start scraping from |
| proxyConfiguration | Object | Apify Proxy | Proxy settings to avoid blocking |

## 📦 Output Format

### JSON Example

```json
[
  {
    "Paper Title": "Attention Is All You Need",
    "Authors": "Vaswani et al.",
    "Abstract": "The dominant sequence transduction models...",
    "Publication Date": "2017-06-12",
    "Paper URL": "https://huggingface.co/papers/1706.03762",
    "ArXiv URL": "https://arxiv.org/abs/1706.03762",
    "Upvotes": 1250,
    "Comments": 45,
    "Scraped At": "2025-12-06T09:45:00.000Z"
  }
]
```

### CSV Example

```csv
Paper Title,Authors,Abstract,Publication Date,Paper URL,ArXiv URL,Upvotes,Comments,Scraped At
"Attention Is All You Need","Vaswani et al.","The dominant sequence...","2017-06-12","https://huggingface.co/papers/1706.03762","https://arxiv.org/abs/1706.03762",1250,45,"2025-12-06T09:45:00.000Z"
```

## 🔧 Technical Details

### Built With

- Apify SDK - Actor framework
- Crawlee - Web crawling and scraping library
- Playwright - Headless browser automation
- Cheerio - HTML parsing

### Requirements

- Node.js 18+
- Apify account (free tier available)

## 📈 Use Cases

- Research Tracking: Stay updated with trending AI research
- Content Curation: Aggregate papers for newsletters or blogs
- Academic Monitoring: Track specific research areas
- Data Analysis: Analyze trends in AI/ML research
- Literature Review: Collect papers for research projects

## 🚨 Rate Limiting & Best Practices

- The scraper uses Apify proxy by default to avoid blocking
- Respects HuggingFace's robots.txt
- Implements reasonable delays between requests
- Recommended: Run no more than once per hour

## 🐛 Troubleshooting

### No Data Scraped

- Check if HuggingFace changed their page structure
- Verify proxy settings are enabled
- Increase wait time in settings

### Partial Data

- Some papers may not have all fields available
- The scraper handles missing data gracefully

### Actor Fails

- Check the logs in the Run tab
- Ensure you have sufficient Apify credits
- Try reducing the maxPapers value

## 📝 Example Use Case: Daily AI Research Digest

1. Schedule the actor to run daily at 9 AM
2. Connect to Zapier/Make to send results to:
   - Notion database
   - Google Sheets
   - Slack channel
   - Email digest
3. Filter papers by keywords in your own processing pipeline

## 🤝 Contributing

Found a bug or want to suggest improvements?

- Open an issue in the repository
- Submit a pull request
- Contact support via Apify Console

## 📄 License

This actor is provided as-is under the MIT License.

## 🔗 Links

- HuggingFace Papers
- Apify Platform
- Actor Documentation
- Support

## 💡 Tips

- Combine with other scrapers: Use alongside arXiv or Google Scholar scrapers for comprehensive coverage
- Set up alerts: Use Apify webhooks to get notified when new papers are found
- Custom filtering: Process the output with your own scripts to filter by topics/authors
- Data enrichment: Combine with citation APIs to get paper impact metrics

---

Note: This scraper is for educational and research purposes. Always respect website terms of service and rate limits. Use responsibly! 🎓

Last Updated: December 2025
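The keyword filtering suggested in the digest use case and in the Tips could look roughly like the sketch below. It assumes the records use the field names from the JSON example (`Paper Title`, `Abstract`); `filterPapersByKeywords` and the sample records are hypothetical and only illustrate the idea, not how the actor itself processes data.

```javascript
// Hypothetical helper: keeps papers whose title or abstract mentions any keyword.
// Field names ("Paper Title", "Abstract") follow the JSON example in the README.
function filterPapersByKeywords(items, keywords) {
    const needles = keywords.map((keyword) => keyword.toLowerCase());
    return items.filter((paper) => {
        // Missing fields are treated as empty strings so partial records do not throw.
        const haystack = `${paper['Paper Title'] ?? ''} ${paper['Abstract'] ?? ''}`.toLowerCase();
        return needles.some((needle) => haystack.includes(needle));
    });
}

// Sample records standing in for the `items` array returned by the dataset client.
const sampleItems = [
    {
        'Paper Title': 'Attention Is All You Need',
        'Abstract': 'The dominant sequence transduction models...',
        'Upvotes': 1250,
    },
    { 'Paper Title': 'A Hypothetical Diffusion Paper', 'Upvotes': 10 }, // no Abstract field
];

const matches = filterPapersByKeywords(sampleItems, ['Transduction', 'attention']);
console.log(matches.map((paper) => paper['Paper Title']));
```

Matching is a case-insensitive substring test, which is cheap but coarse: a keyword such as "gan" would also match "organ". An empty keyword list returns no papers in this sketch; a pipeline that should pass everything through in that case would need an explicit check.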
Actor Information
- Developer: aligned_tripod
- Pricing: Paid
- Total Runs: 28
- Active Users: 2