Github Trending Scraper
by mohamedgb00714
About Github Trending Scraper
What does this actor do?
Github Trending Scraper is an Actor on the Apify platform that scrapes the repositories listed on GitHub's Trending page. It can filter by programming language, date range (daily, weekly, or monthly), and spoken language, and it runs in the cloud with no local setup.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# GitHub Trending Repositories Scraper

A production-ready Apify Actor that scrapes trending repositories from GitHub with comprehensive filtering options.

## Features

- ✅ Server-side rendered scraping - Uses CheerioCrawler for fast, efficient scraping (no browser overhead)
- 🔍 Filter by programming language - JavaScript, Python, Go, Rust, TypeScript, etc.
- 📅 Filter by date range - Daily, Weekly, or Monthly trending repos
- 🌍 Filter by spoken language - English, Chinese, Spanish, etc.
- ⚡ Configurable limits - Set maximum number of repositories to scrape
- 🔒 Proxy support - Built-in Apify Proxy support or custom proxies
- 📊 Rich dataset output - Complete repository data with stars, forks, contributors, and more

## Input Parameters

The Actor accepts the following input parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `language` | String | `""` (all) | Filter by programming language (e.g., `javascript`, `python`, `go`) |
| `dateRange` | String | `"daily"` | Time period: `"daily"`, `"weekly"`, or `"monthly"` |
| `spokenLanguage` | String | `""` (all) | Filter by natural language (e.g., `en`, `zh`, `es`) |
| `maxItems` | Integer | `25` | Maximum number of repositories to scrape (0 = unlimited) |
| `proxyConfiguration` | Object | `{ useApifyProxy: false }` | Proxy settings |

### Example Input

```json
{
  "language": "python",
  "dateRange": "weekly",
  "spokenLanguage": "",
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

## Output

The Actor outputs a dataset with the following fields for each repository:

```typescript
{
  owner: string;           // Repository owner username
  repositoryName: string;  // Repository name
  fullName: string;        // Full name (owner/repo)
  url: string;             // GitHub repository URL
  description: string;     // Repository description
  language: string;        // Primary programming language
  stars: number;           // Total stars count
  forks: number;           // Total forks count
  starsToday: number;      // Stars gained in this period
  builtBy: Array<{         // Top contributors
    username: string;
    profileUrl: string;
  }>;
  scrapedAt: string;       // ISO timestamp
}
```

### Example Output

```json
{
  "owner": "microsoft",
  "repositoryName": "vscode",
  "fullName": "microsoft/vscode",
  "url": "https://github.com/microsoft/vscode",
  "description": "Visual Studio Code",
  "language": "TypeScript",
  "stars": 162000,
  "forks": 28000,
  "starsToday": 150,
  "builtBy": [
    {
      "username": "bpasero",
      "profileUrl": "https://github.com/bpasero"
    }
  ],
  "scrapedAt": "2024-11-26T10:30:00.000Z"
}
```

## How It Works

1. URL Construction: Builds the GitHub trending URL based on your filters
2. Server-Side Scraping: Uses CheerioCrawler (fast HTTP requests, no browser)
3. Data Extraction: Parses HTML to extract repository data
4. Dataset Storage: Pushes structured data to Apify Dataset

## Local Development

### Prerequisites

- Node.js 18+
- npm or yarn

### Installation

```bash
npm install
```

### Running Locally

IMPORTANT: Always use `apify run` to run the Actor locally (NOT `npm start`):

```bash
# Run with default input from storage/key_value_stores/default/INPUT.json
apify run

# Run with custom input
apify run -i '{"language":"javascript","dateRange":"weekly","maxItems":10}'

# Run with input from file
apify run --input-file my-input.json
```

### Testing Different Scenarios

```bash
# Get top 10 trending Python repos today
apify run -i '{"language":"python","dateRange":"daily","maxItems":10}'

# Get weekly trending JavaScript repos
apify run -i '{"language":"javascript","dateRange":"weekly","maxItems":25}'

# Get monthly trending repos (all languages)
apify run -i '{"dateRange":"monthly","maxItems":50}'

# Get trending repos in Chinese
apify run -i '{"spokenLanguage":"zh","maxItems":20}'
```

## Deployment to Apify Platform

### Option 1: Link Git Repository

1. Go to Actor creation page
2. Click on Link Git Repository
3. Connect your GitHub repository

### Option 2: Push from Local Machine

```bash
# Login to Apify (requires API token)
apify login

# Deploy Actor to Apify Platform
apify push
```

## Performance

- Speed: ~2-5 seconds per run (server-side rendering)
- Crawler Type: CheerioCrawler (HTTP-based, no browser overhead)
- Memory: ~256MB typical usage
- Concurrency: Single request (trending page is one page)

## Use Cases

- 📈 Trend Analysis: Track trending technologies and languages
- 🔍 Repository Discovery: Find popular new projects
- 📊 Data Collection: Build datasets for research
- 🤖 Automation: Schedule daily/weekly trending reports
- 📧 Notifications: Get alerts for trending repos in your language

## Limitations

- GitHub may rate-limit requests without proxy
- Trending page shows ~25 repositories per page
- No pagination (trending page is a single page)

## Troubleshooting

### No repositories scraped

- Check if GitHub changed their HTML structure
- Enable Apify Proxy if you're being rate-limited
- Verify your language/date range filters are valid

### Rate limiting

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "groups": ["RESIDENTIAL"]
  }
}
```

## Resources

- GitHub Trending Page
- Apify Platform Documentation
- Crawlee Documentation
- Apify SDK Documentation

## License

ISC

## Development Tools

This Actor was built using the Apify AutoPlans VS Code Extension - an AI-powered development assistant for building Apify Actors with intelligent code generation, testing, and deployment capabilities.

### Build Your Own Actor

Want to create your own Apify Actor with AI assistance? Install the extension:

1. Open VS Code
2. Search for "Apify AutoPlans" in the Extensions marketplace
3. Install and start building production-ready scrapers with AI

## Author

Built with ❤️ using Apify SDK, Crawlee, and Apify AutoPlans VS Code Extension - Join our developer community on Discord
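
The README's "URL Construction" step (building the GitHub trending URL from the `language`, `dateRange`, and `spokenLanguage` filters) is not shown in the documentation. The following is a minimal sketch of how such a step might look, not the Actor's actual code. The names `TrendingInput` and `buildTrendingUrl` are hypothetical, and the `since` and `spoken_language_code` query parameters are an assumption based on how GitHub's public trending page is addressed.

```typescript
// Hypothetical input shape mirroring the filter fields in the input table.
interface TrendingInput {
  language?: string;                           // e.g. "python"; empty means all languages
  dateRange?: "daily" | "weekly" | "monthly";  // defaults to "daily"
  spokenLanguage?: string;                     // e.g. "zh"; empty means all spoken languages
}

// Build a GitHub Trending URL from the filters.
// Assumption: the trending page accepts `since` and `spoken_language_code`
// query parameters; the Actor's real implementation may differ.
function buildTrendingUrl(input: TrendingInput = {}): string {
  const { language = "", dateRange = "daily", spokenLanguage = "" } = input;

  // The programming language goes in the path; omit it to get all languages.
  const path = language
    ? `/trending/${encodeURIComponent(language.toLowerCase())}`
    : "/trending";

  const url = new URL(path, "https://github.com");
  url.searchParams.set("since", dateRange);

  // Only add the spoken-language filter when one was requested.
  if (spokenLanguage) {
    url.searchParams.set("spoken_language_code", spokenLanguage);
  }
  return url.toString();
}

// Example: weekly trending Python repositories.
console.log(buildTrendingUrl({ language: "python", dateRange: "weekly" }));
```

Keeping the URL logic in a small pure function like this makes it easy to check filter combinations without running a crawl.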
Actor Information
- Developer: mohamedgb00714
- Pricing: Paid
- Total Runs: 14
- Active Users: 2