Reddit Scraper
by benthepythondev
About Reddit Scraper
Extract Reddit posts, comments & user data in AI-ready markdown format. No API keys needed! 25% cheaper than competitors. Perfect for AI training, sentiment analysis & market research. Includes bulk comment scraping with progress tracking.
What does this actor do?
Reddit Scraper is a web scraping and automation tool available on the Apify platform. It extracts Reddit posts, comments, and user data through Reddit's public JSON API and runs entirely in Apify's cloud, so there is nothing to install locally.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
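Beyond the web UI, runs can be started programmatically through Apify's REST API. A minimal, stdlib-only sketch is below; the actor ID `benthepythondev~reddit-scraper` is an assumption (copy the real ID from the actor page), and the input fields follow the subreddit-mode parameters documented further down:

```python
# Sketch: start a run of this actor via Apify's v2 REST API (stdlib only).
# The ACTOR_ID below is an assumed placeholder, not a verified identifier.
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "benthepythondev~reddit-scraper"  # assumed; check the actor page

def build_run_input(subreddit, sort="hot", max_posts=100):
    """Assemble subreddit-mode input as described under Input Parameters."""
    return {
        "mode": "subreddit",
        "subreddit": subreddit,
        "sort": sort,
        "maxPosts": max_posts,
        "outputFormat": "markdown",
    }

def start_run(token, run_input):
    """POST the input JSON to start an actor run (network call, not invoked here)."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/runs?token={token}"
    req = urllib.request.Request(
        url,
        data=json.dumps(run_input).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Once the run finishes, results can be fetched from the run's default dataset with the same API, or via Apify's official client libraries.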
Documentation
# Reddit Scraper - Fast & AI-Ready Data Extraction

Extract Reddit posts, comments, and user data in markdown format perfect for AI training, market research, and sentiment analysis. No API keys needed!

## What can Reddit Scraper extract?

This Reddit Scraper can extract comprehensive data from Reddit including:

- Posts: Titles, content (text/markdown/HTML), scores, comment counts, awards, timestamps
- Comments: Nested comment threads with full hierarchy, scores, and timestamps
- User Data: Post history, karma scores, account information
- Subreddit Info: Community statistics, descriptions, member counts
- Search Results: Find posts across Reddit or within specific communities
- Images & Media: Extract image URLs, thumbnails, and media metadata
- Engagement Metrics: Upvote ratios, comment counts, award counts
- AI-Ready Output: Token counts and markdown formatting for LLM training

## Why choose Reddit Scraper?

- ✅ 25% Cheaper - Only $1.50 per 1,000 results vs $2.00+ from competitors
- ✅ Faster - Uses Reddit's JSON API (no heavy browser needed)
- ✅ Bulk Comment Loading - Efficient scraping with up to 500 comments per request
- ✅ AI-Optimized - Markdown output with token counts for ML training
- ✅ No API Keys - Works without Reddit API authentication
- ✅ Progress Tracking - Real-time updates on scraping progress
- ✅ Easy to Use - Simple input configuration, no coding required

## How do I use Reddit Scraper?

### 1. Create a free Apify account

Sign up at apify.com - you get $5 free credit (enough for 3,300+ posts!).

### 2. Start the Actor

Visit the Reddit Scraper page and click "Try for free".

### 3. Configure your scrape

Choose what to scrape:

Subreddit Posts:

```json
{
  "mode": "subreddit",
  "subreddit": "ArtificialInteligence",
  "sort": "hot",
  "maxPosts": 100
}
```

Single Post + Comments:

```json
{
  "mode": "post",
  "postUrl": "https://www.reddit.com/r/python/comments/abc123/example/",
  "maxComments": 500
}
```

User Posts:

```json
{
  "mode": "user",
  "username": "example_user",
  "maxPosts": 100
}
```

Search Reddit:

```json
{
  "mode": "search",
  "searchQuery": "machine learning",
  "searchSubreddit": "python",
  "maxPosts": 200
}
```

### 4. Download your data

Export in JSON, CSV, Excel, XML, or HTML format.

## Input Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| mode | string | Scraping mode: subreddit, post, user, or search |
| subreddit | string | Subreddit name (e.g., "python") |
| postUrl | string | Full URL of post to scrape |
| username | string | Reddit username to scrape |
| searchQuery | string | Search query |
| sort | string | Sort order: hot, new, top, rising, controversial |
| timeFilter | string | Time filter: hour, day, week, month, year, all |
| maxPosts | integer | Maximum posts to scrape (0 = unlimited) |
| maxComments | integer | Maximum comments per post (0 = unlimited; applies to post mode and to subreddit mode with includeComments enabled) |
| includeComments | boolean | Include comments in subreddit mode (enables bulk comment scraping with progress tracking) |
| sinceDate | string | Only posts after this date (YYYY-MM-DD) |
| outputFormat | string | Content format: markdown, html, or text |
| includeImages | boolean | Extract image URLs |
| delaySeconds | number | Delay between requests (default: 1.0) |

## Output Example

```json
{
  "id": "abc123",
  "title": "How I built an AI agent that scrapes Reddit",
  "url": "https://reddit.com/r/artificial/comments/abc123/",
  "selftext_markdown": "Here's my complete guide...",
  "author": "ai_developer",
  "subreddit": "artificial",
  "score": 1250,
  "upvote_ratio": 0.97,
  "num_comments": 89,
  "created_utc": "2025-01-15T10:30:00Z",
  "word_count": 850,
  "token_count": 1200,
  "images": [
    {
      "url": "https://i.redd.it/example.jpg",
      "width": 1200,
      "height": 800
    }
  ]
}
```

## Use Cases

### 1. AI Training Data 🤖

Reddit is a goldmine for LLM training:

- Real human conversations and discussions
- Expert Q&A across 100K+ communities
- Diverse topics and writing styles
- Already in markdown format for easy processing

Example: Train a customer service chatbot on 50K support-related Reddit posts.

### 2. Market Research 📊

Understand what people really think:

- Track brand mentions and sentiment
- Monitor competitor discussions
- Identify trending topics and pain points
- Analyze customer feedback in real-time

Example: Scrape r/SaaS to understand startup challenges and opportunities.

### 3. Content Research ✍️

Find ideas and inspiration:

- Discover viral content patterns
- Identify popular discussion topics
- Research audience questions and pain points
- Find engaging headlines and angles

Example: Scrape top posts from r/Entrepreneur for blog content ideas.

### 4. Sentiment Analysis 😊😡

Analyze public opinion at scale:

- Track sentiment on products/brands
- Monitor crisis situations
- Understand community mood shifts
- Identify influencers and thought leaders

Example: Analyze 10K comments about a new product launch.

### 5. Academic Research 🎓

Study online communities:

- Social network analysis
- Language and communication patterns
- Community dynamics and moderation
- Misinformation spread patterns

Example: Research how scientific information spreads on Reddit.

### 6. Competitive Intelligence 🔍

Stay ahead of competitors:

- Monitor competitor mentions
- Track industry discussions
- Identify emerging trends early
- Understand customer pain points

Example: Track all mentions of competitors in your industry subreddits.

## How much will it cost to scrape Reddit data?

Reddit Scraper uses pay-per-result pricing - you only pay for the data you extract.
**Pricing: $1.50 per 1,000 results**

### Cost Examples

| Posts Scraped | Cost | What You Get |
|---------------|------|--------------|
| 100 posts | $0.15 | Small subreddit sample |
| 1,000 posts | $1.50 | Medium dataset |
| 10,000 posts | $15.00 | Large research dataset |
| 100,000 posts | $150.00 | Enterprise AI training data |

### Free Tier

With Apify's free plan ($5 credit), you get:

- ~3,300 posts FREE to try the Actor
- Perfect for testing and small projects

### ROI Calculation

Manual scraping:

- Time: ~2 minutes per post manually
- 1,000 posts = 33 hours of work
- At $25/hour = $825 cost

Reddit Scraper:

- Time: ~2 minutes total (automated)
- 1,000 posts = $1.50
- Savings: $823.50 (99.8% cost reduction!)

## Pro Tips

### Optimize for Speed

- Use hot or new sort - they're faster than top
- Set reasonable maxPosts limits
- Use includeComments: false unless you need comments

### Get Quality Data

- Use markdown output format for AI training
- Filter by timeFilter to get recent content
- Use sinceDate for incremental scraping
- Sort by top + week for high-quality posts
- Enable includeComments for complete conversation data

### Efficient Comment Scraping

- Set maxComments to limit comments per post (default: 100)
- Uses bulk loading (up to 500 comments per request)
- Includes progress tracking showing scraped/failed posts
- Nested comments are preserved with full hierarchy
- Failed posts are logged but don't stop the scraping

### Avoid Rate Limits

- Keep delaySeconds at 1.0 or higher
- Scrape during off-peak hours (US nighttime)
- Don't scrape the same subreddit repeatedly

### Save Money

- Set maxPosts to avoid over-scraping
- Use search mode for targeted data
- Scrape only what you need

## Technical Details

### How It Works

Reddit Scraper uses Reddit's official JSON API (not browser-based web scraping):

1. Converts Reddit URLs to JSON API endpoints
2. Fetches data using HTTP requests (no browser)
3. Parses and structures data into clean models
4. Converts HTML to markdown for AI compatibility
5. Counts tokens for LLM training estimation

### Data Quality

- ✅ Real-time data (not cached)
- ✅ Complete post and comment threads
- ✅ Nested comment structure preserved
- ✅ All metadata included (scores, timestamps, awards)
- ✅ Markdown formatting cleaned and optimized

### Performance

- Speed: ~100-200 posts per minute
- Reliability: 99%+ success rate
- Scale: Tested with 100K+ posts

### Limitations

- Cannot access deleted/removed posts
- Cannot scrape private subreddits
- Reddit's API has a 100 posts/page limit (we handle pagination)
- Comments are limited by Reddit's API (usually ~500 top-level comments per post)

## Comparison: Reddit Scraper vs Alternatives

| Feature | Reddit Scraper | Manual Scraping | Reddit API | Other Scrapers |
|---------|----------------|-----------------|------------|----------------|
| Price | $1.50/1K | $825/1K | $12K+/50M | $2-5/1K |
| No API Key | ✅ | N/A | ❌ | Varies |
| Markdown Output | ✅ | ❌ | ❌ | ❌ |
| Token Counts | ✅ | ❌ | ❌ | ❌ |
| Speed | Fast | Slow | Fast | Varies |
| Easy Setup | ✅ | ❌ | ❌ | ✅ |
| Scale | Unlimited | Limited | Limited | Unlimited |

## Frequently Asked Questions

### Is this legal?

Yes! Reddit Scraper only accesses publicly available data that Reddit exposes through its JSON API. We respect robots.txt and rate limits.

### Do I need a Reddit API key?

No! Reddit Scraper uses Reddit's public JSON API, which doesn't require authentication for public content.

### Can I scrape private subreddits?

No, only public content is accessible without authentication.

### How fast is it?

Approximately 100-200 posts per minute, depending on content size and settings.

### Can I scrape comments?

Yes! Use mode: "post" to scrape a specific post with all its comments, or enable includeComments in subreddit mode.

### What's the maximum I can scrape?

There's no hard limit, but we recommend batching large scrapes (10K+ posts) to avoid timeouts.

### Why markdown format?

Markdown is perfect for AI training because it:

- Preserves text structure (bold, links, lists)
- Is lightweight and clean
- Works great with LLMs like GPT, Claude, etc.
- Is easy to convert to other formats

### Can I schedule regular scrapes?

Yes! Use Apify's Schedules feature to run the Actor automatically.

### How do I integrate with my application?

Use Apify's API or webhooks to trigger scrapes and receive data programmatically.

### What if I hit Reddit's rate limits?

Increase the delaySeconds parameter. Our default (1.0 seconds) works for most cases.

### Can I get historical data?

Reddit's API only provides recent posts (usually the last 1,000 per subreddit). For historical data, you'll need specialized datasets.

## Support

Need help? Have a feature request?

- 📧 Email: contact via Apify
- 🐛 Issues: Report in the Run console
- 💬 Questions: Ask in the Apify community

We typically respond within 24 hours!

## Related Actors

Check out my other data extraction tools:

- Newsletter Scraper - Scrape Substack, Beehiiv & Ghost newsletters with full content extraction

More scrapers coming soon! Follow @benthepythondev for updates.

---

Ready to extract Reddit data? Start scraping now →

🤖 Built with the Apify SDK | Made by benthepythondev
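As a closing usage sketch: records in the Output Example format drop almost directly into an AI training pipeline. The helper below is illustrative (the function name and the `min_score` filter are not part of the actor); it keeps well-received posts and flattens each one to a markdown document:

```python
# Sketch: post-process scraped records (fields per the Output Example above)
# into markdown documents for LLM training. min_score is an illustrative filter.
def to_training_text(post, min_score=10):
    """Return a markdown document for a post, or None if it should be skipped."""
    if post.get("score", 0) < min_score or not post.get("selftext_markdown"):
        return None
    return "# {}\n\n{}".format(post["title"], post["selftext_markdown"])

sample = {
    "title": "How I built an AI agent that scrapes Reddit",
    "selftext_markdown": "Here's my complete guide...",
    "score": 1250,
}
print(to_training_text(sample))
```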
Ready to Get Started?
Try Reddit Scraper now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: benthepythondev
- Pricing: Paid
- Total Runs: 255
- Active Users: 29
Related Actors
- Web Scraper by apify
- Cheerio Scraper by apify
- Website Content Crawler by apify
- Legacy PhantomJS Crawler by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.