Reddit Scraper
by benthepythondev
About Reddit Scraper
Extract Reddit posts, comments & user data in AI-ready markdown format. No API keys needed! 25% cheaper than competitors. Perfect for AI training, sentiment analysis & market research. Includes bulk comment scraping with progress tracking.
What does this actor do?
Reddit Scraper is a web scraping and automation tool available on the Apify platform. It extracts Reddit posts, comments, and user data through Reddit's public JSON API and runs entirely in Apify's cloud, so there is nothing to install locally.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
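Beyond the web UI, runs can be started programmatically through Apify's REST API. A minimal, stdlib-only sketch is below; the actor ID `benthepythondev~reddit-scraper` is an assumption (copy the real ID from the actor page), and the input fields follow the subreddit-mode parameters documented further down:

```python
# Sketch: start a run of this actor via Apify's v2 REST API (stdlib only).
# The ACTOR_ID below is an assumed placeholder, not a verified identifier.
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "benthepythondev~reddit-scraper"  # assumed; check the actor page

def build_run_input(subreddit, sort="hot", max_posts=100):
    """Assemble subreddit-mode input as described under Input Parameters."""
    return {
        "mode": "subreddit",
        "subreddit": subreddit,
        "sort": sort,
        "maxPosts": max_posts,
        "outputFormat": "markdown",
    }

def start_run(token, run_input):
    """POST the input JSON to start an actor run (network call, not invoked here)."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/runs?token={token}"
    req = urllib.request.Request(
        url,
        data=json.dumps(run_input).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Once the run finishes, results can be fetched from the run's default dataset with the same API, or via Apify's official client libraries.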
Documentation
# Reddit Scraper - Fast & AI-Ready Data Extraction

Extract Reddit posts, comments, and user data in markdown format perfect for AI training, market research, and sentiment analysis. No API keys needed!

## What can Reddit Scraper extract?

This Reddit Scraper can extract comprehensive data from Reddit including:

- Posts: Titles, content (text/markdown/HTML), scores, comment counts, awards, timestamps
- Comments: Nested comment threads with full hierarchy, scores, and timestamps
- User Data: Post history, karma scores, account information
- Subreddit Info: Community statistics, descriptions, member counts
- Search Results: Find posts across Reddit or within specific communities
- Images & Media: Extract image URLs, thumbnails, and media metadata
- Engagement Metrics: Upvote ratios, comment counts, award counts
- AI-Ready Output: Token counts and markdown formatting for LLM training

## Why choose Reddit Scraper?

- ✅ 25% Cheaper - Only $1.50 per 1,000 results vs $2.00+ from competitors
- ✅ Faster - Uses Reddit's JSON API (no heavy browser needed)
- ✅ Bulk Comment Loading - Efficient scraping with up to 500 comments per request
- ✅ AI-Optimized - Markdown output with token counts for ML training
- ✅ No API Keys - Works without Reddit API authentication
- ✅ Progress Tracking - Real-time updates on scraping progress
- ✅ Easy to Use - Simple input configuration, no coding required

## How do I use Reddit Scraper?

### 1. Create a free Apify account

Sign up at apify.com - you get $5 free credit (enough for 3,300+ posts!).

### 2. Start the Actor

Visit the Reddit Scraper page and click "Try for free".

### 3. Configure your scrape

Choose what to scrape:

Subreddit Posts:

```json
{
  "mode": "subreddit",
  "subreddit": "ArtificialInteligence",
  "sort": "hot",
  "maxPosts": 100
}
```

Single Post + Comments:

```json
{
  "mode": "post",
  "postUrl": "https://www.reddit.com/r/python/comments/abc123/example/",
  "maxComments": 500
}
```

User Posts:

```json
{
  "mode": "user",
  "username": "example_user",
  "maxPosts": 100
}
```

Search Reddit:

```json
{
  "mode": "search",
  "searchQuery": "machine learning",
  "searchSubreddit": "python",
  "maxPosts": 200
}
```

### 4. Download your data

Export in JSON, CSV, Excel, XML, or HTML format.

## Input Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| mode | string | Scraping mode: subreddit, post, user, or search |
| subreddit | string | Subreddit name (e.g., "python") |
| postUrl | string | Full URL of post to scrape |
| username | string | Reddit username to scrape |
| searchQuery | string | Search query |
| sort | string | Sort order: hot, new, top, rising, controversial |
| timeFilter | string | Time filter: hour, day, week, month, year, all |
| maxPosts | integer | Maximum posts to scrape (0 = unlimited) |
| maxComments | integer | Maximum comments per post (0 = unlimited; applies to post mode and to subreddit mode with includeComments enabled) |
| includeComments | boolean | Include comments in subreddit mode (enables bulk comment scraping with progress tracking) |
| sinceDate | string | Only posts after this date (YYYY-MM-DD) |
| outputFormat | string | Content format: markdown, html, or text |
| includeImages | boolean | Extract image URLs |
| delaySeconds | number | Delay between requests (default: 1.0) |

## Output Example

```json
{
  "id": "abc123",
  "title": "How I built an AI agent that scrapes Reddit",
  "url": "https://reddit.com/r/artificial/comments/abc123/",
  "selftext_markdown": "Here's my complete guide...",
  "author": "ai_developer",
  "subreddit": "artificial",
  "score": 1250,
  "upvote_ratio": 0.97,
  "num_comments": 89,
  "created_utc": "2025-01-15T10:30:00Z",
  "word_count": 850,
  "token_count": 1200,
  "images": [
    {
      "url": "https://i.redd.it/example.jpg",
      "width": 1200,
      "height": 800
    }
  ]
}
```

## Use Cases

### 1. AI Training Data 🤖

Reddit is a goldmine for LLM training:

- Real human conversations and discussions
- Expert Q&A across 100K+ communities
- Diverse topics and writing styles
- Already in markdown format for easy processing

Example: Train a customer service chatbot on 50K support-related Reddit posts.

### 2. Market Research 📊

Understand what people really think:

- Track brand mentions and sentiment
- Monitor competitor discussions
- Identify trending topics and pain points
- Analyze customer feedback in real-time

Example: Scrape r/SaaS to understand startup challenges and opportunities.

### 3. Content Research ✍️

Find ideas and inspiration:

- Discover viral content patterns
- Identify popular discussion topics
- Research audience questions and pain points
- Find engaging headlines and angles

Example: Scrape top posts from r/Entrepreneur for blog content ideas.

### 4. Sentiment Analysis 😊😡

Analyze public opinion at scale:

- Track sentiment on products/brands
- Monitor crisis situations
- Understand community mood shifts
- Identify influencers and thought leaders

Example: Analyze 10K comments about a new product launch.

### 5. Academic Research 🎓

Study online communities:

- Social network analysis
- Language and communication patterns
- Community dynamics and moderation
- Misinformation spread patterns

Example: Research how scientific information spreads on Reddit.

### 6. Competitive Intelligence 🔍

Stay ahead of competitors:

- Monitor competitor mentions
- Track industry discussions
- Identify emerging trends early
- Understand customer pain points

Example: Track all mentions of competitors in your industry subreddits.

## How much will it cost to scrape Reddit data?

Reddit Scraper uses pay-per-result pricing - you only pay for the data you extract.
**Pricing: $1.50 per 1,000 results**

### Cost Examples

| Posts Scraped | Cost | What You Get |
|---------------|------|--------------|
| 100 posts | $0.15 | Small subreddit sample |
| 1,000 posts | $1.50 | Medium dataset |
| 10,000 posts | $15.00 | Large research dataset |
| 100,000 posts | $150.00 | Enterprise AI training data |

### Free Tier

With Apify's free plan ($5 credit), you get:

- ~3,300 posts FREE to try the Actor
- Perfect for testing and small projects

### ROI Calculation

Manual scraping:

- Time: ~2 minutes per post manually
- 1,000 posts = 33 hours of work
- At $25/hour = $825 cost

Reddit Scraper:

- Time: ~2 minutes total (automated)
- 1,000 posts = $1.50
- Savings: $823.50 (99.8% cost reduction!)

## Pro Tips

### Optimize for Speed

- Use hot or new sort - they're faster than top
- Set reasonable maxPosts limits
- Use includeComments: false unless you need comments

### Get Quality Data

- Use markdown output format for AI training
- Filter by timeFilter to get recent content
- Use sinceDate for incremental scraping
- Sort by top + week for high-quality posts
- Enable includeComments for complete conversation data

### Efficient Comment Scraping

- Set maxComments to limit comments per post (default: 100)
- Uses bulk loading (up to 500 comments per request)
- Includes progress tracking showing scraped/failed posts
- Nested comments are preserved with full hierarchy
- Failed posts are logged but don't stop the scraping

### Avoid Rate Limits

- Keep delaySeconds at 1.0 or higher
- Scrape during off-peak hours (US nighttime)
- Don't scrape the same subreddit repeatedly

### Save Money

- Set maxPosts to avoid over-scraping
- Use search mode for targeted data
- Scrape only what you need

## Technical Details

### How It Works

Reddit Scraper uses Reddit's official JSON API (not browser-based web scraping):

1. Converts Reddit URLs to JSON API endpoints
2. Fetches data using HTTP requests (no browser)
3. Parses and structures data into clean models
4. Converts HTML to markdown for AI compatibility
5. Counts tokens for LLM training estimation

### Data Quality

- ✅ Real-time data (not cached)
- ✅ Complete post and comment threads
- ✅ Nested comment structure preserved
- ✅ All metadata included (scores, timestamps, awards)
- ✅ Markdown formatting cleaned and optimized

### Performance

- Speed: ~100-200 posts per minute
- Reliability: 99%+ success rate
- Scale: Tested with 100K+ posts

### Limitations

- Cannot access deleted/removed posts
- Cannot scrape private subreddits
- Reddit's API has a 100 posts/page limit (we handle pagination)
- Comments are limited by Reddit's API (usually ~500 top-level comments per post)

## Comparison: Reddit Scraper vs Alternatives

| Feature | Reddit Scraper | Manual Scraping | Reddit API | Other Scrapers |
|---------|----------------|-----------------|------------|----------------|
| Price | $1.50/1K | $825/1K | $12K+/50M | $2-5/1K |
| No API Key | ✅ | N/A | ❌ | Varies |
| Markdown Output | ✅ | ❌ | ❌ | ❌ |
| Token Counts | ✅ | ❌ | ❌ | ❌ |
| Speed | Fast | Slow | Fast | Varies |
| Easy Setup | ✅ | ❌ | ❌ | ✅ |
| Scale | Unlimited | Limited | Limited | Unlimited |

## Frequently Asked Questions

### Is this legal?

Yes! Reddit Scraper only accesses publicly available data that Reddit exposes through its JSON API. We respect robots.txt and rate limits.

### Do I need a Reddit API key?

No! Reddit Scraper uses Reddit's public JSON API, which doesn't require authentication for public content.

### Can I scrape private subreddits?

No, only public content is accessible without authentication.

### How fast is it?

Approximately 100-200 posts per minute, depending on content size and settings.

### Can I scrape comments?

Yes! Use mode: "post" to scrape a specific post with all its comments, or enable includeComments in subreddit mode.

### What's the maximum I can scrape?

There's no hard limit, but we recommend batching large scrapes (10K+ posts) to avoid timeouts.

### Why markdown format?

Markdown is perfect for AI training because it:

- Preserves text structure (bold, links, lists)
- Is lightweight and clean
- Works great with LLMs like GPT, Claude, etc.
- Is easy to convert to other formats

### Can I schedule regular scrapes?

Yes! Use Apify's Schedules feature to run the Actor automatically.

### How do I integrate with my application?

Use Apify's API or webhooks to trigger scrapes and receive data programmatically.

### What if I hit Reddit's rate limits?

Increase the delaySeconds parameter. Our default (1.0 seconds) works for most cases.

### Can I get historical data?

Reddit's API only provides recent posts (usually the last 1,000 per subreddit). For historical data, you'll need specialized datasets.

## Support

Need help? Have a feature request?

- 📧 Email: contact via Apify
- 🐛 Issues: Report in the Run console
- 💬 Questions: Ask in the Apify community

We typically respond within 24 hours!

## Related Actors

Check out my other data extraction tools:

- Newsletter Scraper - Scrape Substack, Beehiiv & Ghost newsletters with full content extraction

More scrapers coming soon! Follow @benthepythondev for updates.

---

Ready to extract Reddit data? Start scraping now →

🤖 Built with the Apify SDK | Made by benthepythondev
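As a closing usage sketch: records in the Output Example format drop almost directly into an AI training pipeline. The helper below is illustrative (the function name and the `min_score` filter are not part of the actor); it keeps well-received posts and flattens each one to a markdown document:

```python
# Sketch: post-process scraped records (fields per the Output Example above)
# into markdown documents for LLM training. min_score is an illustrative filter.
def to_training_text(post, min_score=10):
    """Return a markdown document for a post, or None if it should be skipped."""
    if post.get("score", 0) < min_score or not post.get("selftext_markdown"):
        return None
    return "# {}\n\n{}".format(post["title"], post["selftext_markdown"])

sample = {
    "title": "How I built an AI agent that scrapes Reddit",
    "selftext_markdown": "Here's my complete guide...",
    "score": 1250,
}
print(to_training_text(sample))
```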
Ready to Get Started?
Try Reddit Scraper now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: benthepythondev
- Pricing: Paid
- Total Runs: 255
- Active Users: 29
Related Actors
- Web Scraper by apify
- Cheerio Scraper by apify
- Website Content Crawler by apify
- Legacy PhantomJS Crawler by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.