Reddit Subreddit Scraper

by backhoe

215 runs
8 users
About Reddit Subreddit Scraper

Reddit Subreddit Scraper is your plug-and-play radar for Reddit communities: it harvests fresh stats from 100+ subreddits via Apify Residential proxies, returns clean JSON, and drops straight into AI pipelines or dashboards within minutes.

What does this actor do?

Reddit Subreddit Scraper is an Actor on the Apify platform that fetches subreddit metadata (subscribers, title, description, active users, and so on) in the cloud and returns it as JSON.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

# Reddit Subreddit Scraper & Proxy Pipeline

Fetch live subreddit metadata with built-in ban resistance, smart fallbacks, and automatic proxy rotation.

This Actor is designed to be the most reliable way to check subreddit details (subscribers, title, description, active users, etc.) without getting blocked. It intelligently switches between Reddit's API and web endpoints to ensure data delivery.

---

## 🚀 Key Features

- 🛡️ Smart Ban Resistance: Automatically rotates Apify Residential Proxies for every single request.
- 🔄 Dual-Mode Fetching: Tries the official Reddit API first. If blocked (403/429), it seamlessly falls back to web scraping (`/r/name/about`) to get the data.
- ⚡ High Performance: Built on `asyncio` and `httpx` (HTTP/2) for maximum concurrency and speed.
- 🔗 Flexible Inputs: Accepts any format:
  - Subreddit names: `r/AskReddit`, `AskReddit`
  - Full URLs: `https://www.reddit.com/r/Python`
  - Reddit Fullnames: `t5_2qh1i`
- 📊 Rich Diagnostics: Returns detailed stats for every item, including HTTP status codes, attempt counts, and whether the fallback was used.

---

## 📥 Input Configuration

The Actor accepts a simple JSON input. You only need to provide the list of subreddits.

### Example Input

```json
{
  "subreddits": [
    "r/AskReddit",
    "https://www.reddit.com/r/machinelearning",
    "t5_2qh1i"
  ]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `subreddits` | Array | Required. List of subreddits to fetch. Supports `r/name`, URLs, or `t5_` IDs. |
| `proxyConfiguration` | Object | Optional. If omitted, the Actor enables Apify Residential proxies automatically (recommended for Reddit). |

---

## 📤 Output Data

The results are stored in the default Apify Dataset. Each item represents one subreddit and contains the full response from Reddit.
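The three accepted input formats (subreddit names, full URLs, and `t5_` fullnames) can be told apart before a run, for example to validate a list locally. The following is a minimal sketch, not the Actor's own code: the function name `classify_subreddit_input` and the labels `"url"` and `"fullname"` are assumptions made for illustration (only `"name"` appears in the Actor's sample output), and the name pattern is a simplified guess at what Reddit allows.

```python
import re
from urllib.parse import urlparse

# Assumed patterns: a fullname is "t5_" plus a base-36 id; a subreddit
# name is letters, digits, and underscores. Both are simplifications.
_FULLNAME_RE = re.compile(r"^t5_[0-9a-z]+$")
_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")


def classify_subreddit_input(raw: str) -> tuple[str, str]:
    """Return (input_type, identifier) for one entry of `subreddits`.

    Hypothetical helper: the labels "url" and "fullname" are illustrative.
    """
    text = raw.strip()

    # Reddit fullnames such as t5_2qh1i are passed through unchanged.
    if _FULLNAME_RE.match(text):
        return "fullname", text

    # Full URLs: take the path segment that follows /r/.
    if text.startswith(("http://", "https://")):
        parts = [p for p in urlparse(text).path.split("/") if p]
        if len(parts) >= 2 and parts[0].lower() == "r" and _NAME_RE.match(parts[1]):
            return "url", parts[1]
        raise ValueError(f"not a subreddit URL: {raw!r}")

    # Plain names, with or without the r/ prefix.
    name = text.lstrip("/")
    if name.lower().startswith("r/"):
        name = name[2:]
    if _NAME_RE.match(name):
        return "name", name
    raise ValueError(f"unrecognized subreddit input: {raw!r}")
```

Calling `classify_subreddit_input("https://www.reddit.com/r/Python")` yields the pair `("url", "Python")` under these assumptions; malformed entries raise `ValueError` so they can be dropped before the run is started.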
### Success Example

```json
{
  "input_raw": "r/AskReddit",
  "input_type": "name",
  "identifier_value": "AskReddit",
  "status": "success",
  "http_status": 200,
  "response": {
    "kind": "t5",
    "data": {
      "display_name": "AskReddit",
      "title": "Ask Reddit...",
      "subscribers": 57140321,
      "active_user_count": 84210,
      "public_description": "r/AskReddit is the place to ask and answer thought-provoking questions.",
      "created_utc": 1201233135.0
    }
  },
  "used_web_fallback": false,
  "attempts_used": 1
}
```

### Error Example

If a subreddit cannot be reached after multiple retries, it is marked as an error:

```json
{
  "input_raw": "r/NonExistentSub123",
  "status": "error",
  "http_status": 404,
  "error": "404 Client Error: Not Found for url: https://www.reddit.com/r/NonExistentSub123/about.json",
  "attempts_used": 6
}
```

---

## 💡 Tips & Tricks

- Use Residential Proxies: Reddit is very strict with datacenter IPs. This Actor is pre-configured to use Residential proxies for the best success rate.
- Batch Processing: You can pass thousands of subreddits in a single run. The Actor handles concurrency automatically.
- Monitoring: The output includes `attempts_used` and `used_web_fallback`. If you see `used_web_fallback: true` often, it means the API is blocking requests, but the Actor is successfully bypassing it via the web interface.

---

### License

Apache 2.0
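As a sketch of the monitoring tip, the diagnostic fields on each dataset item can be rolled up into a few run-level numbers. This assumes the items have already been downloaded as a list of dictionaries shaped like the success and error examples; the function name `summarize_items` and the keys of the returned summary are hypothetical, not part of the Actor.

```python
from collections import Counter


def summarize_items(items: list[dict]) -> dict:
    """Aggregate per-item diagnostics into run-level statistics.

    Hypothetical helper: only the input field names (status,
    used_web_fallback, attempts_used) come from the Actor's output.
    """
    total = len(items)
    statuses = Counter(item.get("status", "unknown") for item in items)

    # Error items in the examples omit used_web_fallback, so treat a
    # missing value as "fallback not used".
    fallback_count = sum(1 for item in items if item.get("used_web_fallback"))

    attempts = [item["attempts_used"] for item in items if "attempts_used" in item]

    return {
        "total": total,
        "success": statuses.get("success", 0),
        "error": statuses.get("error", 0),
        "fallback_rate": fallback_count / total if total else 0.0,
        "mean_attempts": sum(attempts) / len(attempts) if attempts else 0.0,
    }
```

A rising `fallback_rate` across runs would suggest that the API path is being blocked more often, which is the situation the monitoring tip describes.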

Actor Information

Developer: backhoe
Pricing: Paid
Total Runs: 215
Active Users: 8

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
