Reddit Scraper | All-In-One | $1.5 / 1K
by fatihtahta
About Reddit Scraper | All-In-One | $1.5 / 1K
Need to pull data from Reddit without the headache? This scraper handles the messy parts so you can focus on what matters. I use it to grab posts and full comment threads from anywhere on Reddit—search results, specific subreddits, user profiles, or direct links. It's built for speed and reliability, so you get clean, detailed JSON without the usual slowdowns or crashes. Whether you're tracking brand mentions, analyzing community sentiment, or gathering leads, it pulls everything you need in one go.

The setup is straightforward. You plug in your target—a keyword, a subreddit like r/startups, or a user's post history—and it fetches the data. It captures titles, votes, dates, authors, and the entire nested comment tree, which is perfect for deeper analysis or building datasets.

For developers and researchers who need consistent, structured Reddit data, this tool saves hours of manual work. It's the one I rely on when I need accurate results quickly, without worrying about rate limits or parsing issues. Try it for your next project where real Reddit data is required.
What does this actor do?
Reddit Scraper | All-In-One | $1.5 / 1K is a web scraping and automation tool on the Apify platform. It extracts Reddit posts and comment threads as structured data and runs entirely in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
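The steps above can also be driven from code with the Apify Python client (`pip install apify-client`). Below is a minimal sketch, not a definitive integration: `build_run_input` is an illustrative helper (not part of the actor), the token placeholder must be replaced with your own, and the input field names follow the actor's documented Configuration table.

```python
def build_run_input(queries=None, urls=None, scrape_comments=False,
                    max_posts=1000, sort="relevance", timeframe="all",
                    include_nsfw=False):
    """Assemble a run-input dict using the actor's documented field names."""
    run_input = {
        "scrapeComments": scrape_comments,
        "maxPosts": max_posts,
        "sort": sort,
        "timeframe": timeframe,
        "includeNsfw": include_nsfw,
    }
    if queries:
        run_input["queries"] = queries
    if urls:
        # The actor gives urls priority over queries when both are set.
        run_input["urls"] = urls
    return run_input


def main():
    # Requires an Apify account and API token.
    from apify_client import ApifyClient

    client = ApifyClient("<YOUR_APIFY_TOKEN>")
    run = client.actor("fatihtahta/reddit-scraper").call(
        run_input=build_run_input(queries=["cheesecake"],
                                  scrape_comments=True, max_posts=100)
    )
    # Each dataset item is a post or comment record (see the Output Guide).
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["kind"], item.get("title") or item.get("body", "")[:60])


if __name__ == "__main__":
    main()
```

Results can then be downloaded from the run's default dataset in JSON, CSV, or Excel via the Apify Console or API.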
Documentation
# Reddit Scraper

Slug: `fatihtahta/reddit-scraper`

## Overview

This actor retrieves publicly available Reddit data, including posts and optional comment threads. Data can be collected using keyword-based search, subreddit-scoped queries, or direct Reddit URLs. Results are returned as structured JSON records with stable field names suitable for analytics, monitoring, and downstream processing.

The actor supports configurable limits for posts and comments, and exposes separate record types for posts and comments to simplify ingestion into databases, data warehouses, or application pipelines.

## Key Capabilities

- **Multiple collection modes**: Retrieve data via keyword-based search, subreddit-scoped scraping, or direct Reddit URLs (posts, listings, user profiles, and searches).
- **High-throughput collection**: Capable of retrieving 1,000+ posts or 10,000+ comments in under one minute, depending on content structure and limits.
- **Large-scale collection**: Designed to collect millions of posts or comments across runs, with no artificial rate limits imposed by the actor.
- **Complete data capture**: Captures 100% of posts and comments reachable within the configured limits.
- **Operational reliability**: 100% successful run rate across production workloads under normal operating conditions.
- **Rich context extraction**: Returns a broader set of metadata than the official Reddit API, including moderation status, awards, media previews, crosspost relationships, and author context.
- **Deterministic output schema**: Emits stable, typed post and comment records with predictable fields suitable for analytics, warehousing, and automated pipelines.
- **Comment thread traversal**: Optionally retrieves full comment trees up to the full depth and maximum count.

## Configuration

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| queries | string[] | – | Search terms to look up on Reddit. Ignored when urls are provided. |
| urls | string[] | – | Specific Reddit URLs to scrape. Takes priority over queries. |
| scrapeComments | boolean | false | Set to true to extract comments for each post. |
| maxComments | number | 50000 | Maximum comments saved per post when scrapeComments is true. |
| maxPosts | number | 50000 | Limit on posts stored for each query or URL. |
| includeNsfw | boolean | false | Include results tagged as NSFW. |
| sort | "relevance" \| "hot" \| "top" \| "new" \| "comments" | relevance | Ordering for search results. |
| timeframe | "hour" \| "day" \| "week" \| "month" \| "year" \| "all" | all | Time range filter for search results when sort is relevance, top, or comments. |
| subredditName | string | – | Name of a single subreddit to target (omit the r/ prefix). |
| subredditKeywords | string[] | – | Optional keywords combined with subredditName to focus the subreddit search. |
| subredditSort | "relevance" \| "hot" \| "top" \| "new" \| "comments" | relevance | Ordering applied when searching within the chosen subreddit. |
| subredditTimeframe | "hour" \| "day" \| "week" \| "month" \| "year" \| "all" | all | Time filter that pairs with subredditSort values of relevance, top, or comments. |
| strictSearch | boolean | false | When true, wraps query tokens in quotes and joins them with AND to force Reddit into phrase/AND semantics (cuts noisy matches). |
| strictTokenFilter | boolean | false | When true, discards posts whose title/body/url does not contain every query token as a whole word or phrase. Useful for removing "looming" vs. "loom" false positives. |

## Usage Examples

### Sample Input

```json
{
  "includeNsfw": false,
  "queries": ["Cheesecake", "Swimming Pool"],
  "scrapeComments": true,
  "sort": "hot",
  "timeframe": "year",
  "urls": ["https://www.reddit.com/r/socialmedia/"]
}
```

### Sample Output

#### Post record (kind: "post")

```json
{
  "kind": "post",
  "query": "Cheesecake",
  "id": "1oiwt3p",
  "title": "My first cheesecake :)",
  "body": "Turned out a bit short but thats ok cause it tasted amazing.",
  "author": "No_Opportunity_1502",
  "score": 27,
  "upvote_ratio": 0.97,
  "num_comments": 1,
  "subreddit": "Baking",
  "created_utc": "2025-10-29T05:59:38.000Z",
  "url": "https://www.reddit.com/r/Baking/comments/1oiwt3p/my_first_cheesecake/",
  "flair": "No-Recipe Provided",
  "over_18": false,
  "is_self": false,
  "spoiler": false,
  "locked": false,
  "is_video": false,
  "domain": "old.reddit.com",
  "thumbnail": "https://b.thumbs.redditmedia.com/oIOAf9jpp5jUSRjEljGBBvN4EOtH6dJo7sujoeG3Wug.jpg",
  "url_overridden_by_dest": "https://www.reddit.com/gallery/1oiwt3p",
  "media": null,
  "media_metadata": null,
  "gallery_data": {
    "items": [
      { "media_id": "iniej0usqzxf1", "id": 782212827 },
      { "media_id": "qi29mztsqzxf1", "id": 782212828 },
      { "media_id": "fehlpdvsqzxf1", "id": 782212829 }
    ]
  },
  "stickied": false,
  "distinguished": null,
  "total_awards_received": 3,
  "all_awardings": [
    { "count": 1, "name": "Helpful" },
    { "count": 2, "name": "Wholesome" }
  ],
  "gilded": 0,
  "num_crossposts": 1,
  "is_original_content": true,
  "author_fullname": "t2_abcd1234",
  "author_flair_text": "Pro Baker",
  "author_premium": false,
  "selftext_html": "<p>Turned out a bit short...</p>",
  "preview": { "images": [{ "id": "preview-id" }] },
  "secure_media": null,
  "secure_media_embed": null,
  "crosspost_parent_list": null
}
```

#### Comment record (kind: "comment")

```json
{
  "kind": "comment",
  "query": "https://www.reddit.com/r/technology/...",
  "id": "k5z1x2y",
  "postId": "t3_1d95j4g",
  "parentId": "t3_1d95j4g",
  "body": "Great analysis, but I think you're underestimating the impact of quantum computing on these timelines.",
  "author": "future_thinker",
  "score": 142,
  "created_utc": "2025-08-05T19:15:22.000Z",
  "url": "https://www.reddit.com/r/technology/comments/1d95j4g/the_state_of_ai_in_2025_a_comprehensive_report/k5z1x2y/",
  "stickied": false,
  "distinguished": null,
  "is_submitter": true,
  "score_hidden": false,
  "controversiality": 0,
  "depth": 0
}
```

## Output Guide

### Post fields (kind: "post")

- `kind`: Record type identifier; always "post" for post items.
- `query`: The search term or URL that produced this record.
- `id`: Reddit short ID of the post.
- `title`: Post title text.
- `body`: Post body text; empty for link-only posts.
- `author`: Username of the post creator.
- `score`: Net upvotes the post has received.
- `upvote_ratio`: Fraction of positive votes (0–1).
- `num_comments`: Number of comments on the post when fetched.
- `subreddit`: Name of the subreddit containing the post.
- `created_utc`: ISO timestamp when the post was created.
- `url`: Direct link to the Reddit post.
- `flair`: Subreddit-assigned flair text; null if none.
- `over_18`: Whether the post is marked NSFW; null when unavailable.
- `is_self`: True for text posts; false for link/gallery posts; null if unknown.
- `spoiler`: Indicates spoiler-tagged posts; null if not provided.
- `locked`: Whether new comments are disabled; null if unknown.
- `is_video`: True when the post is a native video; null otherwise.
- `domain`: Destination domain for link posts; null for self posts.
- `thumbnail`: Thumbnail URL when present; often "self" or null for text posts.
- `url_overridden_by_dest`: Original outbound URL for link posts; null otherwise.
- `media`: Media payload for videos or embeds; null when absent.
- `media_metadata`: Per-item metadata for galleries; null otherwise.
- `gallery_data`: Gallery structure for multi-image posts; null otherwise.
- `stickied`: True for posts pinned to the subreddit; null if not set.
- `distinguished`: Moderator/admin marker (e.g., "moderator"); null otherwise.
- `total_awards_received`: Count of awards on the post; null if not present.
- `all_awardings`: List of award objects applied to the post; null if none.
- `gilded`: Number of times the post was gilded; null if missing.
- `num_crossposts`: Count of times this post was crossposted; null if unknown.
- `is_original_content`: True when marked as original content; null if absent.
- `author_fullname`: Internal Reddit user ID for the author; null when hidden.
- `author_flair_text`: Author's flair text; null if none.
- `author_premium`: Whether the author has Reddit Premium; null if not provided.
- `selftext_html`: HTML-rendered body for text posts; null for links or when unavailable.
- `preview`: Image preview metadata for media posts; null otherwise.
- `secure_media`: Media details for secure embeds; null when absent.
- `secure_media_embed`: Embed metadata for secure media; null when absent.
- `crosspost_parent_list`: Source post data for crossposts; null if not a crosspost.

### Comment fields (kind: "comment")

- `kind`: Record type identifier; always "comment" for comment items.
- `query`: The search term or URL that produced this record.
- `id`: Reddit short ID of the comment.
- `postId`: ID of the parent post.
- `postUrl`: Direct link to the parent post.
- `parentId`: ID of the parent comment or post.
- `body`: Comment text content.
- `author`: Username of the comment creator.
- `score`: Net upvotes the comment has received.
- `created_utc`: ISO timestamp when the comment was created.
- `url`: Direct link to the comment.
- `stickied`: True for comments pinned by moderators; null otherwise.
- `distinguished`: Moderator/admin marker (e.g., "moderator"); null otherwise.
- `is_submitter`: True when the commenter is also the post author; null if unavailable.
- `score_hidden`: True when scores are hidden temporarily; null if not provided.
- `controversiality`: Reddit controversy score; null when absent.
- `depth`: Nesting level in the thread; null if missing.

## Pricing

The actor costs $1.50 per 1,000 saved items (posts or comments). Infrastructure and residential proxy expenses are included, and you only pay for successful results.

Example: scraping 10,000 posts and 25,000 comments equals 35,000 saved items, which costs (35,000 / 1,000) × $1.50 = $52.50.

## Operational Tips

- **Control run time and cost**: Enable scrapeComments only when comment-level analysis is required.
- **Scale in batches for large datasets**: For large historical backfills or multi-million-item collections, split runs by subreddit, time range, or URL lists to simplify retries and monitoring.
- **Use limits defensively**: Set maxPosts and maxComments to explicit values to prevent unbounded runs, especially when scraping high-activity subreddits.
- **Leverage subreddit mode for monitoring**: Subreddit-scoped scraping is well suited to recurring monitoring jobs where consistent coverage and ordering matter.
- **Account for sorting behavior**: Time filters apply only to compatible sort modes (e.g., top, relevance, comments). Make sure your sort and timeframe combinations are aligned with your data goals.

## Ethics & Compliance

The scraper collects only publicly available Reddit data and avoids private information. Ensure you have a legitimate reason to process any personal data returned in the results.

## Support

Need help or have a custom request? Open an issue via the Issues tab in the Apify Console and it will be resolved around the clock.

Happy scraping! – Fatih
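Because posts and comments arrive in one dataset, distinguished only by the `kind` field, downstream pipelines typically split them before ingestion. A minimal sketch of that step, using field names from the Output Guide above (the helper name and the trimmed sample records are illustrative):

```python
from collections import defaultdict


def partition_records(items):
    """Split a mixed dataset into post and comment lists, and index
    comments by their parent post ID for thread reassembly."""
    posts, comments = [], []
    comments_by_post = defaultdict(list)
    for item in items:
        if item.get("kind") == "post":
            posts.append(item)
        elif item.get("kind") == "comment":
            comments.append(item)
            # postId links each comment back to its parent post.
            comments_by_post[item.get("postId")].append(item)
    return posts, comments, dict(comments_by_post)


# Trimmed sample records mirroring the documented output schema.
sample = [
    {"kind": "post", "id": "1oiwt3p", "title": "My first cheesecake :)"},
    {"kind": "comment", "id": "k5z1x2y", "postId": "t3_1d95j4g", "depth": 0},
]
posts, comments, by_post = partition_records(sample)
```

From here, `posts` and `comments` can be loaded into separate tables, and `by_post` makes it straightforward to reattach comment threads to their posts.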
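The pricing arithmetic above is simple enough to sketch as a budgeting helper. The $1.50 / 1,000 rate comes from the Pricing section; the function itself is illustrative, not part of the actor:

```python
PRICE_PER_1K = 1.50  # USD per 1,000 saved items, per the Pricing section


def estimate_cost(posts: int, comments: int) -> float:
    """Estimate run cost: every saved post or comment counts as one item."""
    items = posts + comments
    return round(items / 1000 * PRICE_PER_1K, 2)


# Worked example from the docs: 10,000 posts + 25,000 comments -> $52.50
```

Pairing this with explicit maxPosts/maxComments values gives a hard upper bound on spend before a run starts.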
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try Reddit Scraper | All-In-One | $1.5 / 1K now on Apify. Free tier available with no credit card required.
Actor Information
- Developer
- fatihtahta
- Pricing
- Paid
- Total Runs
- 52,768
- Active Users
- 632
Related Actors
Video Transcript Scraper: Youtube, X, Facebook, Tiktok, etc.
by invideoiq
Linkedin Profile Details Scraper + EMAIL (No Cookies Required)
by apimaestro
Twitter (X.com) Scraper Unlimited: No Limits
by apidojo
Content Checker
by jakubbalada
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
Learn more about Apify