Substack Newsletter Scraper
by digispruce
Automatically extract Substack author profiles, subscriber counts, and contact info for lead lists and market research. Save hours of manual work.
About Substack Newsletter Scraper
Need to find writers, influencers, or experts on Substack for a project? This scraper pulls all the public details you'd normally have to hunt for manually. I use it to build clean datasets from Substack newsletters, grabbing the author's name, bio, and estimated subscriber count right from their profile page. It also collects any social media handles and contact info they've listed, which is perfect for building a targeted outreach list.

The main reason I built this was for B2B lead generation. Instead of spending hours copying and pasting, you can run this actor to gather profiles for a whole niche—like tech commentators or finance writers—in one go. It's also become a go-to for my market research. Seeing who has a large audience in a specific topic area helps identify trends and key players fast.

You get structured JSON or CSV output that's ready to import into your CRM or a spreadsheet for further analysis. It just automates the tedious part, so you can focus on the actual connection or research.
What does this actor do?
Substack Newsletter Scraper is a web scraping and automation tool available on the Apify platform. It runs entirely in the cloud, extracting Substack publication, author, and post data without any infrastructure on your side.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
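Beyond the UI, runs can also be started programmatically through Apify's REST API. A minimal sketch in Python — the actor ID `digispruce~substack-newsletter-scraper` is a guess based on the listing (check the actual ID on the actor page), and the token is read from an `APIFY_TOKEN` environment variable:

```python
import json
import os

# Apify REST API: POST /v2/acts/{actorId}/runs starts an actor run.
API_BASE = "https://api.apify.com/v2"

def build_run_request(actor_id: str, token: str, run_input: dict) -> tuple[str, bytes]:
    """Build the URL and JSON body for starting an actor run."""
    url = f"{API_BASE}/acts/{actor_id}/runs?token={token}"
    return url, json.dumps(run_input).encode("utf-8")

if __name__ == "__main__":
    # Hypothetical actor ID -- verify it against the actor's page on Apify.
    url, body = build_run_request(
        "digispruce~substack-newsletter-scraper",
        os.environ.get("APIFY_TOKEN", "YOUR_TOKEN"),
        {"newsletterUrls": [{"url": "https://platformer.substack.com"}]},
    )
    print(url)
    # Send the request with urllib.request, or use the official
    # apify-client package, which wraps this endpoint for you.
```

The helper only assembles the request; sending it and polling the run for results is left to whichever HTTP client you prefer.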
Documentation
Substack Newsletter Scraper
Overview
An Apify Actor that scrapes Substack newsletters for publication details, author contact info, and post data. Use it for lead generation, market research, or competitive analysis by processing multiple newsletters in a single run.
Key Features
- Publication Metadata: Extracts title, description, author info, categories, and public subscriber/follower counts.
- Contact Discovery: Finds author email addresses and scrapes social media links (Twitter, LinkedIn, etc.).
- Flexible Post Scraping: Choose from three modes to control data depth and usage:
  - none: Newsletter metadata only (fastest).
  - information: Metadata plus post titles, dates, and engagement stats.
  - information_and_content: All of the above plus full article content.
- Batch Processing: Scrape multiple newsletter URLs in one execution.
- Pay-per-Event: You are only charged for successfully scraped newsletters.
How to Use
Configure the actor with input parameters via JSON. The only required parameter is the list of newsletter URLs.
Input
Provide input as a JSON object. The actor supports both *.substack.com URLs and custom domains.
Required:
* newsletterUrls (array): A list of newsletter objects, each containing a url.
Optional:
* postScrapingMode (string): Default is "none". Options: "none", "information", "information_and_content".
* maxPostsPerNewsletter (number): Default is 12 (max: 12). Number of recent posts to scrape. Only applies if postScrapingMode is not "none".
* delayBetweenRequests (number): Default is 3000 ms (range: 500-10000). Delay between requests to avoid rate limiting.
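To make the defaults and ranges above concrete, here is an illustrative sketch (not the actor's internal code) that fills in the documented defaults and clamps out-of-range values; the minimum of 1 post is an assumption:

```python
def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and clamp values to documented ranges."""
    if not raw.get("newsletterUrls"):
        raise ValueError("newsletterUrls is required and must be non-empty")

    mode = raw.get("postScrapingMode", "none")
    if mode not in ("none", "information", "information_and_content"):
        raise ValueError(f"unknown postScrapingMode: {mode}")

    return {
        "newsletterUrls": raw["newsletterUrls"],
        "postScrapingMode": mode,
        # Default 12, documented maximum 12; a minimum of 1 is assumed here.
        "maxPostsPerNewsletter": min(12, max(1, raw.get("maxPostsPerNewsletter", 12))),
        # Default 3000 ms, clamped to the documented 500-10000 ms range.
        "delayBetweenRequests": min(10000, max(500, raw.get("delayBetweenRequests", 3000))),
    }
```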
Example Input
{
"newsletterUrls": [
{ "url": "https://platformer.substack.com" },
{ "url": "https://lennysnewsletter.com" }
],
"postScrapingMode": "information",
"maxPostsPerNewsletter": 5,
"delayBetweenRequests": 2000
}
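Since the actor accepts both *.substack.com URLs and custom domains, a small helper (hypothetical, not part of the actor) can turn a mixed list of bare domains and full URLs into the expected newsletterUrls shape:

```python
from urllib.parse import urlparse

def to_newsletter_urls(entries: list[str]) -> list[dict]:
    """Normalize bare domains / full URLs into the actor's newsletterUrls format."""
    result = []
    for entry in entries:
        # Add a scheme if missing so urlparse picks up the host correctly.
        if "://" not in entry:
            entry = "https://" + entry
        parsed = urlparse(entry)
        # Keep only the host; the actor is assumed to work from the newsletter root.
        result.append({"url": f"https://{parsed.netloc}"})
    return result
```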
Output
The output structure depends on the postScrapingMode.
All modes include comprehensive newsletter metadata:
* Author details (ID, name, bio, photo, handle).
* Contact info (email, website, social media URLs).
* Publication details (ID, name, description, logo, creation date).
* Counts (subscribers, followers, visibility flags).
* Status flags (is paid, is active, etc.).
If postScrapingMode is "information" or "information_and_content", the output adds a posts array. Each post object contains:
* url, title, subtitle, published_at.
* Engagement metrics (like_count, comment_count).
* word_count and a preview_text.
* When mode is "information_and_content", it also includes the full article_content_html.
Example Output Snippet (Mode: "information")
{
"author_name": "Casey Newton",
"email": "casey@platformer.news",
"publication_name": "Platformer",
"subscriber_count": 176000,
"posts": [
{
"title": "Trump won. Here's what comes next.",
"published_at": "2024-11-06T14:30:00.000Z",
"like_count": 245,
"comment_count": 89,
"word_count": 1250
}
]
}
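Once you have the dataset, a short post-processing sketch (field names taken from the example output above) can flatten each record into a row that is ready for a CRM or spreadsheet import, including a simple per-post engagement average:

```python
def summarize(record: dict) -> dict:
    """Reduce one scraped record to a flat row for CRM/spreadsheet import."""
    posts = record.get("posts", [])
    likes = sum(p.get("like_count", 0) for p in posts)
    comments = sum(p.get("comment_count", 0) for p in posts)
    return {
        "author": record.get("author_name"),
        "email": record.get("email"),
        "publication": record.get("publication_name"),
        "subscribers": record.get("subscriber_count"),
        # Average likes + comments per scraped post (0 if no posts were scraped).
        "avg_engagement": (likes + comments) / len(posts) if posts else 0,
    }
```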
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try Substack Newsletter Scraper now on Apify. Free tier available with no credit card required.
Actor Information
- Developer
- digispruce
- Pricing
- Paid
- Total Runs
- 71
- Active Users
- 13