Substack Newsletter Scraper

by digispruce

Automatically extract Substack author profiles, subscriber counts, and contact info for lead lists and market research. Save hours of manual work.

About Substack Newsletter Scraper

Need to find writers, influencers, or experts on Substack for a project? This scraper pulls all the public details you'd normally have to hunt for manually. I use it to build clean datasets from Substack newsletters, grabbing the author's name, bio, and estimated subscriber count right from their profile page. It also collects any social media handles and contact info they've listed, which is perfect for building a targeted outreach list.

The main reason I built this was B2B lead generation. Instead of spending hours copying and pasting, you can run this actor to gather profiles for a whole niche, like tech commentators or finance writers, in one go. It's also become a go-to for my market research: seeing who has a large audience in a specific topic area helps identify trends and key players fast.

You get structured JSON or CSV output that's ready to import into your CRM or a spreadsheet for further analysis. The actor automates the tedious part, so you can focus on the actual connection or research.

What does this actor do?

Substack Newsletter Scraper is a web scraping and automation tool available on the Apify platform. It runs entirely in Apify's cloud, so you can extract public Substack newsletter, author, and post data without any local setup.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
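If you'd rather trigger runs from your own code than from the Apify console, the Apify API client covers steps 3 and 4 programmatically. Below is a minimal Python sketch using the apify-client package; the actor ID shown is a placeholder guess (copy the real one from the actor's Apify page), and the token is your own Apify API token.

from apify_client import ApifyClient

# Authenticate with your personal Apify API token (found in your Apify account settings).
client = ApifyClient("YOUR_APIFY_TOKEN")

# Same structure as the JSON input documented below; only newsletterUrls is required.
run_input = {
    "newsletterUrls": [{"url": "https://platformer.substack.com"}],
    "postScrapingMode": "none",
}

# The actor ID below is a placeholder; replace it with the ID listed on the actor's page.
run = client.actor("digispruce/substack-newsletter-scraper").call(run_input=run_input)

# Each scraped newsletter is stored as one item in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("publication_name"), item.get("subscriber_count"))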

Documentation

Substack Newsletter Scraper

Overview

An Apify Actor that scrapes Substack newsletters for publication details, author contact info, and post data. Use it for lead generation, market research, or competitive analysis by processing multiple newsletters in a single run.

Key Features

  • Publication Metadata: Extracts title, description, author info, categories, and public subscriber/follower counts.
  • Contact Discovery: Finds author email addresses and scrapes social media links (Twitter, LinkedIn, etc.).
  • Flexible Post Scraping: Choose from three modes to control data depth and usage:
    • none: Newsletter metadata only (fastest).
    • information: Metadata plus post titles, dates, and engagement stats.
    • information_and_content: All of the above plus full article content.
  • Batch Processing: Scrape multiple newsletter URLs in one execution.
  • Pay-per-Event: You are only charged for successfully scraped newsletters.

How to Use

Configure the actor with input parameters via JSON. The only required parameter is the list of newsletter URLs.

Input

Provide input as a JSON object. The actor supports both *.substack.com URLs and custom domains.

Required:
* newsletterUrls (array): A list of newsletter objects, each containing a url.

Optional:
* postScrapingMode (string): Default is "none". Options: "none", "information", "information_and_content".
* maxPostsPerNewsletter (number): Default is 12 (max: 12). Number of recent posts to scrape. Only applies if postScrapingMode is not "none".
* delayBetweenRequests (number): Default is 3000 ms (range: 500-10000). Delay between requests to avoid rate limiting.

Example Input

{
  "newsletterUrls": [
    { "url": "https://platformer.substack.com" },
    { "url": "https://lennysnewsletter.com" }
  ],
  "postScrapingMode": "information",
  "maxPostsPerNewsletter": 5,
  "delayBetweenRequests": 2000
}
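Note that the first URL in the example uses the standard *.substack.com form while the second is a custom domain; as mentioned above, both are accepted.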

Output

The output structure depends on the postScrapingMode.

All modes include comprehensive newsletter metadata:
* Author details (ID, name, bio, photo, handle).
* Contact info (email, website, social media URLs).
* Publication details (ID, name, description, logo, creation date).
* Counts (subscribers, followers, visibility flags).
* Status flags (is paid, is active, etc.).

If postScrapingMode is "information" or "information_and_content", the output adds a posts array. Each post object contains:
* url, title, subtitle, published_at.
* Engagement metrics (like_count, comment_count).
* word_count and a preview_text.
* When mode is "information_and_content", it also includes the full article_content_html.

Example Output Snippet (Mode: "information")

{
  "author_name": "Casey Newton",
  "email": "casey@platformer.news",
  "publication_name": "Platformer",
  "subscriber_count": 176000,
  "posts": [
    {
      "title": "Trump won. Here's what comes next.",
      "published_at": "2024-11-06T14:30:00.000Z",
      "like_count": 245,
      "comment_count": 89,
      "word_count": 1250
    }
  ]
}
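Because the dataset items share the flat metadata fields shown above, turning a run into an outreach-ready spreadsheet takes only a few lines of post-processing. Here is a minimal sketch, assuming the items have already been downloaded as a list of dicts (for example via the client snippet in the How to Use section); the column names are taken from the example output, and any field missing from an item is simply left blank.

import csv

def export_contacts(items, path="substack_leads.csv"):
    # Columns taken from the example output above; adjust to taste.
    columns = ["publication_name", "author_name", "email", "subscriber_count"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        for item in items:
            # Keep only the contact-related fields; missing keys become empty cells.
            writer.writerow({col: item.get(col, "") for col in columns})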

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Substack Newsletter Scraper now on Apify. Free tier available with no credit card required.

Actor Information

Developer: digispruce
Pricing: Paid
Total Runs: 71
Active Users: 13
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

Learn more about Apify
