Ghost Newsletter Scraper

Name: Ghost Newsletter Scraper
Author: barrierefix

65 runs

2 users

About Ghost Newsletter Scraper

Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.

What does this actor do?

Ghost Newsletter Scraper is a web scraping actor that runs on the Apify platform. It detects Ghost-powered sites, discovers posts through RSS feeds, sitemaps, or HTML pagination, and saves post, author, and publication records to a dataset.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
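
For programmatic runs, the same steps can be driven through Apify's HTTP API instead of the web console. The sketch below only prepares the request and does not send it. The actor ID `barrierefix~ghost-newsletter-scraper` is a hypothetical placeholder (copy the real ID from the actor's page), and the input fields are limited to names that appear in the documentation examples on this page.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical placeholder: copy the real actor ID from the actor's Apify page.
ACTOR_ID = "barrierefix~ghost-newsletter-scraper"


def build_run_input(urls, lookback_days=0, limit_per_site=200):
    """Build the actor input using field names from the documentation examples."""
    return {
        "startUrls": [{"url": u} for u in urls],
        "lookbackDays": lookback_days,  # 0 = all posts
        "limitPerSite": limit_per_site,
    }


def build_request(token, run_input):
    """Prepare (but do not send) a POST that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


run_input = build_run_input(["https://blog.ghost.org/"], lookback_days=7)
request = build_request("YOUR_APIFY_TOKEN", run_input)
# To execute the run: items = json.load(urllib.request.urlopen(request))
```

The synchronous endpoint is convenient for short runs; for longer crawls, starting the run and reading the dataset afterwards is likely the safer pattern.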

Documentation

# Ghost Newsletter Scraper

Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.

## Perfect For

- Content Strategists - Monitor competitor newsletters and track content trends
- Market Researchers - Analyze the creator economy and identify publishing patterns
- Business Development Teams - Find sponsorship opportunities and track pricing changes
- Newsletter Creators - Research successful strategies and analyze posting frequency
- Agencies - Monitor multiple client newsletters and generate performance reports

## Quick Start

The simplest way to get started - just add a newsletter URL:

```json
{
  "startUrls": [
    { "url": "https://blog.ghost.org/" }
  ]
}
```

That's it! The scraper will automatically:

- ✅ Detect that it's a Ghost site
- ✅ Find all posts via RSS feed (fastest method)
- ✅ Extract titles, authors, publish dates, and content metadata
- ✅ Get site info like posting frequency and pricing

## What You'll Get

### Newsletter Sites

Information about each newsletter:

- Title and description
- Publishing frequency (posts per 30 days)
- Subscription pricing (if available)
- Social media links
- Last post date

### Articles & Posts

Full metadata for each article:

- Title, URL, and excerpt
- Author(s) with profile links
- Tags and categories
- Publish and update dates
- Word count and reading time estimate
- OpenGraph and Twitter Card data

### Writers & Contributors

Author profiles including:

- Name and bio
- Profile page URL
- Social media links (Twitter, LinkedIn, GitHub, website)

## Common Use Cases

### 1. Competitive Intelligence

Scenario: Track what your competitors are publishing

```json
{
  "startUrls": [
    { "url": "https://competitor1.com" },
    { "url": "https://competitor2.com" }
  ],
  "lookbackDays": 7,
  "outputLevel": "posts"
}
```

Get only new posts from the last week. Combine with n8n to get Slack alerts when competitors publish.

### 2. Content Research

Scenario: Analyze successful newsletters in your niche

```json
{
  "startUrls": [
    { "url": "https://popular-newsletter.com" }
  ],
  "limitPerSite": 100,
  "emitAuthors": true,
  "fetchPricingSignals": true
}
```

Get the last 100 posts, author info, and pricing to understand their strategy.

### 3. Sponsorship Prospecting

Scenario: Find newsletters that accept sponsors

```json
{
  "domains": ["newsletter1.com", "newsletter2.com", "newsletter3.com"],
  "fetchPricingSignals": true,
  "outputLevel": "publication",
  "limitPerSite": 1
}
```

Quickly scan multiple newsletters for pricing pages and subscription info.

### 4. Publishing Pattern Analysis

Scenario: Track how often newsletters in a category post

```json
{
  "startUrls": [
    { "url": "https://tech-newsletter.com" }
  ],
  "lookbackDays": 90,
  "deltaCrawl": false
}
```

Get 3 months of data to analyze posting frequency and consistency.

## Input Options

### Start URLs (Required)

Choose one method to specify which newsletters to scrape:

- Start URLs: Full URLs like `https://blog.ghost.org/`
- Domains: Just the domain like `blog.ghost.org` (https:// added automatically)

### What to Extract

| Option | What it does |
|--------|--------------|
| What to extract | Choose "Everything" (posts + site info), "Posts only", or "Site info only" |
| Extract author profiles | Get author bios and social links (recommended: ON) |
| Extract pricing | Find subscription prices from /subscribe pages (recommended: ON) |
| Extract tags | Get article tags/categories (experimental, slower) |

### Crawl Settings

| Setting | Description | Recommended |
|---------|-------------|-------------|
| How to find posts | RSS first (fastest), Sitemap first (most complete), or Hybrid | RSS first |
| Max posts per newsletter | Stop after this many posts | 200 |
| Only posts from last X days | Filter by date (0 = all posts) | 0 or 30 |
| Skip already-seen posts | Delta crawling saves costs! | ON |

### Filters (Optional)

Use regex patterns to include/exclude specific URLs:

- Include: `["./tag/tech/."]` - Only posts tagged "tech"
- Exclude: `["./author/."]` - Skip author pages

## Output Format

All data is saved to a dataset with multiple views for easy filtering:

### View 1: All Records

Complete data with everything mixed together

### View 2: Newsletter Sites

```json
{
  "type": "publication",
  "domain": "blog.ghost.org",
  "title": "Ghost Blog",
  "post_velocity_30d": 12,
  "last_post_at": "2025-09-28T10:00:00Z",
  "pricing": {
    "has_subscribe": true,
    "plan_cards": [...]
  }
}
```

### View 3: Articles & Posts

```json
{
  "type": "post",
  "domain": "blog.ghost.org",
  "title": "How to Build a Newsletter",
  "url": "https://blog.ghost.org/how-to-build/",
  "authors": [{"name": "Jane Doe"}],
  "tags": ["guides", "tutorials"],
  "published_at": "2025-09-15T08:00:00Z",
  "word_count_est": 1240,
  "reading_time_min_est": 6
}
```

### View 4: Writers & Contributors

```json
{
  "type": "author",
  "name": "Jane Doe",
  "profile_url": "https://blog.ghost.org/author/jane/",
  "bio": "Writer and creator",
  "social": {
    "twitter": "https://twitter.com/jane",
    "website": "https://janedoe.com"
  }
}
```

## Integrations

### n8n Workflows

New Post Alert

1. Trigger: Apify Dataset Item webhook (filter: `type=post`)
2. OpenAI: Summarize the post
3. Slack: Post to #content-monitoring
4. Notion: Add to content calendar

Pricing Change Alert

1. Trigger: Schedule (daily)
2. Get latest dataset (filter: `type=publication`)
3. Compare pricing with previous run
4. Email: Alert team if prices changed

Weekly Digest

1. Trigger: Schedule (Monday 9 AM)
2. Get posts from last 7 days
3. Group by newsletter
4. Email: Send digest to team

### Make.com / Zapier

The actor works with any automation tool that supports webhooks or API calls. Use Apify's integration to trigger workflows when new data is found.

## How It Works

1. Detects Ghost - Automatically identifies Ghost-powered sites using meta tags, Portal scripts, and RSS feeds
2. Finds Posts - Uses RSS feeds (fastest), sitemaps, or HTML pagination to discover articles
3. Extracts Data - Parses JSON-LD, OpenGraph tags, and HTML to get complete metadata
4. Saves Results - Stores everything in a structured dataset with easy-to-use views
5. Tracks Changes - Delta crawling means you only pay for new content (saves costs!)

## Pricing & Performance

Pricing Model: This actor uses pay-per-item pricing - you only pay for successfully extracted items (posts, authors, sites). Delta crawling significantly reduces costs on repeat runs by avoiding duplicate processing. See current pricing in the Apify Console when starting a run.

Speed:

- RSS mode: ~5-10 seconds per newsletter
- Sitemap mode: ~10-20 seconds per newsletter
- Faster than headless browser scrapers by 5-10x

## Ethical & Compliant

This scraper:

- ✅ Only accesses public content
- ✅ Respects robots.txt rules
- ✅ Implements rate limiting
- ✅ Identifies itself clearly
- ❌ Never bypasses paywalls
- ❌ Never accesses private content
- ❌ Never logs in or authenticates

## Advanced Settings

For power users, we offer:

- Concurrency control - Adjust speed vs. politeness
- Circuit breakers - Auto-stop on errors to save costs
- Proxy support - Use Apify proxy for large-scale scraping
- Browser mode - Enable Playwright for JavaScript-heavy sites (rarely needed)
- Custom User-Agent - Identify your scraper however you want

Most users can ignore these - the defaults work great!

## Limitations

- Ghost only - Only works with Ghost-powered sites (use our detector or check for "Powered by Ghost")
- Public content - Cannot access members-only or premium content
- Static pricing - Pricing detection works on static pages only (not Portal overlay)
- No authentication - Doesn't support logged-in scraping

## Troubleshooting

"No posts found"

- Check if the site has an RSS feed at `/rss/` or `/feed/`
- Try changing discovery mode to "Sitemap first" or "Hybrid"
- Verify it's actually a Ghost site (look for "Powered by Ghost" footer)

"Site isn't Ghost"

- The detector looks for Ghost-specific signals
- Some Ghost sites are heavily customized - try anyway, it might still work
- Turn off "Stop if site isn't Ghost" to scrape anyway

"Too slow"

- Use "RSS first" mode (fastest)
- Reduce "Max posts per newsletter"
- Enable "Skip already-seen posts" for repeat runs

"Hitting rate limits"

- Reduce "Max requests per site" (try 2)
- Enable "Respect robots.txt"
- Add delays by reducing concurrency

## Support

- Email: kontakt@barrierefix.de
- Issues: Report bugs via Apify Console
- Documentation: This README + input field tooltips

## Version History

1.0.0 (2025-10-01)

- Initial release
- Ghost detection with multi-signal verification
- RSS, sitemap, and HTML discovery modes
- Post, site, and author extraction
- Pricing detection for subscription newsletters
- Delta crawling with hash-based deduplication
- Circuit breakers and smart error handling
- n8n integration ready

---

## 🔗 Explore More of Our Actors

### 📰 Content & Publishing

| Actor | Description |
|-------|-------------|
| Notion Marketplace Scraper | Scrape Notion templates and marketplace listings |
| Farcaster Hub Scraper | Scrape Farcaster decentralized social network data |
| Google Play Reviews Scraper | Extract app reviews from Google Play Store |

### 💬 Social Media & Community

| Actor | Description |
|-------|-------------|
| Reddit Scraper Pro | Monitor subreddits and track keywords with sentiment analysis |
| Discord Scraper Pro | Extract Discord messages and chat history for community insights |
| YouTube Comments Harvester | Comprehensive YouTube comments scraper with channel-wide enumeration |
| YouTube Contact Scraper | Extract YouTube channel contact information for outreach |
| YouTube Shorts Scraper | Scrape YouTube Shorts for viral content research |

---

## License

MIT - Use commercially, modify freely, no attribution required.

---

Made by Barrierefix - Building tools for the creator economy.

## Legal Disclaimer / Rechtlicher Hinweis

EN: This actor is a general-purpose tool for analyzing publicly accessible web data. The user bears sole responsibility for ensuring their specific use complies with:

- Applicable laws (GDPR/DSGVO, copyright law)
- The target website's Terms of Service
- Apify's Terms of Service

The provider (barrierefix) expressly disclaims liability for any unauthorized or unlawful use. By using this actor, the user agrees to indemnify the provider against any third-party claims arising from their use of the data.

DE: Dieser Actor ist ein allgemeines Werkzeug zur Analyse öffentlich zugänglicher Webdaten. Der Nutzer trägt die alleinige Verantwortung dafür, dass seine spezifische Nutzung den geltenden Gesetzen (DSGVO, Urheberrecht), den Nutzungsbedingungen der Zielwebsite und den Apify-Nutzungsbedingungen entspricht. Der Anbieter (barrierefix) schließt jegliche Haftung für unbefugte oder rechtswidrige Nutzung ausdrücklich aus. Mit der Nutzung dieses Actors erklärt sich der Nutzer bereit, den Anbieter von allen Ansprüchen Dritter freizustellen, die aus seiner Datennutzung entstehen.

---

This tool is not affiliated with Ghost. All trademarks belong to their respective owners.
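
The "Pricing Change Alert" workflow in the documentation above compares pricing between runs. A minimal sketch of that comparison in Python follows, assuming dataset items shaped like the documented output views (`type`, `domain`, `pricing`). The helper names `split_by_type` and `pricing_changes` are hypothetical and not part of the actor, and the sample values are illustrative only.

```python
from collections import defaultdict


def split_by_type(items):
    """Group dataset records by their "type" field (publication, post, author)."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("type", "unknown")].append(item)
    return dict(groups)


def pricing_changes(previous, current):
    """Return domains whose "pricing" object differs between two runs."""
    before = {p["domain"]: p.get("pricing") for p in previous}
    return sorted(
        p["domain"]
        for p in current
        if p["domain"] in before and p.get("pricing") != before[p["domain"]]
    )


# Sample records shaped like the documented views (values are illustrative).
last_run = [
    {"type": "publication", "domain": "blog.ghost.org", "pricing": {"has_subscribe": False}},
]
this_run = [
    {"type": "publication", "domain": "blog.ghost.org", "pricing": {"has_subscribe": True}},
    {"type": "post", "domain": "blog.ghost.org", "title": "How to Build a Newsletter"},
]

views = split_by_type(this_run)
changed = pricing_changes(last_run, views["publication"])
print(changed)  # ['blog.ghost.org']
```

Comparing the whole `pricing` object keeps the sketch independent of the exact fields inside it; a real alert would probably diff individual plan entries.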

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources
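
As one illustration of content aggregation, post records could be grouped by newsletter for a weekly digest. This is a sketch under assumptions: it expects items with the `type`, `domain`, `title`, and `published_at` fields shown in the documented post view, and the function name `recent_posts_by_domain` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recent_posts_by_domain(items, days=7, now=None):
    """Map each newsletter domain to the titles of posts published in the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    digest = {}
    for item in items:
        if item.get("type") != "post":
            continue
        # Replace the "Z" suffix so fromisoformat also works on Python versions before 3.11.
        published = datetime.fromisoformat(item["published_at"].replace("Z", "+00:00"))
        if published >= cutoff:
            digest.setdefault(item["domain"], []).append(item["title"])
    return digest


# Illustrative records; only the first falls inside the 7-day window.
sample = [
    {"type": "post", "domain": "blog.ghost.org", "title": "How to Build a Newsletter",
     "published_at": "2025-09-15T08:00:00Z"},
    {"type": "post", "domain": "blog.ghost.org", "title": "An Older Post",
     "published_at": "2025-09-01T08:00:00Z"},
    {"type": "author", "name": "Jane Doe"},
]

digest = recent_posts_by_domain(sample, days=7, now=datetime(2025, 9, 20, tzinfo=timezone.utc))
print(digest)  # {'blog.ghost.org': ['How to Build a Newsletter']}
```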

Ready to Get Started?

Try Ghost Newsletter Scraper now on Apify. Free tier available with no credit card required.

Actor Information

Developer: barrierefix
Pricing: Paid
Total Runs: 65
Active Users: 2

Related Actors

Smart Article Extractor

by lukaskrivka

Google Search

by devisty

Twitter Tweets Scraper

by gentle_cloud

Twitter Profile

by danek

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
