Ghost Newsletter Scraper
by barrierefix
About Ghost Newsletter Scraper
Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.
What does this actor do?
Ghost Newsletter Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
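As a rough illustration of the "configure the input parameters" step, the sketch below assembles a run input in Python using field names that appear in this actor's documented examples (`startUrls`, `lookbackDays`, `limitPerSite`). The helper name `build_input` is hypothetical, and the `https://` prefixing only mirrors the behaviour the documentation describes for bare domains; it is not the actor's own code.

```python
import json
from typing import Any


def build_input(sites: list[str], lookback_days: int = 0,
                limit_per_site: int = 200) -> dict[str, Any]:
    """Build a run input dict (hypothetical helper, not part of the actor).

    Bare domains such as "blog.ghost.org" get an https:// prefix.
    lookback_days=0 means "all posts", per the documented setting.
    """
    start_urls = []
    for site in sites:
        site = site.strip()
        if not site.startswith(("http://", "https://")):
            site = "https://" + site
        start_urls.append({"url": site})
    return {
        "startUrls": start_urls,
        "lookbackDays": lookback_days,
        "limitPerSite": limit_per_site,
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the actor's input editor.
    run_input = build_input(["blog.ghost.org", "https://competitor1.com"],
                            lookback_days=7)
    print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the input editor on Apify or sent through the Apify API.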
Documentation
# Ghost Newsletter Scraper

Extract structured data from any Ghost-powered newsletter - track posts, monitor pricing, analyze publishing patterns, and research the creator economy.

## Perfect For

- Content Strategists - Monitor competitor newsletters and track content trends
- Market Researchers - Analyze the creator economy and identify publishing patterns
- Business Development Teams - Find sponsorship opportunities and track pricing changes
- Newsletter Creators - Research successful strategies and analyze posting frequency
- Agencies - Monitor multiple client newsletters and generate performance reports

## Quick Start

The simplest way to get started - just add a newsletter URL:

```json
{
  "startUrls": [
    { "url": "https://blog.ghost.org/" }
  ]
}
```

That's it! The scraper will automatically:

- ✅ Detect that it's a Ghost site
- ✅ Find all posts via RSS feed (fastest method)
- ✅ Extract titles, authors, publish dates, and content metadata
- ✅ Get site info like posting frequency and pricing

## What You'll Get

### Newsletter Sites

Information about each newsletter:

- Title and description
- Publishing frequency (posts per 30 days)
- Subscription pricing (if available)
- Social media links
- Last post date

### Articles & Posts

Full metadata for each article:

- Title, URL, and excerpt
- Author(s) with profile links
- Tags and categories
- Publish and update dates
- Word count and reading time estimate
- OpenGraph and Twitter Card data

### Writers & Contributors

Author profiles including:

- Name and bio
- Profile page URL
- Social media links (Twitter, LinkedIn, GitHub, website)

## Common Use Cases

### 1. Competitive Intelligence

Scenario: Track what your competitors are publishing

```json
{
  "startUrls": [
    { "url": "https://competitor1.com" },
    { "url": "https://competitor2.com" }
  ],
  "lookbackDays": 7,
  "outputLevel": "posts"
}
```

Get only new posts from the last week. Combine with n8n to get Slack alerts when competitors publish.

### 2. Content Research

Scenario: Analyze successful newsletters in your niche

```json
{
  "startUrls": [
    { "url": "https://popular-newsletter.com" }
  ],
  "limitPerSite": 100,
  "emitAuthors": true,
  "fetchPricingSignals": true
}
```

Get the last 100 posts, author info, and pricing to understand their strategy.

### 3. Sponsorship Prospecting

Scenario: Find newsletters that accept sponsors

```json
{
  "domains": ["newsletter1.com", "newsletter2.com", "newsletter3.com"],
  "fetchPricingSignals": true,
  "outputLevel": "publication",
  "limitPerSite": 1
}
```

Quickly scan multiple newsletters for pricing pages and subscription info.

### 4. Publishing Pattern Analysis

Scenario: Track how often newsletters in a category post

```json
{
  "startUrls": [
    { "url": "https://tech-newsletter.com" }
  ],
  "lookbackDays": 90,
  "deltaCrawl": false
}
```

Get 3 months of data to analyze posting frequency and consistency.

## Input Options

### Start URLs (Required)

Choose one method to specify which newsletters to scrape:

- Start URLs: Full URLs like `https://blog.ghost.org/`
- Domains: Just the domain like `blog.ghost.org` (`https://` added automatically)

### What to Extract

| Option | What it does |
|--------|--------------|
| What to extract | Choose "Everything" (posts + site info), "Posts only", or "Site info only" |
| Extract author profiles | Get author bios and social links (recommended: ON) |
| Extract pricing | Find subscription prices from /subscribe pages (recommended: ON) |
| Extract tags | Get article tags/categories (experimental, slower) |

### Crawl Settings

| Setting | Description | Recommended |
|---------|-------------|-------------|
| How to find posts | RSS first (fastest), Sitemap first (most complete), or Hybrid | RSS first |
| Max posts per newsletter | Stop after this many posts | 200 |
| Only posts from last X days | Filter by date (0 = all posts) | 0 or 30 |
| Skip already-seen posts | Delta crawling saves costs! | ON |

### Filters (Optional)

Use regex patterns to include/exclude specific URLs:

- Include: `[".*/tag/tech/.*"]` - Only posts tagged "tech"
- Exclude: `[".*/author/.*"]` - Skip author pages

## Output Format

All data is saved to a dataset with multiple views for easy filtering:

### View 1: All Records

Complete data with everything mixed together

### View 2: Newsletter Sites

```json
{
  "type": "publication",
  "domain": "blog.ghost.org",
  "title": "Ghost Blog",
  "post_velocity_30d": 12,
  "last_post_at": "2025-09-28T10:00:00Z",
  "pricing": {
    "has_subscribe": true,
    "plan_cards": [...]
  }
}
```

### View 3: Articles & Posts

```json
{
  "type": "post",
  "domain": "blog.ghost.org",
  "title": "How to Build a Newsletter",
  "url": "https://blog.ghost.org/how-to-build/",
  "authors": [{"name": "Jane Doe"}],
  "tags": ["guides", "tutorials"],
  "published_at": "2025-09-15T08:00:00Z",
  "word_count_est": 1240,
  "reading_time_min_est": 6
}
```

### View 4: Writers & Contributors

```json
{
  "type": "author",
  "name": "Jane Doe",
  "profile_url": "https://blog.ghost.org/author/jane/",
  "bio": "Writer and creator",
  "social": {
    "twitter": "https://twitter.com/jane",
    "website": "https://janedoe.com"
  }
}
```

## Integrations

### n8n Workflows

New Post Alert

1. Trigger: Apify Dataset Item webhook (filter: type=post)
2. OpenAI: Summarize the post
3. Slack: Post to #content-monitoring
4. Notion: Add to content calendar

Pricing Change Alert

1. Trigger: Schedule (daily)
2. Get latest dataset (filter: type=publication)
3. Compare pricing with previous run
4. Email: Alert team if prices changed

Weekly Digest

1. Trigger: Schedule (Monday 9 AM)
2. Get posts from last 7 days
3. Group by newsletter
4. Email: Send digest to team

### Make.com / Zapier

The actor works with any automation tool that supports webhooks or API calls. Use Apify's integration to trigger workflows when new data is found.

## How It Works

1. Detects Ghost - Automatically identifies Ghost-powered sites using meta tags, Portal scripts, and RSS feeds
2. Finds Posts - Uses RSS feeds (fastest), sitemaps, or HTML pagination to discover articles
3. Extracts Data - Parses JSON-LD, OpenGraph tags, and HTML to get complete metadata
4. Saves Results - Stores everything in a structured dataset with easy-to-use views
5. Tracks Changes - Delta crawling means you only pay for new content (saves costs!)

## Pricing & Performance

Pricing Model: This actor uses pay-per-item pricing - you only pay for successfully extracted items (posts, authors, sites). Delta crawling significantly reduces costs on repeat runs by avoiding duplicate processing. See current pricing in the Apify Console when starting a run.

Speed:

- RSS mode: ~5-10 seconds per newsletter
- Sitemap mode: ~10-20 seconds per newsletter
- Faster than headless browser scrapers by 5-10x

## Ethical & Compliant

This scraper:

- ✅ Only accesses public content
- ✅ Respects robots.txt rules
- ✅ Implements rate limiting
- ✅ Identifies itself clearly
- ❌ Never bypasses paywalls
- ❌ Never accesses private content
- ❌ Never logs in or authenticates

## Advanced Settings

For power users, we offer:

- Concurrency control - Adjust speed vs. politeness
- Circuit breakers - Auto-stop on errors to save costs
- Proxy support - Use Apify proxy for large-scale scraping
- Browser mode - Enable Playwright for JavaScript-heavy sites (rarely needed)
- Custom User-Agent - Identify your scraper however you want

Most users can ignore these - the defaults work great!
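To make the delta-crawling idea from "How It Works" more concrete, here is a minimal sketch of hash-based deduplication applied to records shaped like the documented `post` output. The actor's real fingerprinting scheme is not documented here, so the choice of hashing `url` plus `published_at`, and the helper names `record_fingerprint` and `filter_new`, are assumptions made purely for illustration.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Stable SHA-256 hash of the fields assumed to identify a post."""
    key = json.dumps(
        {"url": record.get("url"), "published_at": record.get("published_at")},
        sort_keys=True,
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def filter_new(records: list[dict], seen: set[str]) -> list[dict]:
    """Return post records not seen before and remember their fingerprints."""
    fresh = []
    for record in records:
        if record.get("type") != "post":
            continue  # this sketch only deduplicates posts
        fingerprint = record_fingerprint(record)
        if fingerprint not in seen:
            seen.add(fingerprint)
            fresh.append(record)
    return fresh


if __name__ == "__main__":
    seen: set[str] = set()
    run = [{
        "type": "post",
        "url": "https://blog.ghost.org/how-to-build/",
        "published_at": "2025-09-15T08:00:00Z",
    }]
    print(len(filter_new(run, seen)))  # first run: the post is new
    print(len(filter_new(run, seen)))  # repeat run: nothing new
```

In a real pipeline the `seen` set would need to be persisted between runs (for example in a file or key-value store) so that repeat runs only process new posts.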
## Limitations

- Ghost only - Only works with Ghost-powered sites (use our detector or check for "Powered by Ghost")
- Public content - Cannot access members-only or premium content
- Static pricing - Pricing detection works on static pages only (not Portal overlay)
- No authentication - Doesn't support logged-in scraping

## Troubleshooting

"No posts found"

- Check if the site has an RSS feed at /rss/ or /feed/
- Try changing discovery mode to "Sitemap first" or "Hybrid"
- Verify it's actually a Ghost site (look for "Powered by Ghost" footer)

"Site isn't Ghost"

- The detector looks for Ghost-specific signals
- Some Ghost sites are heavily customized - try anyway, it might still work
- Turn off "Stop if site isn't Ghost" to scrape anyway

"Too slow"

- Use "RSS first" mode (fastest)
- Reduce "Max posts per newsletter"
- Enable "Skip already-seen posts" for repeat runs

"Hitting rate limits"

- Reduce "Max requests per site" (try 2)
- Enable "Respect robots.txt"
- Add delays by reducing concurrency

## Support

- Email: kontakt@barrierefix.de
- Issues: Report bugs via Apify Console
- Documentation: This README + input field tooltips

## Version History

1.0.0 (2025-10-01)

- Initial release
- Ghost detection with multi-signal verification
- RSS, sitemap, and HTML discovery modes
- Post, site, and author extraction
- Pricing detection for subscription newsletters
- Delta crawling with hash-based deduplication
- Circuit breakers and smart error handling
- n8n integration ready

---

## 🔗 Explore More of Our Actors

### 📰 Content & Publishing

| Actor | Description |
|-------|-------------|
| Notion Marketplace Scraper | Scrape Notion templates and marketplace listings |
| Farcaster Hub Scraper | Scrape Farcaster decentralized social network data |
| Google Play Reviews Scraper | Extract app reviews from Google Play Store |

### 💬 Social Media & Community

| Actor | Description |
|-------|-------------|
| Reddit Scraper Pro | Monitor subreddits and track keywords with sentiment analysis |
| Discord Scraper Pro | Extract Discord messages and chat history for community insights |
| YouTube Comments Harvester | Comprehensive YouTube comments scraper with channel-wide enumeration |
| YouTube Contact Scraper | Extract YouTube channel contact information for outreach |
| YouTube Shorts Scraper | Scrape YouTube Shorts for viral content research |

---

## License

MIT - Use commercially, modify freely, no attribution required.

---

Made by Barrierefix - Building tools for the creator economy.

## Legal Disclaimer

This actor is a general-purpose tool for analyzing publicly accessible web data. The user bears sole responsibility for ensuring their specific use complies with:

- Applicable laws (GDPR/DSGVO, copyright law)
- The target website's Terms of Service
- Apify's Terms of Service

The provider (barrierefix) expressly disclaims liability for any unauthorized or unlawful use. By using this actor, the user agrees to indemnify the provider against any third-party claims arising from their use of the data.

---

This tool is not affiliated with Ghost. All trademarks belong to their respective owners.
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Actor Information
- Developer: barrierefix
- Pricing: Paid
- Total Runs: 65
- Active Users: 2