Bluesky Jetstream Scraper


by skyscraping

Scrape Bluesky Social via the Jetstream API. Filter by hashtags, users, or language to collect posts, media, profiles, and replies for research and monitoring.

222 runs
13 users

About Bluesky Jetstream Scraper

Need to pull data from Bluesky for a project? This scraper taps directly into Bluesky's Jetstream API, a lightweight real-time stream of network activity, so you get posts as they are published instead of crawling for them. You can filter what you collect by specific hashtags, usernames, or languages, so you're not drowning in irrelevant data. It captures the post text, any attached images or links, the author's profile details, and the reply context that ties each post to its thread.

I've used it to track conversations around specific tech topics, and having the reply chains intact was crucial for understanding how each discussion unfolded. It's become my go-to for building social-listening datasets, spotting emerging trends before they take off, and keeping an eye on what particular communities or creators are talking about. If you're researching, analyzing, or monitoring activity on Bluesky, it delivers structured, clean data from the official stream without the hassle of building the pipeline yourself.

What does this actor do?

Bluesky Jetstream Scraper is an actor (a cloud program) on the Apify platform. It subscribes to Bluesky's Jetstream stream, keeps only the events that match your filters, and saves them to a dataset that you can export or query through the Apify API.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Open the actor's page on Apify.com
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
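If you'd rather start runs from code than from the Console, the official `apify-client` Python package can do it. The sketch below is a minimal illustration that assumes the parameter names listed under Input Parameters. The actor ID is a hypothetical placeholder, so copy the real one from the actor's page. The 10-minute timeout is also an assumption. It is there because the scraper streams until it is stopped.

```python
from __future__ import annotations

from typing import Any, Optional


def build_run_input(
    hashtags: Optional[list[str]] = None,
    languages: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build an input object using the parameter names from Input Parameters."""
    return {
        # Hashtags are passed without the leading '#'.
        "hashtags": [tag.lstrip("#") for tag in (hashtags or [])],
        "languages": list(languages or []),
        "wantedCollections": ["app.bsky.feed.post"],
    }


def start_run(token: str, run_input: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Start a run and wait for it to finish (or hit the timeout)."""
    # Imported here so build_run_input() works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # HYPOTHETICAL actor ID: copy the real one from the actor's Apify page.
    actor = client.actor("skyscraping/bluesky-jetstream-scraper")
    # The scraper streams until stopped, so cap the run at 10 minutes (assumed).
    return actor.call(run_input=run_input, timeout_secs=600)
```

For example, `run = start_run("<YOUR_APIFY_TOKEN>", build_run_input(["apify"], ["en"]))` returns the run object. Its `defaultDatasetId` field identifies the dataset that holds the results.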

Documentation


This Apify actor collects real-time data from the Bluesky social network through Jetstream. Jetstream is a service that consumes the AT Protocol firehose and re-publishes it as a stream of compact JSON events. The actor filters that live stream by your criteria and outputs structured data.

Important context:
* Jetstream vs. Crawling: Jetstream delivers a continuous, real-time stream of events, so the actor avoids the rate limits and overhead of traditional crawling.
* Real-Time Only: It captures data as it happens. It cannot retrieve historical posts.
* Evolving Platform: Bluesky's APIs are still in development; the actor may require updates to maintain compatibility.
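The sketch below shows what "streaming instead of crawling" looks like at the protocol level. A client opens a single WebSocket to a Jetstream instance. It lists the wanted collections (and optionally author DIDs) as query parameters, and then receives one JSON message per event. The endpoint host is an assumption: it is one of Bluesky's public instances, and others exist. This is not a description of how the actor itself connects.

```python
from __future__ import annotations

import json
from typing import Optional
from urllib.parse import urlencode

# One of Bluesky's public Jetstream instances (an assumption; others exist).
JETSTREAM_ENDPOINT = "wss://jetstream2.us-east.bsky.network/subscribe"


def jetstream_url(collections: list[str], dids: Optional[list[str]] = None) -> str:
    """Build a subscription URL; each filter value is a repeated query param."""
    params = [("wantedCollections", c) for c in collections]
    params += [("wantedDids", d) for d in (dids or [])]
    return f"{JETSTREAM_ENDPOINT}?{urlencode(params)}"


async def print_events(url: str, limit: int = 5) -> None:
    """Print the kind and author DID of the first few live events."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(url) as ws:
        for _ in range(limit):
            event = json.loads(await ws.recv())
            print(event.get("kind"), event.get("did"))
```

Running `asyncio.run(print_events(jetstream_url(["app.bsky.feed.post"])))` prints events as they arrive. Nothing before the moment of connection is returned, which matches the real-time-only behavior described above.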

Key Features

  • Real-time streaming from the Bluesky firehose.
  • Filterable output by hashtags, users, languages, and event types.
  • Configurable data formats (JSON, CSV, etc.).
  • Efficient operation using the native Jetstream API.

How to Use

Configure the actor's input parameters in Apify to define your data collection. The scraper will run continuously, outputting matching posts and events to the dataset until stopped.

Input Parameters

Filtering Parameters

  • hashtags (Array of strings): Collect posts containing any of these hashtags (omit the #). Example: ["apify", "scraping"].
  • usernames (Array of strings): Collect posts authored by any of these users. Example: ["user1.bsky.social", "user2.bsky.social"].
  • languages (Array of strings): Collect posts detected in any of these language codes. Example: ["en", "pt"].
  • wantedCollections (Array of strings): The record collections (event types) to collect, given as AT Protocol collection IDs. Common options:
    • app.bsky.feed.post: Standard posts.
    • app.bsky.feed.like: Likes.
    • app.bsky.feed.repost: Reposts.
    • app.bsky.graph.follow: Follows.
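The sketch below shows one way the hashtag, user, and language filters could be applied to a Jetstream post event. It rests on several assumptions that the actor's documentation does not confirm:

* Values within one filter match with OR, and the different filters combine with AND.
* Hashtag matching is case-insensitive.
* Usernames have already been resolved to DIDs, because Jetstream events identify authors by DID rather than by handle.

Hashtags are read from the post's rich-text facets, where tags are stored without the `#`.

```python
from __future__ import annotations

from typing import Any

TAG_FEATURE = "app.bsky.richtext.facet#tag"


def post_hashtags(record: dict[str, Any]) -> set[str]:
    """Collect hashtag facets (stored without '#') from a post record."""
    return {
        feature["tag"].lower()
        for facet in record.get("facets", [])
        for feature in facet.get("features", [])
        if feature.get("$type") == TAG_FEATURE
    }


def matches(
    event: dict[str, Any],
    hashtags: set[str],
    dids: set[str],
    languages: set[str],
) -> bool:
    """True if a Jetstream event is a new post passing every non-empty filter."""
    if event.get("kind") != "commit":
        return False  # skip identity/account events
    commit = event.get("commit", {})
    if commit.get("operation") != "create" or commit.get("collection") != "app.bsky.feed.post":
        return False  # only newly created posts
    record = commit.get("record", {})
    if dids and event.get("did") not in dids:
        return False
    if languages and not languages & set(record.get("langs", [])):
        return False
    if hashtags and not {h.lower() for h in hashtags} & post_hashtags(record):
        return False
    return True
```

An empty set disables that filter. With all three sets empty, every new post passes.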

Processing & Output Parameters

  • detectLanguage (Boolean): Enables automatic language detection for posts without a language tag.
  • outputFormat (String): Chooses the dataset file format (e.g., json, csv).
  • customOutputFields (Array of strings): Specifies which post fields to include in the output, allowing you to limit data to only what you need.
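The sketch below illustrates the effect of customOutputFields as field projection on each output item. It is hypothetical: the dotted-path syntax (for example `author.handle`) and the choice to drop missing fields instead of emitting nulls are assumptions, not documented behavior of the actor. Flat keys like these also map directly onto CSV columns.

```python
from __future__ import annotations

from typing import Any


def project(item: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Keep only the requested fields; a dotted path reaches into nested objects."""
    if not fields:
        return item  # no customOutputFields: keep everything
    out: dict[str, Any] = {}
    for path in fields:
        value: Any = item
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                break  # missing field: leave it out rather than emit null
            value = value[key]
        else:
            out[path] = value  # every key in the path was found
    return out
```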

Output

The actor outputs items to the Apify dataset, with fields varying based on the event type (wantedCollections) and your customOutputFields. For a standard app.bsky.feed.post, typical output includes:

```json
{
  "uri": "at://did:plc:abc123/app.bsky.feed.post/3k44xq",
  "text": "Post content here #example",
  "author": {
    "did": "did:plc:abc123",
    "handle": "user.bsky.social"
  },
  "indexedAt": "2024-01-01T12:00:00.000Z",
  "langs": ["en"],
  "hashtags": ["example"]
}
```
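The sketch below shows how a raw Jetstream post event relates to the output shape above. A post's AT URI combines the author's DID, the collection, and the record key. It rests on these assumptions:

* `indexedAt` is derived from Jetstream's `time_us` field, which is Jetstream's own event timestamp and not necessarily Bluesky's indexing time.
* The handle is left unset, because Jetstream commit events carry only the author's DID and resolving a handle requires a separate lookup.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TAG_FEATURE = "app.bsky.richtext.facet#tag"


def to_output(event: dict[str, Any]) -> dict[str, Any]:
    """Map a Jetstream post 'create' event to the documented output shape."""
    did = event["did"]
    commit = event["commit"]
    record = commit["record"]
    # time_us is microseconds since the Unix epoch; integer math avoids float drift.
    seen_at = EPOCH + timedelta(microseconds=event["time_us"])
    return {
        # A post's AT URI is at://<author DID>/<collection>/<record key>.
        "uri": f"at://{did}/{commit['collection']}/{commit['rkey']}",
        "text": record.get("text", ""),
        # Jetstream commit events carry only the DID; a handle needs a separate lookup.
        "author": {"did": did, "handle": None},
        "indexedAt": seen_at.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "langs": record.get("langs", []),
        "hashtags": [
            feature["tag"]
            for facet in record.get("facets", [])
            for feature in facet.get("features", [])
            if feature.get("$type") == TAG_FEATURE
        ],
    }
```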

You can access results via the Apify API, or download them directly in your chosen format (JSON, CSV, etc.) from the Apify Console at https://console.apify.com.
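For programmatic downloads, the Apify API exposes a dataset's items at a single endpoint. The `format` query parameter selects the export format, and the optional `fields` parameter selects columns. In this minimal sketch, the dataset ID and token are placeholders: use the run's `defaultDatasetId` and your own API token.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json", fields: Optional[list[str]] = None) -> str:
    """URL of the dataset items endpoint in the requested export format."""
    params: dict[str, str] = {"format": fmt}
    if fields:
        params["fields"] = ",".join(fields)  # server-side column selection
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def download_items(dataset_id: str, token: str, fmt: str = "csv") -> bytes:
    """Fetch the exported items, authenticating with a bearer token."""
    request = Request(
        dataset_items_url(dataset_id, fmt),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request) as response:
        return response.read()
```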

Common Use Cases

Market Research

Gather competitive intelligence and market sentiment from public Bluesky conversations

Lead Generation

Identify accounts discussing topics relevant to your product for sales outreach

Price Monitoring

Track public discussion of competitor pricing and product changes

Content Aggregation

Collect and organize posts from multiple hashtags, users, or languages

Actor Information

Developer
skyscraping
Pricing
Paid
Total Runs
222
Active Users
13
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.