Bluesky Jetstream Scraper
by skyscraping
Scrape Bluesky Social via the Jetstream API. Filter by hashtags, users, or language to collect posts, media, profiles, and replies for research and monitoring.
About Bluesky Jetstream Scraper
Need to pull data from Bluesky for a project? This scraper connects directly to Bluesky's Jetstream API, which delivers a real-time feed of posts. You can filter what you collect by specific hashtags, usernames, or languages, so you're not drowning in irrelevant data. It captures the post text, any attached images or links, the author's profile details, and the full thread context of replies. I've used it to track conversations around specific tech topics, and having the reply chains intact was crucial for understanding the discussion flow. It's become my go-to for building datasets for social listening, spotting emerging trends before they blow up, or just keeping an eye on what certain communities or creators are talking about. If you're researching, analyzing, or monitoring activity on Bluesky, collecting through the official API gets you structured, clean data without the hassle of building the pipeline yourself.
What does this actor do?
Bluesky Jetstream Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
This Apify actor collects real-time data from the Bluesky social network via Jetstream, a JSON streaming interface to the AT Protocol (ATProto) firehose. It filters the live stream by the criteria you specify and outputs structured data.
Important context:
* Jetstream vs. Crawling: It uses Bluesky's efficient firehose API for a continuous, real-time data stream. This avoids the rate limits and overhead of traditional crawling.
* Real-Time Only: It captures data as it happens. It cannot retrieve historical posts.
* Evolving Platform: Bluesky's APIs are still in development; the actor may require updates to maintain compatibility.
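To make the streaming model above more concrete, here is a minimal sketch of how a Jetstream subscription URL might be assembled. The host name and the wantedCollections/wantedDids query parameters reflect Bluesky's public Jetstream service rather than anything documented for this actor, so treat them as assumptions; build_jetstream_url is a hypothetical helper, and the sketch only builds the URL without opening a connection.

```python
from urllib.parse import urlencode

# Assumed public Jetstream endpoint; the actor may connect to a different host.
JETSTREAM_BASE = "wss://jetstream2.us-east.bsky.network/subscribe"


def build_jetstream_url(collections, dids=()):
    """Return a subscription URL limited to the given collections and DIDs."""
    # Repeated query keys are how multiple values are passed (assumption).
    params = [("wantedCollections", c) for c in collections]
    params += [("wantedDids", d) for d in dids]
    return f"{JETSTREAM_BASE}?{urlencode(params)}" if params else JETSTREAM_BASE


url = build_jetstream_url(["app.bsky.feed.post", "app.bsky.feed.like"])
print(url)
```

Because the server only sends events for the requested collections, narrowing this list is likely the cheapest way to reduce traffic before any hashtag or language filtering happens.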
Key Features
- Real-time streaming from the Bluesky firehose.
- Filterable output by hashtags, users, languages, and event types.
- Configurable data formats (JSON, CSV, etc.).
- Efficient operation using the native Jetstream API.
How to Use
Configure the actor's input parameters in Apify to define your data collection. The scraper will run continuously, outputting matching posts and events to the dataset until stopped.
Input Parameters
Filtering Parameters
- hashtags (Array of strings): Collect posts containing any of these hashtags (omit the #). Example: ["apify", "scraping"].
- usernames (Array of strings): Collect posts authored by any of these users. Example: ["user1.bsky.social", "user2.bsky.social"].
- languages (Array of strings): Collect posts detected in any of these language codes. Example: ["en", "pt"].
- wantedCollections (Array of strings): Define which Bluesky event types to collect. Common options:
  - app.bsky.feed.post: Standard posts.
  - app.bsky.feed.like: Likes.
  - app.bsky.feed.repost: Reposts.
  - app.bsky.graph.follow: Follows.
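The filtering parameters above can be read as a predicate applied to each incoming post. The sketch below is one plausible interpretation, not the actor's actual logic: it assumes values within a single filter are OR-ed, separate filters are AND-ed, and an empty filter matches everything. The matches function is hypothetical, and the post shape follows the sample output in this documentation.

```python
def matches(post, hashtags=(), usernames=(), languages=()):
    """Return True if the post passes every filter that was supplied.

    Assumption: values inside one filter are OR-ed, filters are AND-ed,
    and an empty filter places no restriction on the post.
    """
    tags = {t.lower() for t in post.get("hashtags", [])}
    handle = post.get("author", {}).get("handle", "").lower()
    langs = {code.lower() for code in post.get("langs", [])}

    # Hashtags are configured without the leading '#', but tolerate it anyway.
    if hashtags and not tags & {h.lower().lstrip("#") for h in hashtags}:
        return False
    if usernames and handle not in {u.lower() for u in usernames}:
        return False
    if languages and not langs & {code.lower() for code in languages}:
        return False
    return True


post = {
    "text": "Post content here #example",
    "author": {"handle": "user.bsky.social"},
    "langs": ["en"],
    "hashtags": ["example"],
}
print(matches(post, hashtags=["example"], languages=["en", "pt"]))
print(matches(post, usernames=["user1.bsky.social"]))
```

If the actor instead OR-s the filters together, the three early returns would collapse into a single any(...) check; it is worth confirming the behavior with a short test run before relying on it.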
Processing & Output Parameters
- detectLanguage (Boolean): Enables automatic language detection for posts without a language tag.
- outputFormat (String): Chooses the dataset file format (e.g., json, csv).
- customOutputFields (Array of strings): Specifies which post fields to include in the output, allowing you to limit data to only what you need.
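As an illustration of what customOutputFields implies, the following sketch projects an item down to a chosen set of fields. It assumes the parameter names top-level fields only and that missing fields are silently skipped; neither detail is stated in the documentation, and project is a hypothetical helper.

```python
def project(item, fields):
    """Keep only the requested top-level fields, in the order requested."""
    if not fields:
        # No customOutputFields given: return a copy of the full item.
        return dict(item)
    # Fields absent from the item are skipped rather than filled with None.
    return {name: item[name] for name in fields if name in item}


item = {
    "uri": "at://did:plc:abc123/app.bsky.feed.post/3k44xq",
    "text": "Post content here #example",
    "langs": ["en"],
    "hashtags": ["example"],
}
print(project(item, ["uri", "text", "replyCount"]))
```

Trimming fields at collection time keeps the dataset smaller, which matters for a run that streams continuously.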
Output
The actor outputs items to the Apify dataset, with fields varying based on the event type (wantedCollections) and your customOutputFields. For a standard app.bsky.feed.post, typical output includes:
{
  "uri": "at://did:plc:abc123/app.bsky.feed.post/3k44xq",
  "text": "Post content here #example",
  "author": {
    "did": "did:plc:abc123",
    "handle": "user.bsky.social"
  },
  "indexedAt": "2024-01-01T12:00:00.000Z",
  "langs": ["en"],
  "hashtags": ["example"]
}
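The uri field in the sample above is an AT Protocol URI of the form at://<did>/<collection>/<record key>. A small sketch of splitting it into its parts follows; parse_at_uri and AtUri are hypothetical names, and the parser handles only this three-segment record form.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    did: str         # repository owner, i.e. the account that created the record
    collection: str  # record type, e.g. app.bsky.feed.post
    rkey: str        # record key within that collection


def parse_at_uri(uri):
    """Split an at:// record URI into (did, collection, rkey)."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://<did>/<collection>/<rkey>: {uri!r}")
    return AtUri(*parts)


parsed = parse_at_uri("at://did:plc:abc123/app.bsky.feed.post/3k44xq")
print(parsed.did, parsed.collection, parsed.rkey)
```

Grouping output items by parsed.collection is a convenient way to separate posts, likes, reposts, and follows when several wantedCollections are enabled.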
You can access results via the Apify API, or download them directly in your chosen format (JSON, CSV, etc.) from the Apify Console at https://console.apify.com.
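For API access, results are typically read from the dataset items endpoint of the Apify API. The sketch below only builds the request URL; the endpoint path and the format, fields, and limit query parameters are taken from Apify's public API as I understand it and should be checked against the current API reference. DATASET_ID is a placeholder, and dataset_items_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, fmt="json", fields=(), limit=None):
    """Build a URL for downloading dataset items in the given format."""
    params = {"format": fmt}
    if fields:
        # Comma-separated list of fields to include in each item.
        params["fields"] = ",".join(fields)
    if limit is not None:
        params["limit"] = limit
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


items_url = dataset_items_url("DATASET_ID", fmt="csv", fields=["uri", "text"], limit=100)
print(items_url)
```

Datasets that are not public also require an Apify API token with the request; keep it out of shared URLs and logs.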
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: skyscraping
- Pricing: Paid
- Total Runs: 222
- Active Users: 13