Farcaster Hub Scraper

by barrierefix

60 runs
4 users

About Farcaster Hub Scraper

Protocol-native Farcaster data ingestion for research, analytics, and social graph analysis. Collect casts, reactions, follows, user profiles, and real-time events directly from Farcaster Hubs via HTTP API.
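To make "directly from Farcaster Hubs via HTTP API" concrete, here is a minimal Python sketch of paging through one Hub endpoint. It is an illustration, not the actor's actual code: the `castsByFid` endpoint, the `pageSize`/`pageToken` query parameters, and the `messages`/`nextPageToken` response fields are assumed from the public Hub HTTP API, and `build_page_url`, `iter_messages`, and the injected `fetch_json` callable are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode


def build_page_url(hub_base_url: str, endpoint: str, fid: int,
                   page_size: int = 1000,
                   page_token: Optional[str] = None) -> str:
    """Build the URL for one page of a per-FID Hub endpoint (assumed shape)."""
    params: Dict[str, Any] = {"fid": fid, "pageSize": page_size}
    if page_token:
        params["pageToken"] = page_token
    return f"{hub_base_url.rstrip('/')}/v1/{endpoint}?{urlencode(params)}"


def iter_messages(fetch_json: Callable[[str], Dict[str, Any]],
                  hub_base_url: str, endpoint: str, fid: int,
                  page_size: int = 1000,
                  max_records: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield messages page by page until the Hub stops returning a page token.

    fetch_json is any callable that takes a URL and returns the parsed JSON
    body, so the HTTP client can be swapped or faked in tests.
    """
    token: Optional[str] = None
    seen = 0
    while max_records is None or seen < max_records:
        page = fetch_json(build_page_url(hub_base_url, endpoint, fid,
                                         page_size, token))
        for message in page.get("messages", []):
            yield message
            seen += 1
            if max_records is not None and seen >= max_records:
                return  # safety limit reached, comparable to maxRecords
        token = page.get("nextPageToken")
        if not token:
            return  # an empty or missing token means this was the last page
```

Passing the fetch function in keeps the sketch free of any particular HTTP library; in practice it could wrap `urllib.request.urlopen` plus `json.load`.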

What does this actor do?

Farcaster Hub Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

## Features

- ✅ Protocol-First Design - Direct Hub HTTP API integration (no third-party dependencies)
- ✅ Three Ingestion Modes - Deterministic backfill by FIDs, time-bounded studies, or incremental event tailing
- ✅ Comprehensive Data - Casts, reactions (likes/recasts), follows, user profiles, and events
- ✅ Optional Enrichment - Parse Frames/Mini-Apps metadata from embedded URLs
- ✅ State Checkpointing - Migration-safe, resumable runs with automatic state persistence
- ✅ Rate Limiting & Retries - Production-grade reliability with exponential backoff
- ✅ Neynar v2 Support - Optional integration with Neynar hosted hubs
- ✅ Multiple Views - Pre-configured dataset views for easy data exploration

## Who Uses This Actor?

### 🎯 Target Users

📊 Web3 Data Analysts & Researchers (Dune, Flipside)

- Export Farcaster data to SQL databases for analytics dashboards
- Track protocol growth, user engagement trends, and network effects
- Cross-reference social data with onchain transactions

🛠️ Farcaster Frame/Mini-App Developers

- Monitor Frame engagement and interaction patterns
- Track which users interact with your Mini-Apps
- Analyze viral content and user acquisition funnels

📢 Web3 Marketing Agencies & Brands

- Track influencer campaigns and brand mentions
- Measure content reach and engagement rates
- Identify key opinion leaders in the Farcaster ecosystem

🎓 Academic Researchers

- Study decentralized social network dynamics
- Analyze information diffusion and community formation
- Research Web3 social graph topology

## Use Cases by Persona

### 📊 For Data Analysts

Influencer Ranking Dashboard

```json
{
  "mode": "byFids",
  "fids": [2, 3, 6833, 5650, 7890],
  "include": {"casts": true, "reactions": true, "userData": true},
  "maxRecords": 50000
}
```

→ Export to Dune to calculate engagement rates, follower growth, content velocity

Protocol Growth Metrics

```json
{
  "mode": "tailEvents",
  "maxRecords": 100000
}
```

→ Stream all events to track daily active users, network growth, retention

### 🛠️ For Frame Developers

Frame Interaction Analysis

```json
{
  "mode": "byFids",
  "fids": [/* FIDs of users who interacted */],
  "include": {"casts": true, "reactions": true},
  "fetchEmbeds": true
}
```

→ Identify which casts contain your Frame, track engagement patterns

Real-Time Frame Monitoring

```json
{
  "mode": "tailEvents",
  "tail": {"fromEventId": "latest"},
  "maxRecords": 10000
}
```

→ Get notified when users interact with your Frames in real-time

### 📢 For Marketing Agencies

Campaign Performance Tracking

```json
{
  "mode": "byFids",
  "fids": [/* brand_account, influencer1, influencer2 */],
  "startTimestamp": 130000000,
  "stopTimestamp": 130100000,
  "include": {"casts": true, "reactions": true}
}
```

→ Measure campaign reach during specific time window

Influencer Discovery

```json
{
  "mode": "byFids",
  "fids": [/* competitor_followers */],
  "include": {"links": true, "userData": true, "reactions": true}
}
```

→ Find high-engagement users in target communities

### 🎓 For Researchers

Social Network Topology Study

```json
{
  "mode": "byFids",
  "discoverFids": true,
  "shardIds": [0, 1, 2],
  "include": {"links": true, "userData": true},
  "maxRecords": 500000
}
```

→ Build complete follow graph for network analysis

Information Diffusion Analysis

```json
{
  "mode": "byTime",
  "fids": [/* seed_users */],
  "startTimestamp": 100000000,
  "stopTimestamp": 100500000,
  "include": {"casts": true, "reactions": true}
}
```

→ Track how content spreads through the network over time

## Quick Start

### Basic Example: Backfill by FIDs

```json
{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byFids",
  "fids": [2, 3, 6833],
  "include": {
    "casts": true,
    "reactions": true,
    "links": true,
    "userData": true
  },
  "pageSize": 1000,
  "maxRecords": 10000
}
```

### Time-Bounded Study

```json
{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byTime",
  "fids": [2, 3],
  "startTimestamp": 100000000,
  "stopTimestamp": 100050000,
  "include": {
    "casts": true,
    "reactions": true
  }
}
```

### Real-Time Event Tail

```json
{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "tailEvents",
  "tail": {
    "fromEventId": "0",
    "shardIndex": 0
  },
  "maxRecords": 1000
}
```

### Auto-Discover FIDs via Shard Scan

```json
{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byFids",
  "discoverFids": true,
  "shardIds": [0, 1],
  "include": {
    "casts": true,
    "userData": true
  },
  "maxRecords": 5000
}
```

### With Frame/Mini-App Metadata Parsing

```json
{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byFids",
  "fids": [2],
  "fetchEmbeds": true,
  "maxEmbedsPerRun": 100,
  "proxy": "RESIDENTIAL",
  "include": {
    "casts": true
  }
}
```

## Input Configuration

### Required Fields

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `hubBaseUrl` | string | HTTP endpoint of Farcaster Hub | `https://hub.pinata.cloud` |
| `mode` | enum | Ingestion mode: `byFids`, `byTime`, `tailEvents` | `byFids` |

### Mode-Specific Fields

#### By FIDs Mode

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `fids` | `array<integer>` | List of Farcaster IDs to scrape | `[]` |
| `discoverFids` | boolean | Auto-discover FIDs via shard scan | `false` |
| `shardIds` | `array<integer>` | Shard IDs to scan when discovering | `[]` |

#### By Time Mode

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `fids` | `array<integer>` | FIDs to scrape (required) | `[]` |
| `startTimestamp` | integer | Start time (Farcaster epoch seconds) | - |
| `stopTimestamp` | integer | Stop time (Farcaster epoch seconds) | - |

#### Tail Events Mode

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `tail.fromEventId` | string | Start from event ID (empty = start from 0) | `"0"` |
| `tail.shardIndex` | integer | Shard index to tail (optional) | - |

### Entity Filters

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `include.casts` | boolean | Include cast messages | `true` |
| `include.reactions` | boolean | Include reactions (likes/recasts) | `true` |
| `include.links` | boolean | Include follows | `true` |
| `include.userData` | boolean | Include user profiles | `true` |

### Optional Features

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `fetchEmbeds` | boolean | Parse embedded URLs for Frames/Mini-Apps | `false` |
| `maxEmbedsPerRun` | integer | Max embeds to fetch per run | `500` |
| `neynarApiKey` | string | Neynar v2 API key (optional) | - |
| `clientApi` | boolean | Enable Farcaster Client API (experimental) | `false` |
| `proxy` | string | Apify Proxy groups or custom URL | - |

### Performance & Limits

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `pageSize` | integer | Records per page (max 1000) | `1000` |
| `maxRecords` | integer | Stop after N records (safety limit) | - |
| `requestPerMinute` | integer | Rate limit for Hub API calls | `600` |

## Output Schema

The actor produces normalized entities with the following types:

### Cast Entity

```json
{
  "entity_type": "cast",
  "fid": 2,
  "hash": "0x1234567890abcdef",
  "ts": 123456789,
  "ts_iso": "2025-01-15T10:30:00.000Z",
  "text": "Hello Farcaster!",
  "mentions": [3, 6833],
  "parent": { "castId": { "fid": 2, "hash": "0xabc..." } },
  "embeds": { "urls": ["https://example.com"], "castIds": [] },
  "derived": {
    "urls": ["https://example.com"],
    "frame_meta": { "name": "My App", "url": "https://app.example.com" }
  },
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:31:00.000Z",
  "raw": { /* original Hub message */ }
}
```

### Reaction Entity

```json
{
  "entity_type": "reaction",
  "fid": 3,
  "type": "like",
  "target": { "castId": { "fid": 2, "hash": "0x1234..." } },
  "ts": 123456790,
  "ts_iso": "2025-01-15T10:31:00.000Z",
  "hash": "0xabcd...",
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:32:00.000Z",
  "raw": { /* original Hub message */ }
}
```

### Link Entity (Follow)

```json
{
  "entity_type": "link",
  "fid": 3,
  "targetFid": 2,
  "type": "follow",
  "ts": 123456791,
  "ts_iso": "2025-01-15T10:32:00.000Z",
  "hash": "0xdef...",
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:33:00.000Z",
  "raw": { /* original Hub message */ }
}
```

### User Data Entity

```json
{
  "entity_type": "user_data",
  "fid": 2,
  "username": "vitalik.eth",
  "display": "Vitalik",
  "pfp": "https://example.com/pfp.png",
  "bio": "Ethereum co-founder",
  "url": "https://vitalik.ca",
  "location": "Singapore",
  "github": "vbuterin",
  "twitter": "VitalikButerin",
  "ts": 123456792,
  "ts_iso": "2025-01-15T10:33:00.000Z",
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:34:00.000Z",
  "raw": [ /* original Hub messages */ ]
}
```

### Event Entity (Tail Mode)

```json
{
  "entity_type": "event",
  "event_id": "12345",
  "event_type": "MERGE_MESSAGE",
  "ts": 123456793,
  "ts_iso": "2025-01-15T10:34:00.000Z",
  "shard_index": 0,
  "message": { /* hydrated message if MERGE_MESSAGE */ },
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:35:00.000Z",
  "raw": { /* original Hub event */ }
}
```

## Farcaster Timestamps

Important: Farcaster uses a custom epoch starting at 2021-01-01T00:00:00.000Z.

- All entities include both `ts` (Farcaster epoch seconds) and `ts_iso` (ISO 8601) fields
- Use `ts_iso` for human-readable timestamps and data analysis
- Use `ts` for filtering Hub API requests

Example conversion:

- Farcaster epoch 100000000 = 2024-03-03T09:46:40.000Z
- Current time: `isoToFarcasterEpoch(new Date().toISOString())`

## Ingestion Modes Explained

### Mode 1: By FIDs (Deterministic Backfill)

Use Case: Research specific users, backfill known accounts

How it works:

1. For each FID in the input list (or discovered via shard scan):
   - Fetch all casts with pagination
   - Fetch all reactions (likes/recasts)
   - Fetch all follows
   - Fetch user profile data
2. Maintains checkpoint per FID (`lastTs`, `lastPageToken`) for resumable runs
3. Optionally discover FIDs by scanning specified shards

Best for: User-centric analysis, follower studies, content backfills

### Mode 2: By Time Window (Targeted Study)

Use Case: Time-bounded analysis (e.g., "all activity during an event")

How it works:

1. For each FID, fetch only messages within `startTimestamp` to `stopTimestamp`
2. Applies time filters to casts (Hub native support)
3. Filters reactions and links manually (Hub doesn't support time filters)
4. Faster than full backfill when studying specific time periods

Best for: Event analysis, temporal studies, A/B testing

### Mode 3: Tail Events (Near-Real-Time)

Use Case: Live monitoring, incremental ingestion

How it works:

1. Poll `/v1/events` starting from `fromEventId` (or last checkpoint)
2. For `MERGE_MESSAGE` events, hydrate and push the message entity
3. Update `lastEventId` checkpoint per shard
4. Sleeps 5s between polls (configurable)

Important: Hubs prune events older than ~3 days. Run frequently (every 1-2 days) to avoid data loss.

Best for: Real-time dashboards, notifications, streaming pipelines

## Optional Features

### Frame/Mini-App Metadata Parsing

When `fetchEmbeds: true`, the actor will:

1. Extract all unique URLs from cast embeds
2. Fetch each URL (up to `maxEmbedsPerRun` limit)
3. Parse `fc:miniapp:*` and `fc:frame:*` meta tags
4. Enrich cast entities with `derived.frame_meta` object

Use Proxy: Set `proxy` field to avoid rate limits (e.g., `"RESIDENTIAL"` for Apify Proxy)

Performance: Adds ~2-5s per URL. Use `maxEmbedsPerRun` to cap crawling time.

### Neynar v2 Integration

Provide `neynarApiKey` to use Neynar's hosted Hub endpoints instead of direct Hub HTTP.
Benefits:

- Faster, managed infrastructure
- No self-hosted Hub required
- Additional features (v2 only; v1 EOL March 31, 2025)

Records flagged: All entities get `ingest_source: "neynar_v2"`

### Client API (Experimental)

Set `clientApi: true` to enable Warpcast-specific endpoints (e.g., trending, channels).

Warning: Non-protocol data. Records flagged as `ingest_source: "client_api"` to avoid confusion.

## State Checkpointing & Resumability

The actor automatically persists state every 30 seconds and on Apify migration events:

- Per-FID checkpoints: `{ lastTs, lastPageToken }` for resuming mid-pagination
- Per-Shard checkpoints: `{ lastEventId }` for event tail mode
- Migration-safe: Survives container restarts and platform migrations

To resume a run:

1. Start the actor with same input
2. State is automatically restored
3. Scraping continues from last checkpoint

## Performance Tips

1. Use time filters: Narrow `startTimestamp`/`stopTimestamp` for faster runs
2. Batch FIDs: Process related users together to share dedup cache
3. Tune `pageSize`: Larger pages (1000) = fewer requests, but slower per-request
4. Set `maxRecords`: Safety limit prevents runaway costs
5. Monitor rate limits: Default 600 req/min is conservative; increase if Hub allows
6. Schedule tail runs: Run every 1-2 days to avoid event pruning

## Limitations & Best Practices

### Hub Event Pruning

- Limitation: Hubs prune events older than ~3 days
- Best Practice: Schedule tail runs every 1-2 days for continuous ingestion

### Reaction/Link Time Filters

- Limitation: Hub API doesn't support time filters for reactions/links
- Workaround: Actor fetches all and filters manually in `byTime` mode (slower)

### Embed Fetching

- Limitation: Some URLs may be slow, dead, or behind auth
- Best Practice: Use `maxEmbedsPerRun` cap and Apify Proxy to avoid timeouts

### Rate Limiting

- Default: 600 req/min (conservative)
- Tuning: Increase `requestPerMinute` if your Hub supports higher rates
- Public Hubs: May have stricter limits; monitor 429 responses

## Pricing & Compute

Approximate compute units (based on default settings):

| Run Type | Records | Compute Units | Notes |
|----------|---------|---------------|-------|
| Small backfill | <10k | ~0.01 | 2-3 FIDs, no embeds |
| Medium backfill | 100k | ~0.5 | 10-20 FIDs, all entities |
| Large backfill | 1M | ~5 | 100+ FIDs or full shard scan |
| Tail (1 hour) | 1k events | ~0.005 | Near-real-time streaming |
| With embeds | +100 URLs | +0.02 per 100 | Crawlee overhead |

Formula: ~0.5 CU per 100k records (without embeds)

## Example Use Cases

### Social Graph Analysis

```json
{
  "mode": "byFids",
  "fids": [2, 3, 6833, 5650],
  "include": { "links": true, "userData": true }
}
```

Output: Follow relationships + user profiles for network analysis

### Content Research

```json
{
  "mode": "byTime",
  "fids": [2],
  "startTimestamp": 100000000,
  "stopTimestamp": 100050000,
  "include": { "casts": true, "reactions": true }
}
```

Output: All casts + reactions during a specific event

### Real-Time Dashboard

```json
{
  "mode": "tailEvents",
  "tail": { "fromEventId": "0" },
  "maxRecords": 10000
}
```

Output: Live stream of all protocol events (schedule every hour)

### Frame/Mini-App Catalog

```json
{
  "mode": "byFids",
  "fids": [2, 3],
  "fetchEmbeds": true,
  "maxEmbedsPerRun": 200,
  "include": { "casts": true }
}
```

Output: Casts with Frame/Mini-App metadata extracted

## Troubleshooting

### "Failed to connect to Hub"

- Verify `hubBaseUrl` is correct and accessible
- Check Hub is running and serving HTTP API on port 3381
- Try public Hub: https://hub.pinata.cloud

### "No data returned"

- Verify FIDs exist and have activity
- Check time window isn't too narrow (`byTime` mode)
- Ensure `include.*` filters aren't excluding all data

### "Max records limit reached"

- Increase `maxRecords` or remove limit for full backfill
- Use checkpointing to resume in multiple runs

### "Rate limit errors (429)"

- Decrease `requestPerMinute`
- Add delays between runs
- Use Neynar hosted Hub (better rate limits)

### "Event tail missing data"

- Events pruned >3 days ago
- Schedule runs more frequently (every 1-2 days)
- Use `byFids` mode for historical backfill

## Data Views

The actor provides pre-configured dataset views:

1. Overview: All entities with key identifiers
2. Casts: Cast content, timestamps, and URLs
3. Reactions: Likes and recasts by FID
4. Follows: Follow relationships (social graph edges)
5. Users: User profiles and metadata

Access views in Apify Console → Dataset → Views tab

## Support

- Email: kontakt@barrierefix.de
- Documentation: Farcaster Hub API Docs
- Issues: Report bugs or request features via email

## Version History

- 1.0.0 (2025-01) - Initial release
  - Three ingestion modes (byFids, byTime, tailEvents)
  - Hub HTTP API integration
  - State checkpointing
  - Optional Frame/Mini-App parsing
  - Neynar v2 support

---

## License

MIT License - Free for commercial and non-commercial use

## Legal Disclaimer

This actor is a general-purpose tool for analyzing publicly accessible web data. The user bears sole responsibility for ensuring their specific use complies with:

- Applicable laws (GDPR/DSGVO, copyright law)
- The target website's Terms of Service
- Apify's Terms of Service

The provider (barrierefix) expressly disclaims liability for any unauthorized or unlawful use. By using this actor, the user agrees to indemnify the provider against any third-party claims arising from their use of the data.
---

This tool is not affiliated with Farcaster. All trademarks belong to their respective owners.
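As a companion to the Farcaster Timestamps section, the conversion between Farcaster epoch seconds (`ts`) and ISO 8601 strings (`ts_iso`) can be sketched in Python as below. This is a minimal sketch that relies only on the epoch start stated in this README (2021-01-01T00:00:00Z); the function names are hypothetical and are not the actor's own helpers.

```python
from datetime import datetime, timedelta, timezone

# Farcaster epoch start as stated in this README: 2021-01-01T00:00:00Z.
FARCASTER_EPOCH = datetime(2021, 1, 1, tzinfo=timezone.utc)


def farcaster_epoch_to_iso(ts: int) -> str:
    """Convert Farcaster epoch seconds to an ISO 8601 UTC string."""
    moment = FARCASTER_EPOCH + timedelta(seconds=ts)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def iso_to_farcaster_epoch(iso: str) -> int:
    """Convert an ISO 8601 string to whole Farcaster epoch seconds."""
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    moment = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        # Treat naive timestamps as UTC (an assumption of this sketch).
        moment = moment.replace(tzinfo=timezone.utc)
    return int((moment - FARCASTER_EPOCH).total_seconds())
```

For example, `farcaster_epoch_to_iso(100000000)` returns `"2024-03-03T09:46:40.000Z"`, and `iso_to_farcaster_epoch("2021-01-01T00:00:00Z")` returns `0`. Values computed this way are suitable for `startTimestamp` and `stopTimestamp`.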

Actor Information

Developer: barrierefix
Pricing: Paid
Total Runs: 60
Active Users: 4

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
