Your Better Instagram Scraper
by austinmyc
Robust scraping: supports posts, comments, and replies. Support for Reels and profile-based scraping is planned.
About Your Better Instagram Scraper
What does this actor do?
Your Better Instagram Scraper is an Actor on the Apify platform that scrapes Instagram posts, comments, and replies. It runs in the cloud, so you can extract data without setting up anything locally.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# Instagram Hashtag Scraper

### Developed by: @hvgupta and @austinmyc

## ⚡ What does this Actor do?

This Actor searches Instagram hashtags and extracts:

- Posts from hashtag pages
- Comments on each post
- Replies to comments
- User information (username, content, timestamps)
- A hierarchical data structure showing the relationships between posts, comments, and replies

All data is filtered by your specified date range so that only relevant content is returned.

## 📋 Input Parameters

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| keyword | String | ✅ | Hashtag keyword to search (without #) | - |
| hashtag_limit | Integer | ❌ | Number of hashtags to explore | 1 |
| post_limit | Integer | ❌ | Posts to scrape per hashtag | 2 |
| comment_limit | Integer | ❌ | Comments to scrape per post | 10 |

## 📊 Output Format

The Actor returns structured JSON data in the following format:

```json
{
  "user_id": "username",
  "datetime": "2024-06-24T10:30:00+08:00",
  "content": "Post or comment content",
  "post_id": "unique_post_identifier",
  "url": "https://instagram.com/p/...",
  "category": "Original Post",
  "current_hash": "abc123def456...",
  "parent_hash": "parent_content_hash_if_applicable"
}
```

Category Types:

- "Original Post" - Main hashtag posts
- "Comment" - Comments on posts
- "Reply" - Replies to comments

Hash Fields (for hierarchical structure):

- current_hash - Hash of the item itself, used for deduplication and identification
- parent_hash - Hash of the item's parent

These hash fields encode the hierarchy:

- Original Posts: parent_hash is null
- Comments: parent_hash references the original post's current_hash
- Replies: parent_hash references the parent comment's current_hash

## ⚙️ How it works

1. Search: Finds hashtags related to your keyword
2. Content Extraction:
   - Opens each hashtag page
   - Extracts posts within your date range
   - Collects comments and replies for each post
   - Handles Instagram's navigation
3. Data Processing: Structures data hierarchically with relationships
4. Output: Returns structured JSON data

## 📈 Performance & Features

- Smart Navigation: Handles Instagram's dynamic content loading
- Hierarchical Structure: Maintains relationships between posts, comments, and replies
- Memory Efficient: Processes data incrementally to handle large datasets

## 📝 Example Usage

### Using the Apify API client (Python)

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("YOUR_APIFY_TOKEN")

# Prepare the Actor input
run_input = {
    "keyword": "travel",
    "hashtag_limit": 2,
    "post_limit": 5,
    "comment_limit": 15,
}

# Run the Actor and wait for it to finish
run = client.actor("YOUR_ACTOR_ID").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset.
# list_items() returns a page object whose `items` attribute is a list.
scraped_output_list = client.dataset(run["defaultDatasetId"]).list_items()
for single_scraped_output in scraped_output_list.items:
    print(single_scraped_output)
```

### Using the Apify API client (JavaScript)

```js
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your API token
const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

// Prepare the Actor input
const input = {
    keyword: "travel",
    hashtag_limit: 2,
    post_limit: 5,
    comment_limit: 15
};

// Run the Actor and wait for it to finish
const run = await client.actor('YOUR_ACTOR_ID').call(input);

// Fetch and log Actor results from the run's dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});
```

### Basic JSON Input

```json
{
  "keyword": "travel",
  "hashtag_limit": 2,
  "post_limit": 5,
  "comment_limit": 15
}
```

The keyword value must not contain any spaces.

## 🔍 Data Quality Features

- Automatic Deduplication: Uses content hashing to prevent duplicate entries
- Input Validation: Validates date ranges and parameter constraints
- Comprehensive Logging: Detailed logs for monitoring and debugging
- Data Integrity: Maintains accurate relationships between posts, comments, and replies
- Time Zone Handling: Properly handles Instagram's timestamp formats

## 📊 Output Structure

The Actor returns a flat array of JSON objects. The hierarchy is preserved through the current_hash and parent_hash fields described under Output Format.

Example Relationship:

```json
[
  {
    "current_hash": "post123",
    "parent_hash": null,
    "category": "Original Post",
    "content": "Amazing sunset today! #travel",
    "user_id": "traveler_jane",
    "datetime": "2024-06-24T10:30:00+08:00",
    "post_id": "p/ABC123",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "comment456",
    "parent_hash": "post123",
    "category": "Comment",
    "content": "Beautiful photo!",
    "user_id": "photo_lover",
    "datetime": "2024-06-24T11:15:00+08:00",
    "post_id": "p/ABC123/comment456",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "reply789",
    "parent_hash": "comment456",
    "category": "Reply",
    "content": "I agree, stunning!",
    "user_id": "nature_fan",
    "datetime": "2024-06-24T12:00:00+08:00",
    "post_id": "p/ABC123/comment456/reply789",
    "url": "https://instagram.com/p/ABC123"
  }
]
```
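Because the output is a flat array, consumers who want the post → comment → reply nesting have to rebuild it from the hash fields. The following is a minimal sketch of one way to do that, assuming every record carries `current_hash` and `parent_hash` as described above. The helper name `build_tree` and the added `children` key are illustrative choices, not part of the Actor's output.

```python
from collections import defaultdict


def build_tree(items):
    """Nest flat records under their parents using current_hash / parent_hash.

    Hypothetical helper: returns a list of root posts, each with a
    'children' list added. Records whose parent is missing from `items`
    are not reachable from any root and are therefore left out.
    """
    # Group records by the hash of their parent (None for original posts).
    children_of = defaultdict(list)
    for item in items:
        children_of[item.get("parent_hash")].append(item)

    def attach(node):
        # Copy the record and recursively attach its direct children.
        kids = children_of.get(node["current_hash"], [])
        return {**node, "children": [attach(kid) for kid in kids]}

    return [attach(root) for root in children_of.get(None, [])]


# Trimmed-down version of the example relationship from the documentation.
sample = [
    {"current_hash": "post123", "parent_hash": None,
     "category": "Original Post", "user_id": "traveler_jane"},
    {"current_hash": "comment456", "parent_hash": "post123",
     "category": "Comment", "user_id": "photo_lover"},
    {"current_hash": "reply789", "parent_hash": "comment456",
     "category": "Reply", "user_id": "nature_fan"},
]

tree = build_tree(sample)
for post in tree:
    print(post["category"], post["user_id"])
    for comment in post["children"]:
        print("  ", comment["category"], comment["user_id"])
        for reply in comment["children"]:
            print("    ", reply["category"], reply["user_id"])
```

In practice, `items` would be the list fetched from the run's dataset. Grouping by `parent_hash` first keeps the rebuild to a single pass over the data plus one lookup per record.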
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: austinmyc
- Pricing: Paid
- Total Runs: 86
- Active Users: 40
Related Actors
🏯 Tweet Scraper V2 - X / Twitter Scraper
by apidojo
Instagram Scraper
by apify
TikTok Scraper
by clockworks
Instagram Profile Scraper
by apify