Your Better Instagram Scraper

by austinmyc

Robust scraping:

  • Supports Posts, Comments, and Replies
  • Support for Reels and profile-based scraping is planned

86 runs
40 users

What does this actor do?

Your Better Instagram Scraper is an Actor on the Apify platform that scrapes Instagram posts, comments, and replies. It runs in the cloud, so no local setup is required.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

Instagram Hashtag Scraper

### Developed by: @hvgupta and @austinmyc

## ⚡ What does this Actor do?

This Actor searches Instagram hashtags and extracts:

- Posts from hashtag pages
- Comments on each post
- Replies to comments
- User information (username, content, timestamps)
- Hierarchical data structure showing relationships between posts, comments, and replies

All data is filtered by your specified date range to get only relevant content.

## 📋 Input Parameters

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| keyword | String | ✅ | Hashtag keyword to search (without #) | - |
| hashtag_limit | Integer | ❌ | Number of hashtags to explore | 1 |
| post_limit | Integer | ❌ | Posts to scrape per hashtag | 2 |
| comment_limit | Integer | ❌ | Comments to scrape per post | 10 |

## 📊 Output Format

The Actor returns structured JSON data with the following format:

```json
{
  "user_id": "username",
  "datetime": "2024-06-24T10:30:00+08:00",
  "content": "Post or comment content",
  "post_id": "unique_post_identifier",
  "url": "https://instagram.com/p/...",
  "category": "Original Post",
  "current_hash": "abc123def456...",
  "parent_hash": "parent_content_hash_if_applicable"
}
```

Category Types:

- "Original Post" - Main hashtag posts
- "Comment" - Comments on posts
- "Reply" - Replies to comments

Hash Fields (for hierarchical structure):

- current_hash - Hash of the item, used for deduplication and identification
- parent_hash - Hash of the parent item

These hash fields enable the hierarchical relationship structure:

- Original Posts: parent_hash is null
- Comments: parent_hash references the original post's current_hash
- Replies: parent_hash references the parent comment's current_hash

## ⚙️ How it works

1. Search: Finds hashtags related to your keyword
2. Content Extraction:
   - Opens each hashtag page
   - Extracts posts within your date range
   - Collects comments and replies for each post
   - Handles Instagram's navigation
3. Data Processing: Structures data hierarchically with relationships
4. Output: Returns structured JSON data

## 📈 Performance & Features

- Smart Navigation: Handles Instagram's dynamic content loading
- Hierarchical Structure: Maintains relationships between posts, comments, and replies
- Memory Efficient: Processes data incrementally to handle large datasets

## 📝 Example Usage

### Using the Apify client (Python)

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("YOUR_APIFY_TOKEN")

# Prepare the Actor input
run_input = {
    "keyword": "travel",
    "hashtag_limit": 2,
    "post_limit": 5,
    "comment_limit": 15,
}

# Run the Actor and wait for it to finish
run = client.actor("YOUR_ACTOR_ID").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset.
# list_items() returns a page object; the records are in its `items` list.
dataset_page = client.dataset(run["defaultDatasetId"]).list_items()
for item in dataset_page.items:
    print(item)
```

### Using the Apify client (JavaScript)

```js
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your API token
const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

// Prepare the Actor input
const input = {
    keyword: "travel",
    hashtag_limit: 2,
    post_limit: 5,
    comment_limit: 15
};

// Run the Actor and wait for it to finish
const run = await client.actor('YOUR_ACTOR_ID').call(input);

// Fetch and log Actor results from the run's dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});
```

### Basic JSON Input

```json
{
  "keyword": "travel",
  "hashtag_limit": 2,
  "post_limit": 5,
  "comment_limit": 15
}
```

Note: the keyword value must not contain any spaces.

## 🔍 Data Quality Features

- Automatic Deduplication: Uses content hashing to prevent duplicate entries
- Input Validation: Validates date ranges and parameter constraints
- Comprehensive Logging: Detailed logs for monitoring and debugging
- Data Integrity: Maintains accurate relationships between posts, comments, and replies
- Time Zone Handling: Properly handles Instagram's timestamp formats

## 📊 Output Structure

The Actor returns a flat array of JSON objects, but maintains hierarchical relationships through the hash linking described under Output Format.

Example Relationship:

```json
[
  {
    "current_hash": "post123",
    "parent_hash": null,
    "category": "Original Post",
    "content": "Amazing sunset today! #travel",
    "user_id": "traveler_jane",
    "datetime": "2024-06-24T10:30:00+08:00",
    "post_id": "p/ABC123",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "comment456",
    "parent_hash": "post123",
    "category": "Comment",
    "content": "Beautiful photo!",
    "user_id": "photo_lover",
    "datetime": "2024-06-24T11:15:00+08:00",
    "post_id": "p/ABC123/comment456",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "reply789",
    "parent_hash": "comment456",
    "category": "Reply",
    "content": "I agree, stunning!",
    "user_id": "nature_fan",
    "datetime": "2024-06-24T12:00:00+08:00",
    "post_id": "p/ABC123/comment456/reply789",
    "url": "https://instagram.com/p/ABC123"
  }
]
```
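
Because the output is flat, a consumer that wants nested threads has to rebuild them from the current_hash and parent_hash fields. The following is a minimal client-side sketch, not part of the Actor itself. The function name `build_tree` and the added `children` key are hypothetical, and the sketch assumes every item carries both hash fields, with parent_hash set to null (None in Python) for original posts. Items whose parent is absent from the dataset are silently left out of the result.

```python
from collections import defaultdict


def build_tree(items):
    """Nest flat Actor output into post -> comment -> reply threads.

    Hypothetical helper: assumes each item has `current_hash` and
    `parent_hash`, and that original posts have `parent_hash` of None.
    """
    # Index every item under the hash of its parent.
    children = defaultdict(list)
    for item in items:
        children[item.get("parent_hash")].append(item)

    def attach(node):
        # Copy the node and recursively attach its direct children.
        kids = children.get(node["current_hash"], [])
        return {**node, "children": [attach(kid) for kid in kids]}

    # Roots are the items with no parent, i.e. the original posts.
    return [attach(root) for root in children.get(None, [])]


# Trimmed version of the example relationship shown above.
items = [
    {"current_hash": "post123", "parent_hash": None,
     "category": "Original Post", "content": "Amazing sunset today! #travel"},
    {"current_hash": "comment456", "parent_hash": "post123",
     "category": "Comment", "content": "Beautiful photo!"},
    {"current_hash": "reply789", "parent_hash": "comment456",
     "category": "Reply", "content": "I agree, stunning!"},
]

tree = build_tree(items)
for post in tree:
    print(post["category"], "with", len(post["children"]), "comment(s)")
```

The sketch indexes items by parent first, so the order in which posts, comments, and replies appear in the dataset does not matter.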

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Your Better Instagram Scraper now on Apify. Free tier available with no credit card required.

Actor Information

Developer: austinmyc
Pricing: Paid
Total Runs: 86
Active Users: 40

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
