Instagram Keyword Search Scraper

Name: Instagram Keyword Search Scraper
Author: crawlerbros



222 runs

82 users


About Instagram Keyword Search Scraper

Extract posts from Instagram keyword search results. Scrape post URLs, captions, usernames, media URLs, hashtags, engagement metrics, and more. Supports multiple keywords with anti-detection features.

What does this actor do?

Instagram Keyword Search Scraper is an actor on the Apify platform that extracts posts from Instagram keyword search results. It runs in the cloud, so no local setup is needed to use it.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# Instagram Keyword Search Scraper

Extract posts from Instagram keyword search results with this powerful Apify actor. Search for any keywords and scrape post URLs, captions, usernames, media URLs, hashtags, engagement metrics, and more.

## Features

- Keyword-Based Search: Search Instagram for any keywords or phrases
- Multiple Keywords: Process multiple keywords in a single run
- Comprehensive Data Extraction: Extract post IDs, URLs, captions, usernames, media URLs, hashtags, mentions, and media types
- Guaranteed Username Extraction: Automatic fallback to individual post pages ensures 100% username availability
- Anti-Detection: Built-in human behavior simulation to avoid rate limiting
- Infinite Scroll: Automatically scrolls to load more results
- Deduplication: Automatically removes duplicate posts
- Flexible Configuration: Customize delays, post limits, and behavior settings
- Cookie Authentication: Support for authenticated sessions to access search results (required)
- Session Management: Save and reuse cookies between runs
- Multiple Extraction Methods: Tries JSON/GraphQL extraction first, falls back to HTML parsing

## Authentication (Required)

Instagram requires authentication to access keyword search results. You must provide cookies from an active Instagram session.

### How to Extract Instagram Cookies

Method 1: Using Browser DevTools (Recommended)

1. Open Instagram in your browser and log in
2. Press `F12` to open Developer Tools
3. Go to the Application tab (Chrome) or Storage tab (Firefox)
4. Click on Cookies → `https://www.instagram.com`
5. Find and copy these important cookies:
   - `sessionid` (most important)
   - `ds_user_id`
   - `csrftoken`
6. Format them as JSON:

   ```json
   [
     {
       "name": "sessionid",
       "value": "YOUR_SESSION_ID_VALUE",
       "domain": ".instagram.com",
       "path": "/",
       "secure": true,
       "httpOnly": true
     },
     {
       "name": "ds_user_id",
       "value": "YOUR_USER_ID",
       "domain": ".instagram.com",
       "path": "/",
       "secure": true
     },
     {
       "name": "csrftoken",
       "value": "YOUR_CSRF_TOKEN",
       "domain": ".instagram.com",
       "path": "/",
       "secure": true
     }
   ]
   ```

Method 2: Using EditThisCookie Extension (Easiest!)

1. Install EditThisCookie for Chrome
2. Log in to Instagram
3. Click the EditThisCookie icon
4. Click "Export" button (bottom right)
5. Paste the entire JSON directly into the `cookies` input field

The scraper automatically converts browser cookie formats! No need to manually clean or reformat - just paste the raw export.

Method 3: Using Cookie-Editor Extension

1. Install Cookie-Editor for Chrome/Firefox
2. Log in to Instagram
3. Click the extension icon
4. Click "Export" → "JSON"
5. Copy and paste into the `cookies` field

### Security Notes

- Never share your cookies - they provide full access to your Instagram account
- Use a dedicated Instagram account for scraping (not your personal account)
- Cookies expire after some time - you'll need to refresh them periodically
- Store cookies securely and don't commit them to version control

## Input Configuration

The scraper accepts the following input parameters:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `keywords` | Array | Yes | - | List of keywords or phrases to search for |
| `maxPosts` | Integer | No | 20 | Maximum number of posts to extract per keyword (1-500) |
| `minDelayBetweenRequests` | Integer | No | 2 | Minimum delay in seconds between actions (1-30) |
| `maxDelayBetweenRequests` | Integer | No | 5 | Maximum delay in seconds between actions (1-60) |
| `humanizeBehavior` | Boolean | No | true | Enable human-like behavior simulation |
| `cookies` | String | Highly Recommended | - | Instagram cookies in JSON format (required for search access) |
| `sessionName` | String | No | "default_session" | Session name for saving/loading cookies between runs |

### Example Input

```json
{
  "keywords": [
    "living in dubai",
    "travel photography",
    "food recipes"
  ],
  "maxPosts": 50,
  "minDelayBetweenRequests": 2,
  "maxDelayBetweenRequests": 5,
  "humanizeBehavior": true,
  "cookies": "[{\"name\":\"sessionid\",\"value\":\"YOUR_SESSION_ID\",\"domain\":\".instagram.com\",\"path\":\"/\",\"secure\":true,\"httpOnly\":true}]",
  "sessionName": "my_instagram_session"
}
```

## Output Format

The scraper outputs a dataset with one row per post. Each post contains:

```json
{
  "post_id": "DBq4D_QIlEH",
  "post_url": "https://www.instagram.com/p/DBq4D_QIlEH/",
  "username": "travel_photographer",
  "user_url": "https://www.instagram.com/travel_photographer/",
  "caption": "Amazing sunset at the beach! #travel #photography @friend_username",
  "posted_date": null,
  "location": null,
  "media_type": "image",
  "media_count": 1,
  "thumbnail_url": "https://scontent.cdninstagram.com/v/t39.30808-6/...",
  "media_urls": ["https://scontent.cdninstagram.com/v/t39.30808-6/..."],
  "hashtags": ["travel", "photography"],
  "mentions": ["friend_username"],
  "likes_count": 0,
  "comments_count": 0,
  "views_count": 0,
  "is_ad": false,
  "is_carousel": false,
  "search_keyword": "travel",
  "scraped_at": "2025-11-21T12:28:25.052408",
  "source": "instagram_keyword_search"
}
```

### Output Fields

| Field | Type | Availability | Description |
|-------|------|--------------|-------------|
| `post_id` | String | ✅ Always | Instagram post shortcode/ID |
| `post_url` | String | ✅ Always | Full URL to the post |
| `username` | String | ✅ Always | Username of the post author (fetched with fallback) |
| `user_url` | String | ✅ Always | URL to the user's profile |
| `caption` | String | ✅ Usually | Post caption/text (when available) |
| `media_type` | String | ✅ Always | Type: "image", "video", or "carousel" |
| `media_count` | Integer | ✅ Always | Number of media items (1 for single posts) |
| `thumbnail_url` | String | ✅ Always | URL to post thumbnail image |
| `media_urls` | Array | ✅ Always | List of media URLs (contains at least thumbnail) |
| `hashtags` | Array | ✅ Always | List of hashtags used in caption (empty if none) |
| `mentions` | Array | ✅ Always | List of mentioned usernames (empty if none) |
| `is_carousel` | Boolean | ✅ Always | Whether the post contains multiple media items |
| `search_keyword` | String | ✅ Always | The keyword used to find this post |
| `scraped_at` | String | ✅ Always | ISO timestamp when data was scraped |
| `source` | String | ✅ Always | Data source identifier |
| `posted_date` | String | ⚠️ Limited | ISO timestamp when post was created |
| `location` | String | ⚠️ Limited | Location tag (if available) |
| `likes_count` | Integer | ⚠️ Limited | Number of likes |
| `comments_count` | Integer | ⚠️ Limited | Number of comments |
| `views_count` | Integer | ⚠️ Limited | Number of views (videos only) |
| `is_ad` | Boolean | ⚠️ Limited | Whether the post is an advertisement |

Limited Availability: These fields are often not available in Instagram's keyword search results. Instagram intentionally restricts access to engagement metrics, post dates, and location data in search results to prevent scraping. These fields may return `null` or `0` values. To access this data reliably, you would need to:

- Use Instagram's official Graph API (requires business account and API approval)
- Navigate to individual post pages (slower and may trigger rate limits)
- Access Instagram while logged in and parse dynamically loaded data (unreliable)

Note: The scraper automatically attempts to fill missing usernames by visiting individual post pages as a fallback, ensuring usernames are available for all posts.
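Because the limited-availability fields may come back as `null` or `0`, downstream analysis can easily mistake "not available" for a real zero. The following is a minimal sketch of one way to handle that after downloading the dataset. The helper name `normalize_limited_fields` is hypothetical, and treating `0` as "missing" is an assumption on our part, not documented scraper behavior; only the field names come from the Output Fields table.

```python
from typing import Any

# Count fields marked "Limited" in the Output Fields table.
LIMITED_COUNT_FIELDS = ("likes_count", "comments_count", "views_count")


def normalize_limited_fields(post: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `post` with unavailable count fields set to None.

    Assumption: a 0 (or a missing key) in a limited count field means
    "not available in search results" rather than a genuine zero.
    """
    cleaned = dict(post)  # shallow copy so the original record is untouched
    for field in LIMITED_COUNT_FIELDS:
        if not cleaned.get(field):  # catches 0, None, and missing keys
            cleaned[field] = None
    return cleaned


# A record shaped like the example output above (trimmed for brevity).
sample = {
    "post_id": "DBq4D_QIlEH",
    "likes_count": 0,
    "comments_count": 0,
    "views_count": 0,
    "posted_date": None,
}
cleaned = normalize_limited_fields(sample)
```

With this in place, averages or rankings can skip records where a count is `None` instead of silently including zeros.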
## Use Cases

- Market Research: Analyze trending topics and popular content
- Competitor Analysis: Monitor competitor activity and engagement
- Content Discovery: Find inspiration for your own content
- Brand Monitoring: Track mentions and hashtag usage
- Influencer Research: Discover influencers in specific niches
- Trend Analysis: Identify emerging trends and popular topics

## Anti-Detection Features

The scraper includes several anti-detection measures:

- Human Behavior Simulation: Random mouse movements and scrolling
- Random Delays: Configurable delays between actions
- Stealth Mode: Browser fingerprint masking
- User-Agent Rotation: Realistic browser identification
- Rate Limit Handling: Automatic detection and response to blocks

## Rate Limiting

Instagram has rate limits to prevent scraping. To minimize the risk:

- Use reasonable delays (2-5 seconds recommended)
- Enable `humanizeBehavior` option
- Don't request too many posts at once
- Spread your scraping over time
- Monitor for "Action Blocked" warnings

## Technical Details

- Browser: Firefox with Playwright
- Language: Python 3.12
- Dependencies: Apify SDK, Playwright, BeautifulSoup4
- Architecture: Async/await pattern for efficient I/O

## Local Development

### Prerequisites

- Python 3.12+
- Apify CLI (optional)

### Installation

```bash
# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install firefox
playwright install-deps firefox
```

### Running Locally

```bash
# Run the scraper
python -m src
```

### Input Configuration (Local)

Create `storage/key_value_stores/default/INPUT.json`:

```json
{
  "keywords": ["test keyword"],
  "maxPosts": 10,
  "humanizeBehavior": true
}
```

## Limitations

Data Availability:

- Engagement metrics (likes, comments, views) are not available in keyword search results
- Post dates and locations are typically not included in search result HTML
- These limitations are due to Instagram's intentional restrictions to prevent scraping
- Use Instagram's official Graph API for reliable access to engagement data

Other Limitations:

- Instagram's search results are limited by their algorithm
- Posts from private accounts are not accessible
- Rate limiting may occur with excessive requests
- Instagram may change their page structure, requiring updates
- Cookies expire periodically and need to be refreshed

What IS Available:

- ✅ Post IDs and URLs
- ✅ Usernames (with automatic fallback extraction)
- ✅ Captions, hashtags, and mentions
- ✅ Media URLs and thumbnails
- ✅ Media types (image, video, carousel)

## Support

For issues, questions, or feature requests:

1. Check the logs for error messages
2. Verify your input configuration
3. Ensure keywords are valid and not empty
4. Try reducing `maxPosts` if encountering rate limits

## Version History

### 1.0 (2025-11-21)

- Initial release
- Keyword-based search support
- Multiple extraction methods (JSON + HTML)
- Anti-detection features
- Comprehensive data extraction

## License

This actor is provided as-is for educational and research purposes. Users are responsible for complying with Instagram's Terms of Service and robots.txt file.

---

Note: Web scraping may be subject to legal restrictions in your jurisdiction. Always ensure you have the right to scrape data and comply with the website's terms of service.
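For local development, the `INPUT.json` file described under "Input Configuration (Local)" could be generated with a short script rather than written by hand. The sketch below is illustrative only: `build_input` and `write_input` are hypothetical helper names, the field names and ranges are taken from the input table in the README, and the check that the minimum delay does not exceed the maximum is our own assumption. Since the `cookies` field is typed as a String, the cookie list is serialized to a JSON string, as in the README's example input.

```python
import json
from pathlib import Path

# Path given in the "Input Configuration (Local)" section of the README.
INPUT_PATH = Path("storage/key_value_stores/default/INPUT.json")


def build_input(keywords, cookies=None, max_posts=20, min_delay=2,
                max_delay=5, humanize=True, session_name="default_session"):
    """Build an input dict, validating against the ranges in the input table."""
    keywords = [k.strip() for k in keywords if k and k.strip()]
    if not keywords:
        raise ValueError("keywords must contain at least one non-empty string")
    if not 1 <= max_posts <= 500:
        raise ValueError("maxPosts must be between 1 and 500")
    if not 1 <= min_delay <= 30:
        raise ValueError("minDelayBetweenRequests must be between 1 and 30")
    if not 1 <= max_delay <= 60:
        raise ValueError("maxDelayBetweenRequests must be between 1 and 60")
    if min_delay > max_delay:
        # Assumption: the README does not state this rule explicitly.
        raise ValueError("minimum delay must not exceed maximum delay")

    run_input = {
        "keywords": keywords,
        "maxPosts": max_posts,
        "minDelayBetweenRequests": min_delay,
        "maxDelayBetweenRequests": max_delay,
        "humanizeBehavior": humanize,
        "sessionName": session_name,
    }
    if cookies is not None:
        # `cookies` is a String field, so the list is stored as a JSON string.
        run_input["cookies"] = json.dumps(cookies)
    return run_input


def write_input(run_input, path=INPUT_PATH):
    """Write the input dict to the local key-value store location."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(run_input, indent=2), encoding="utf-8")
    return path


# Mirrors the local example from the README; no file is written here.
example = build_input(["test keyword"], max_posts=10)
```

Calling `write_input(example)` from the project root would then create the file at the path above. Keep real cookie values out of version control, as the Security Notes advise.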

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Instagram Keyword Search Scraper now on Apify. Free tier available with no credit card required.


Actor Information

Developer: crawlerbros
Pricing: Paid
Total Runs: 222
Active Users: 82

Related Actors

🏯 Tweet Scraper V2 - X / Twitter Scraper

by apidojo

Instagram Scraper

by apify

TikTok Scraper

by clockworks

Instagram Profile Scraper

by apify


Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

