OSINT Scraper
by epctex
Automatically find leaked data and keywords on Pastebin, GitHub Gist, and other paste sites. Specify your search terms and get OSINT data delivered, ready for analysis.
About OSINT Scraper
Ever need to find what has been accidentally left out in the open? This OSINT Scraper is my go-to for exactly that. It digs through public code and text pastes on sites like Pastebin, GitHub Gist, Ideone, Dumpz, Paste.org, and Textbin. You tell it what to look for (specific keywords, project names, API keys, or other sensitive strings) and it returns the pastes that match. Setup is minimal: supply your keywords and a proxy configuration, and it runs. I use it for security research to find leaked credentials, for monitoring my own company's data, or just to see what information is floating around on a particular topic. It automates a tedious manual search and saves hours. The output is a clean list of matches, which suits initial reconnaissance or building a bigger dataset. It's straightforward, does one job well, and fits into an automation workflow without being overcomplicated.
What does this actor do?
OSINT Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
OSINT Scraper
Overview
An Apify actor that scrapes multiple public paste and code-sharing websites for potentially sensitive data based on your search terms. It's designed for Open Source Intelligence (OSINT) gathering.
Key Features
- Keyword Search: Scrape content using any custom keywords.
- Multi-Site Support: Targets sites like Pastebin, GitHub Gist, Codepad, Ideone, Paste.org, and Textbin.
- Modular: Enable or disable scraping for specific websites via input flags.
- Extensible: Allows custom data extraction via a JavaScript function.
- Efficient: Optimized for speed with low compute unit consumption (~0.01-0.03 units per 100 pages).
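As a rough illustration of the compute figure above, the sketch below turns the quoted range of about 0.01 to 0.03 units per 100 pages into an estimate for a given page count. It assumes cost scales linearly with pages, which the documentation does not state; `estimateComputeUnits` is a hypothetical helper, not part of the actor.

```javascript
// Documented range: roughly 0.01-0.03 compute units per 100 pages.
// Assumption: usage scales linearly with the number of pages scraped.
const CU_PER_100_PAGES = { low: 0.01, high: 0.03 };

// Hypothetical helper: returns a low/high estimate for a page count.
function estimateComputeUnits(pages) {
  if (!Number.isFinite(pages) || pages < 0) {
    throw new RangeError('pages must be a non-negative number');
  }
  const batches = pages / 100;
  return {
    low: batches * CU_PER_100_PAGES.low,
    high: batches * CU_PER_100_PAGES.high,
  };
}

const estimate = estimateComputeUnits(5000);
console.log(
  `5000 pages: ${estimate.low.toFixed(2)} to ${estimate.high.toFixed(2)} compute units`
);
```

Treat the result as an order-of-magnitude guide; actual consumption depends on the sites scraped and the proxy used.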
How to Use
The actor requires a JSON input configuration. You must use a proxy; you can use your own or Apify Proxy.
Tip: To scrape only specific sites, set their corresponding flags to true in the input. For Pastebin, US-based proxies are recommended due to regional restrictions.
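One way to apply the tip above is to generate the input object in code, setting every site flag explicitly. This is a minimal sketch: `buildInput` and `SITE_FLAGS` are hypothetical names, the flag names are the ones listed in this actor's input, and it assumes that a flag set to false disables that site.

```javascript
// Site flags as named in the actor's input; everything else is illustrative.
const SITE_FLAGS = ['codepad', 'githubgist', 'ideone', 'pastebin', 'pasteorg', 'textbin'];

// Hypothetical helper: builds an input object with only the chosen sites enabled.
function buildInput(searchKeywords, enabledSites, proxy = { useApifyProxy: true }) {
  if (!Array.isArray(searchKeywords) || searchKeywords.length === 0) {
    throw new Error('searchKeywords must be a non-empty array');
  }
  const unknown = enabledSites.filter((site) => !SITE_FLAGS.includes(site));
  if (unknown.length > 0) {
    throw new Error(`Unknown site flag(s): ${unknown.join(', ')}`);
  }
  const input = { searchKeywords, proxy };
  // Set every flag explicitly so the result does not rely on the actor's defaults.
  for (const site of SITE_FLAGS) {
    input[site] = enabledSites.includes(site);
  }
  return input;
}

const input = buildInput(['db_pass'], ['pastebin', 'githubgist']);
console.log(JSON.stringify(input, null, 2));
```

The printed JSON can be pasted into the actor's input. Validating keywords and flag names up front catches typos before a run is started.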
Input
The input is a JSON object. The searchKeywords array and the proxy configuration are mandatory; all other fields are optional.
{
  "searchKeywords": ["@gmail", "db_pass"],
  "codepad": true,
  "githubgist": true,
  "ideone": true,
  "pastebin": true,
  "pasteorg": true,
  "textbin": true,
  "proxy": {
    "useApifyProxy": true
  },
  "extendOutputFunction": "($) => { return {'customField': $('title').text()} }"
}
Input Fields
- searchKeywords: (Required) Array of strings containing the keywords to search for.
- proxy: (Required) Proxy configuration object.
- codepad, githubgist, ideone, pastebin, pasteorg, textbin: (Optional) Boolean flags to enable or disable scraping for each specific website.
- extendOutputFunction: (Optional) A string containing a JavaScript function for custom data extraction. The function receives a jQuery handle ($) as its argument.
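Because extendOutputFunction is passed as a string, it can help to write it as an ordinary function first, try it locally, and serialize it afterwards. The sketch below does that. The field names `pageTitle` and `preBlockCount` are illustrative, `fake$` is a hand-rolled stand-in for the jQuery handle used only for local testing, and how the actor combines the returned object with each result is not specified here.

```javascript
// Written as a normal function so it can be linted and tested before use.
// The returned field names are illustrative, not defined by the actor.
const extendOutput = ($) => {
  return {
    pageTitle: $('title').text().trim(),
    preBlockCount: $('pre').length,
  };
};

// The actor's input expects the function as a string.
const inputFragment = { extendOutputFunction: extendOutput.toString() };

// Minimal stand-in for the jQuery handle, for local testing only.
const fakePage = { title: '  Leaked config  ', pre: 2 };
const fake$ = (selector) => ({
  text: () => (selector === 'title' ? fakePage.title : ''),
  length: selector === 'pre' ? fakePage.pre : 0,
});

console.log(extendOutput(fake$));
console.log(inputFragment.extendOutputFunction);
```

Merge `inputFragment` into the rest of the input object before starting a run. The stub only covers the two selectors used here; a real page is needed to confirm that the selectors match anything.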
Output
Results are stored in the Apify dataset. Each item represents a found match and has the following structure:
{
  "keyword": "a",
  "url": "https://gist.github.com/trin94/3381395adc8b2c3fea81a38b9a385369"
}
You can manage results using the Apify API in Python, PHP, Node.js, or other languages. See the Apify API reference for details.
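Once the dataset items are downloaded, a common first step is to group the matched URLs by keyword and drop duplicates. The following is a minimal sketch that assumes the items are already loaded as an array in the documented shape; `groupByKeyword` is a hypothetical helper, and the example.com URLs are placeholders rather than real results.

```javascript
// Sample items in the documented { keyword, url } shape.
// The example.com entries are placeholders, not real results.
const items = [
  { keyword: 'a', url: 'https://gist.github.com/trin94/3381395adc8b2c3fea81a38b9a385369' },
  { keyword: 'db_pass', url: 'https://example.com/paste/1' },
  { keyword: 'db_pass', url: 'https://example.com/paste/1' }, // duplicate match
  { keyword: 'db_pass', url: 'https://example.com/paste/2' },
];

// Hypothetical helper: groups URLs by keyword and removes duplicate URLs.
function groupByKeyword(results) {
  const groups = new Map();
  for (const { keyword, url } of results) {
    if (!groups.has(keyword)) {
      groups.set(keyword, new Set());
    }
    groups.get(keyword).add(url);
  }
  // Convert the Sets to arrays so the result serializes cleanly to JSON.
  return Object.fromEntries(
    [...groups].map(([keyword, urls]) => [keyword, [...urls]])
  );
}

const grouped = groupByKeyword(items);
console.log(JSON.stringify(grouped, null, 2));
```

Grouping by keyword makes it easier to triage high-risk terms first. If extendOutputFunction adds extra fields, the helper would need to carry them along instead of keeping only the URL.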
Development & Support
This actor is under active development. For bug reports or feature requests, create an issue on the GitHub repository. For more information, visit epctex.com.
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: epctex
- Pricing: Paid
- Total Runs: 3,594
- Active Users: 847