OSINT Scraper

by epctex

Automatically find leaked data and keywords on Pastebin, GitHub Gist, and other paste sites. Specify your search terms and get OSINT data delivered, ready for analysis.

3,594 runs
847 users

About OSINT Scraper

Ever needed to find what's been accidentally left out in the open? This OSINT Scraper is my go-to for exactly that. It searches public code and text pastes on Pastebin, GitHub Gist, Codepad, Ideone, Paste.org, and Textbin. You tell it what to look for (specific keywords, project names, API keys, or other sensitive strings) and it returns the URLs of the pastes that match. Setup is minimal: supply your keywords and a proxy configuration (Apify Proxy works) and run it. I use it for security research to find leaked credentials, to monitor my own company's data, and to see what information is floating around on a particular topic. It automates a tedious manual search and can save hours. The results come back as clean, structured data, which suits initial reconnaissance or building a larger dataset. It's straightforward, does one job well, and fits into an automation workflow without unnecessary complexity.

What does this actor do?

OSINT Scraper is an actor (a cloud-hosted scraping program) on the Apify platform. It searches public paste and code-sharing sites for the keywords you supply and saves the URL of every matching paste to a dataset that you can download or query through the Apify API.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Proxy support through Apify Proxy or your own proxies (a proxy is required)
  • Scheduled runs and webhooks for automation

How to Use

  1. Open OSINT Scraper on Apify.com
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
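The same steps can also be driven through the Apify API instead of the web console. The sketch below builds the POST request that starts a run with a JSON input, using only the Python standard library. The actor ID `epctex~osint-scraper` is a hypothetical placeholder (copy the real ID from the actor's page), and `build_run_request` and `start_run` are illustrative helper names, not part of any Apify library.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical placeholder; copy the real actor ID from the actor's Apify page.
ACTOR_ID = "epctex~osint-scraper"


def build_run_request(token: str, run_input: dict, actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts one actor run."""
    url = f"{API_BASE}/acts/{actor_id}/runs"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The API token is sent as a Bearer token rather than a query parameter.
            "Authorization": f"Bearer {token}",
        },
    )


def start_run(token: str, run_input: dict) -> dict:
    """Send the request and return the run object from the response's "data" field."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=60) as resp:
        return json.load(resp)["data"]


# Example (needs a real token and network access):
# run = start_run("YOUR_APIFY_TOKEN", {"searchKeywords": ["db_pass"], "pastebin": True,
#                                      "proxy": {"useApifyProxy": True}})
# print(run["id"], run["defaultDatasetId"])
```

The returned run object includes a `defaultDatasetId`, which identifies the dataset where the run's results are stored.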

Documentation

Overview

An Apify actor that scrapes multiple public paste and code-sharing websites for potentially sensitive data based on your search terms. It's designed for Open Source Intelligence (OSINT) gathering.

Key Features

  • Keyword Search: Scrape content using any custom keywords.
  • Multi-Site Support: Covers six sites: Pastebin, GitHub Gist, Codepad, Ideone, Paste.org, and Textbin.
  • Modular: Enable or disable scraping for specific websites via input flags.
  • Extensible: Allows custom data extraction via a JavaScript function.
  • Efficient: Optimized for speed with low compute unit consumption (~0.01-0.03 units per 100 pages).
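To budget a run, the quoted rate can be turned into a rough estimate. This sketch simply scales the 0.01 to 0.03 compute units (CU) per 100 pages figure linearly; `estimate_compute_units` is a hypothetical helper, and actual consumption will vary with proxies, site responses, and how many pages your keywords produce.

```python
from typing import Tuple

# Rate quoted above: about 0.01 to 0.03 compute units (CU) per 100 pages.
CU_PER_100_PAGES_LOW = 0.01
CU_PER_100_PAGES_HIGH = 0.03


def estimate_compute_units(pages: int) -> Tuple[float, float]:
    """Return a (low, high) CU estimate, assuming cost scales linearly with pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    hundreds = pages / 100
    return hundreds * CU_PER_100_PAGES_LOW, hundreds * CU_PER_100_PAGES_HIGH


low, high = estimate_compute_units(5000)
print(f"5,000 pages: about {low:.2f} to {high:.2f} CU")
```

Under this linear assumption, 5,000 pages works out to roughly 0.5 to 1.5 CU.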

How to Use

The actor requires a JSON input configuration. You must use a proxy; you can use your own or Apify Proxy.

Tip: To scrape only specific sites, set their flags to true and the other site flags to false. For Pastebin, US-based proxies are recommended because of regional restrictions.
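A small helper makes it harder to leave a site enabled by accident. This sketch sets every site flag explicitly (only the chosen sites are true) and can optionally request US proxies for Pastebin. `build_input` is a hypothetical helper, and the `apifyProxyCountry` field is an assumption: it is a field of Apify's standard proxy configuration object, but this README does not confirm that the actor honours it.

```python
# The six site flags documented for this actor's input.
SITE_FLAGS = ("codepad", "githubgist", "ideone", "pastebin", "pasteorg", "textbin")


def build_input(keywords, sites, proxy_country=None):
    """Build an input dict with only the chosen sites enabled.

    proxy_country is an assumption: it is passed as apifyProxyCountry, a field
    of Apify's proxy configuration object that this README does not mention.
    """
    unknown = set(sites) - set(SITE_FLAGS)
    if unknown:
        raise ValueError(f"unknown site flag(s): {sorted(unknown)}")
    run_input = {"searchKeywords": list(keywords)}
    # Set every flag explicitly so the result does not depend on the actor's defaults.
    for flag in SITE_FLAGS:
        run_input[flag] = flag in sites
    proxy = {"useApifyProxy": True}
    if proxy_country:
        proxy["apifyProxyCountry"] = proxy_country
    run_input["proxy"] = proxy
    return run_input


# Pastebin only, through US-based Apify Proxy IPs.
print(build_input(["@gmail", "db_pass"], {"pastebin"}, proxy_country="US"))
```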

Input

Required input is a JSON object. The searchKeywords array and proxy configuration are mandatory.

{
  "searchKeywords": ["@gmail", "db_pass"],
  "codepad": true,
  "githubgist": true,
  "ideone": true,
  "pastebin": true,
  "pasteorg": true,
  "textbin": true,
  "proxy": {
    "useApifyProxy": true
  },
  "extendOutputFunction": "($) => { return {'customField': $('title').text()} }"
}

Input Fields

  • searchKeywords: (Required) Array of strings containing keywords to search for.
  • proxy: (Required) Proxy configuration object.
  • codepad, githubgist, ideone, pastebin, pasteorg, textbin: (Optional) Boolean flags to enable/disable scraping for each specific website.
  • extendOutputFunction: (Optional) A string containing a JavaScript function for custom data extraction. The function receives a jQuery handle ($) for the scraped page and returns an object of custom fields, as in the example above.
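Before starting a paid run, it can help to check the input locally against the field list above. This is a minimal sketch: `validate_input` is a hypothetical helper, and treating an empty `searchKeywords` array as invalid is an assumption rather than documented behaviour. It can only check that `extendOutputFunction` is a string, not that it is valid JavaScript.

```python
SITE_FLAGS = ("codepad", "githubgist", "ideone", "pastebin", "pasteorg", "textbin")


def validate_input(run_input: dict) -> list:
    """Return a list of problems found in an input dict (an empty list means it looks OK)."""
    errors = []
    keywords = run_input.get("searchKeywords")
    # Assumption: an empty keyword list is treated as an error.
    if not isinstance(keywords, list) or not keywords:
        errors.append("searchKeywords must be a non-empty array")
    elif not all(isinstance(k, str) and k for k in keywords):
        errors.append("searchKeywords must contain only non-empty strings")
    if not isinstance(run_input.get("proxy"), dict):
        errors.append("proxy must be an object, e.g. {'useApifyProxy': True}")
    for flag in SITE_FLAGS:
        if flag in run_input and not isinstance(run_input[flag], bool):
            errors.append(f"{flag} must be true or false")
    func = run_input.get("extendOutputFunction")
    # Only the type can be checked here; the JavaScript itself runs inside the actor.
    if func is not None and not isinstance(func, str):
        errors.append("extendOutputFunction must be a string")
    return errors


print(validate_input({"searchKeywords": [], "pastebin": "yes"}))
```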

Output

Results are stored in the Apify dataset. Each item represents a found match and has the following structure:

{
  "keyword": "a",
  "url": "https://gist.github.com/trin94/3381395adc8b2c3fea81a38b9a385369"
}
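Since each dataset item pairs one keyword with one URL, a common first step is to group matches by keyword and drop duplicate URLs. The sketch assumes every item has the `keyword` and `url` fields shown above; any extra fields added by `extendOutputFunction` are ignored. `group_by_keyword` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_keyword(items):
    """Map each keyword to a sorted list of unique URLs that matched it."""
    grouped = defaultdict(set)
    for item in items:
        grouped[item["keyword"]].add(item["url"])
    return {keyword: sorted(urls) for keyword, urls in grouped.items()}


items = [
    {"keyword": "a", "url": "https://gist.github.com/trin94/3381395adc8b2c3fea81a38b9a385369"},
    # The same match reported twice; the set removes the duplicate.
    {"keyword": "a", "url": "https://gist.github.com/trin94/3381395adc8b2c3fea81a38b9a385369"},
]
print(group_by_keyword(items))
```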

You can manage results using the Apify API in Python, PHP, Node.js, or other languages. See the Apify API reference for details.
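As one example, the dataset can be fetched over plain HTTP with the Python standard library. The sketch calls the Apify API's dataset items endpoint with its `format` and `clean` query parameters and sends the API token as a Bearer header. The helper names are hypothetical, and the dataset ID comes from the run (for example, its `defaultDatasetId`).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json", clean: bool = True) -> str:
    """Build the URL of the dataset items endpoint."""
    query = urllib.parse.urlencode({"format": fmt, "clean": "true" if clean else "false"})
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id, safe='')}/items?{query}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download the dataset's items as a list of dicts (JSON format only)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

Passing `fmt="csv"` to `dataset_items_url` gives a spreadsheet-friendly download link, although `fetch_items` as written only parses JSON.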

Development & Support

This actor is under active development. For bug reports or feature requests, create an issue on the GitHub repository. For more information, visit epctex.com.

Common Use Cases

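Security Research

Find leaked credentials, API keys, and other secrets in public pastes

Data Leak Monitoring

Watch for your company's domains, project names, or internal strings appearing on paste sites

Topic Research

See what information about a particular topic is circulating publicly

OSINT Reconnaissance

Collect matching paste URLs as a starting dataset for reconnaissance or larger investigations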

Actor Information

Developer: epctex
Pricing: Paid