My Actor

by david15999

An open-source HTML scraper for developers. Use it as a reliable foundation to extract data from any website for research, monitoring, or building datasets.

766 runs
17 users
Try This Actor

About My Actor

Need to pull clean data from a website? This open-source HTML scraper is the straightforward tool I keep coming back to. You give it a URL and some configuration, and it fetches the raw HTML for you to parse and extract exactly what you need. Because it works on the HTML the server returns, it copes with differing page structures, but content that a page renders client-side with JavaScript will not appear in the fetched HTML.

It suits developers who want a no-fuss foundation for their data projects without being locked into a specific data extraction service. I've used it for monitoring competitor prices, gathering research data, and building datasets for machine learning. Because it's open-source, you can inspect the code, tweak it for your specific case, and contribute improvements.

It runs on the Apify platform, which provides infrastructure such as proxy rotation and request queues, so you can focus on the data. If you're comfortable with tools like Cheerio or Beautiful Soup and need a dependable scraper to feed them, this actor is a good starting point.

What does this actor do?

My Actor is a web scraping tool that runs in the cloud on the Apify platform. Given a page URL, it fetches the HTML, parses it, and saves the extracted data to a dataset.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
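
Steps 3 and 4 can also be done programmatically. The following is a minimal sketch, assuming Node.js 18+ (for the global fetch) and the Apify API's run-sync-get-dataset-items endpoint. The helper names buildRunSyncUrl and runActor are hypothetical, and the <ActorId> and token values are placeholders you would take from the Apify console.

```javascript
const API_BASE = 'https://api.apify.com/v2';

// Builds the URL of the endpoint that runs an actor synchronously and
// returns the items of its default dataset.
function buildRunSyncUrl(actorId) {
    return `${API_BASE}/acts/${encodeURIComponent(actorId)}/run-sync-get-dataset-items`;
}

// Hypothetical helper: starts a run with the given input and resolves with
// the dataset items once the run has finished.
async function runActor(actorId, token, input) {
    const response = await fetch(buildRunSyncUrl(actorId), {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${token}`,
        },
        body: JSON.stringify(input),
    });
    if (!response.ok) {
        throw new Error(`Apify API responded with status ${response.status}`);
    }
    return response.json();
}

// Example call (not executed here):
// const items = await runActor('<ActorId>', process.env.APIFY_TOKEN, { url: 'https://example.com' });
```

The synchronous endpoint is convenient for short runs like a single-page scrape. For longer jobs, starting the run and fetching the dataset separately is likely the safer choice.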

Documentation

My Actor

A JavaScript (Node.js) template for scraping data from a single web page. You provide a URL via the input, and the actor fetches the page, parses it, and stores the extracted data in an Apify dataset. The template is pre-configured to extract page headings but is designed to be easily modified for any scraping task.

Key Features

  • Apify SDK: The core toolkit for building and running the actor.
  • Input Schema: A defined schema for validating the actor's input (primarily the target URL).
  • Structured Storage: Output is saved to an Apify Dataset for easy access and export.
  • Axios Client: Used for reliable HTTP requests to fetch page HTML.
  • Cheerio: A fast, jQuery-like library for parsing and extracting data from HTML.

Input / Output

Input: The actor expects an input object containing the url of the page to scrape, as defined by its input schema.
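
As an illustration, an input for this actor could look like { "url": "https://example.com" }. The sketch below shows one way to check such an input before making any request. The parseInput helper is hypothetical and is not part of the template; it only assumes the url field described above.

```javascript
// Hypothetical helper: validates the actor input and returns a normalized URL.
function parseInput(input) {
    if (!input || typeof input.url !== 'string') {
        throw new Error('Input must be an object with a string "url" field.');
    }
    // The URL constructor throws a TypeError for malformed URLs.
    const parsed = new URL(input.url);
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        throw new Error(`Unsupported protocol: ${parsed.protocol}`);
    }
    return { url: parsed.href };
}

console.log(parseInput({ url: 'https://example.com' }));
// prints { url: 'https://example.com/' }
```

Failing early on a bad URL keeps the error message closer to the cause than a failed HTTP request would.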

Output: The scraped data is stored as individual items in the actor's default dataset. The default template extracts the page headings (h1 through h6) and stores them there; modify the extraction logic to match your needs.

How to Use

Basic Operation

  1. Provide the target page URL in the actor's input.
  2. Run the actor. It will:
    • Fetch the page HTML using axios.get(url).
    • Load the HTML into Cheerio for parsing (cheerio.load(response.data)).
    • Execute the extraction logic (by default, selecting all heading elements).
    • Save the results to the dataset via Actor.pushData().
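
The four steps above can be sketched roughly as follows. This is not the template's actual source. The function names extractHeadings and run, and the item fields level and text, are assumptions chosen for illustration. The packages are loaded inside run() only so that the extraction helper can be used on its own; a real actor would more typically import them at the top of the file.

```javascript
// Step 3: collect every heading as { level, text }.
// `$` is the function returned by cheerio.load().
function extractHeadings($) {
    const headings = [];
    $('h1, h2, h3, h4, h5, h6').each((_i, element) => {
        headings.push({
            level: $(element).prop('tagName').toLowerCase(),
            text: $(element).text().trim(),
        });
    });
    return headings;
}

async function run() {
    // Packages named in the Key Features list above.
    const { Actor } = await import('apify');
    const { default: axios } = await import('axios');
    const cheerio = await import('cheerio');

    await Actor.init();
    const { url } = (await Actor.getInput()) ?? {};
    if (!url) {
        throw new Error('Input must contain a "url" field.');
    }

    const response = await axios.get(url);   // step 1: fetch the page HTML
    const $ = cheerio.load(response.data);   // step 2: parse it with Cheerio
    const headings = extractHeadings($);     // step 3: run the extraction logic
    await Actor.pushData(headings);          // step 4: save items to the dataset

    await Actor.exit();
}

// Call run() from the actor's entry point.
```

Keeping the extraction in its own function means a different scraping target only requires changing that one function.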

Customization

The main scraping logic is in the Cheerio parsing step. To scrape different data, edit the selector and data extraction code. For example, the default code is:

$("h1, h2, h3, h4, h5, h6").each((_i, element) => {...});

Change the selector (e.g., $(".product-name")) and the extracted properties within the loop to match your target data.
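
As a hedged example of such a change, the sketch below swaps in the .product-name selector mentioned above. The function name extractProductNames, the output fields name and link, and the assumption that the matched elements may carry an href attribute are all hypothetical; adjust them to the markup of your target page.

```javascript
// Collects one item per element matching ".product-name".
// `$` is the function returned by cheerio.load().
function extractProductNames($) {
    const items = [];
    $('.product-name').each((_i, element) => {
        const el = $(element);
        items.push({
            name: el.text().trim(),
            // attr() returns undefined when the attribute is missing.
            link: el.attr('href') ?? null,
        });
    });
    return items;
}
```

The returned array can be passed to Actor.pushData() in the same way as the default headings.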

Local Development

To modify the actor locally, use the Apify CLI to pull the source code:

  1. Install the Apify CLI:
    npm -g install apify-cli
    or
    brew install apify-cli

  2. Pull the actor using its unique name or ID (found in the Apify console):
    apify pull <ActorId>

Common Use Cases

  • Market Research: Gather competitive intelligence and market data
  • Lead Generation: Extract contact information for sales outreach
  • Price Monitoring: Track competitor pricing and product changes
  • Content Aggregation: Collect and organize content from multiple sources

Actor Information

Developer: david15999
Pricing: Paid
Total Runs: 766
Active Users: 17