My Actor

by mellow_fuel

My Actor is a flexible web scraper for AI, e-commerce, and social media data. It's powerful, so use it carefully and ethically.

4 runs
2 users

About My Actor

Need to pull data from AI platforms, e-commerce sites, or social media? My Actor is the scraper I keep coming back to. I've used it to gather training data from AI tools, track competitor pricing, and monitor social trends, all from a single setup. Its main strength is flexibility: it handles the structure of these very different sites without my having to rewrite everything each time. A quick heads-up, though: because it's so effective, use it carefully. Respect robots.txt files, check each site's terms of service, and add polite delays between requests. Do that, and it becomes a reliable part of your data pipeline. It saves me hours of manual work each week, so I can focus on analyzing the data instead of fighting to collect it.
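
As a rough illustration of what polite crawling can look like with a Crawlee-based crawler, the sketch below derives conservative throttling settings from a target delay. maxConcurrency and maxRequestsPerMinute are existing Crawlee crawler options; the helper name politeCrawlerOptions and the 2-second default are assumptions made for this example. Checking robots.txt and a site's terms of service is not covered here and remains a separate step.

```javascript
// Hypothetical helper (not part of the Actor): turns a desired average delay
// between requests into Crawlee crawler options.
function politeCrawlerOptions(delaySeconds = 2) {
    if (!(delaySeconds > 0)) {
        throw new RangeError('delaySeconds must be a positive number');
    }
    return {
        // Process one request at a time.
        maxConcurrency: 1,
        // Cap the request rate; never go below 1 request per minute.
        maxRequestsPerMinute: Math.max(1, Math.floor(60 / delaySeconds)),
    };
}
```

The returned object could be spread into a crawler's constructor options, for example new CheerioCrawler({ ...politeCrawlerOptions(2), requestHandler }).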

What does this actor do?

My Actor is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
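
Step 4 can also be done from code instead of the Apify Console. The sketch below assumes the apify-client npm package is installed and that an API token is available in an APIFY_TOKEN environment variable. The helper names buildRunInput and runActor are made up for this example, and the { url } object shape for start URLs is an assumption about how the input schema is defined.

```javascript
// Hypothetical helper: assembles the run input from plain URL strings.
// Assumes the Actor accepts start URLs as { url } objects.
function buildRunInput(urls, maxPagesPerCrawl) {
    const input = { startUrls: urls.map((url) => ({ url })) };
    if (maxPagesPerCrawl !== undefined) {
        input.maxPagesPerCrawl = maxPagesPerCrawl;
    }
    return input;
}

// Starts a run, waits for it to finish, and returns the dataset items.
// The client package is loaded lazily so buildRunInput works without it.
async function runActor(actorId, input, token = process.env.APIFY_TOKEN) {
    const { ApifyClient } = await import('apify-client');
    const client = new ApifyClient({ token });

    // call() resolves once the run has finished.
    const run = await client.actor(actorId).call(input);

    // Read the items from the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    return items;
}
```

A call might look like runActor(actorId, buildRunInput(['https://example.com'], 10)), where actorId is the ID shown on the Actor's page in the Console.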

Documentation

My Actor

A web scraping Actor built with Crawlee and CheerioCrawler, designed to extract data from websites. It's categorized for AI, E-commerce, and Social Media use cases.

Overview

This Actor is a JavaScript template that uses Cheerio to parse HTML and scrape structured data from web pages. It starts from a list of URLs, crawls pages, extracts information (like page titles and URLs), and saves the results to a dataset. It's built on the Apify SDK and is suitable for scraping static HTML content.

Key Features

  • CheerioCrawler: Uses the fast Cheerio library for parsing HTML, ideal for static websites.
  • Structured Data Output: Saves scraped items into an Apify dataset with a defined schema.
  • Configurable Input: Control the crawl via startUrls and maxPagesPerCrawl parameters.
  • Proxy Support: Includes configuration for proxy rotation to help avoid blocks.
  • Local & Cloud Development: Full local development workflow with easy deployment to the Apify platform.

How to Use

Local Development & Run

  1. Install dependencies: npm install
  2. Start the Actor locally: apify run
  3. Deploy to Apify Console:
       apify login
       apify push

Project Structure

.actor/
├── actor.json          # Actor configuration and settings
├── input_schema.json   # Defines and validates Actor input
└── dataset_schema.json # Defines the structure of output data
src/
└── main.js            # Main Actor entry point and logic

How It Works

  1. The crawler begins with URLs from the input's startUrls.
  2. It fetches each page and uses Cheerio to parse the HTML.
  3. The requestHandler function extracts data (e.g., page title and URL) from each page.
  4. Extracted items are saved to the dataset and logged.
  5. The crawl respects the maxPagesPerCrawl limit set in the input.
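
The steps above can be sketched roughly as follows. This is a minimal sketch and not the Actor's actual src/main.js, which is not shown on this page. It assumes the apify and crawlee packages are installed. The helper name buildItem, the fallback of 100 pages, and the enqueueLinks() call that follows links found on each page are assumptions for illustration.

```javascript
// Hypothetical helper: shapes one dataset item from a page URL and its <title> text.
function buildItem(url, rawTitle) {
    return { url, title: String(rawTitle ?? '').trim() };
}

// Sketch of the crawl. Packages are loaded lazily inside main() so the
// helper above can be used on its own.
async function main() {
    const { Actor } = await import('apify');
    const { CheerioCrawler, Dataset } = await import('crawlee');

    await Actor.init();

    // Step 1: read the input. The default of 100 is an assumption;
    // the real defaults live in .actor/input_schema.json.
    const { startUrls = [], maxPagesPerCrawl = 100 } = (await Actor.getInput()) ?? {};

    // Proxy rotation via Apify Proxy.
    const proxyConfiguration = await Actor.createProxyConfiguration();

    const crawler = new CheerioCrawler({
        proxyConfiguration,
        // Step 5: stop after this many pages.
        maxRequestsPerCrawl: maxPagesPerCrawl,
        // Steps 2-4: the page is already fetched and parsed; $ is the Cheerio handle.
        async requestHandler({ request, $, enqueueLinks, log }) {
            const item = buildItem(request.loadedUrl ?? request.url, $('title').text());
            log.info('Scraped page', item);
            await Dataset.pushData(item);

            // Assumption: follow links on the page. Remove this line to
            // scrape only the start URLs.
            await enqueueLinks();
        },
    });

    await crawler.run(startUrls);
    await Actor.exit();
}
```

In a real entry point, main() would be called at the end of the file. The function is left uncalled here so the sketch can be read and adapted without starting a crawl.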

Input / Output

Input (via input_schema.json):
  • startUrls (array): List of URLs to start crawling from.
  • maxPagesPerCrawl (number, optional): Limit for the total number of pages to scrape.

Output:
  • Dataset: Contains saved items, typically including url and title for each scraped page, following the structure in dataset_schema.json.
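
To make the input contract more concrete, here is a hedged sketch of how the two fields might be checked and defaulted in code. The function name normalizeInput, the fallback of 100 pages, and the accepted URL shapes (plain strings or { url } objects) are assumptions for illustration. The authoritative rules are whatever input_schema.json declares.

```javascript
// Hypothetical validator for the Actor input described above.
function normalizeInput(input, defaultMaxPages = 100) {
    const { startUrls, maxPagesPerCrawl = defaultMaxPages } = input ?? {};

    if (!Array.isArray(startUrls) || startUrls.length === 0) {
        throw new Error('startUrls must be a non-empty array');
    }

    // Accept either 'https://…' strings or { url: 'https://…' } objects.
    const urls = startUrls.map((entry) => {
        const raw = typeof entry === 'string' ? entry : entry?.url;
        // new URL() throws a TypeError on missing or malformed values.
        return new URL(raw).href;
    });

    if (!Number.isInteger(maxPagesPerCrawl) || maxPagesPerCrawl < 1) {
        throw new Error('maxPagesPerCrawl must be a positive integer');
    }

    return { startUrls: urls, maxPagesPerCrawl };
}

// Example: one start URL and a limit of 10 pages.
const example = normalizeInput({
    startUrls: [{ url: 'https://example.com' }],
    maxPagesPerCrawl: 10,
});
console.log(example);
```

Each item written to the dataset would then be a small object with url and title fields, matching the output described above.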

Included Tools & Resources

Helpful Links:
  • Quick Start for building Actors
  • Video tutorial on building a scraper with CheerioCrawler
  • Written tutorial on using CheerioCrawler
  • Web scraping with Cheerio in 2023
  • How to create Actors from templates

Getting Started for Local Development

To pull an existing Actor from the Apify Console for local work:

  1. Install the Apify CLI:
       • Homebrew: brew install apify-cli
       • NPM: npm -g install apify-cli
  2. Pull the Actor using the CLI.

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Actor Information

Developer: mellow_fuel
Pricing: Paid
Total Runs: 4
Active Users: 2

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
