My Actor

Name: My Actor
Author: mellow_fuel

My Actor is a flexible web scraper for AI, e-commerce, and social media data. It's powerful, so use it carefully and ethically.

4 runs

2 users

About My Actor

Need to pull data from AI platforms, e-commerce sites, or social media? My Actor is the scraper I keep coming back to. I've used it to gather training data from AI tools, track competitor pricing, and monitor social trends, all from a single setup. The key is its flexibility: it handles the differing structure of these sites without my having to rewrite everything each time. A quick heads-up, though: because it can collect a lot of data quickly, use it carefully. Respect robots.txt files, check each site's terms of service, and add polite delays between requests. Do that, and it becomes a reliable part of your data pipeline. It saves me hours of manual work each week, so I can focus on analyzing the data instead of fighting to collect it.
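The politeness advice above can be made concrete. What follows is a minimal sketch, not code taken from My Actor: isPathAllowed and politeDelayMs are hypothetical helpers, and the robots.txt handling is deliberately simplified (it reads only Disallow rules in "User-agent: *" groups and matches them as plain prefixes, ignoring Allow, wildcards, and Crawl-delay).

```javascript
// Simplified robots.txt check (hypothetical helper, for illustration only).
// Honours only prefix-matched Disallow rules in "User-agent: *" groups.
function isPathAllowed(robotsTxt, path) {
    const disallowed = [];
    let inWildcardGroup = false;
    let previousWasUserAgent = false;

    for (const rawLine of robotsTxt.split(/\r?\n/)) {
        const line = rawLine.split('#')[0].trim(); // drop comments
        const sep = line.indexOf(':');
        if (sep === -1) continue; // blank or malformed line

        const field = line.slice(0, sep).trim().toLowerCase();
        const value = line.slice(sep + 1).trim();

        if (field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            inWildcardGroup = (previousWasUserAgent && inWildcardGroup) || value === '*';
            previousWasUserAgent = true;
        } else {
            previousWasUserAgent = false;
            if (field === 'disallow' && inWildcardGroup && value !== '') {
                disallowed.push(value);
            }
        }
    }
    return !disallowed.some((prefix) => path.startsWith(prefix));
}

// Delay between requests: a fixed base plus random jitter, in milliseconds.
// The default values are assumptions, not settings from the Actor.
function politeDelayMs(baseMs = 1000, jitterMs = 500) {
    return baseMs + Math.floor(Math.random() * jitterMs);
}
```

A crawler would call isPathAllowed before requesting a URL and wait politeDelayMs() milliseconds between requests to the same host. For anything beyond a quick check, a full robots.txt parser is the safer choice.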

What does this actor do?

My Actor is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation
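As an illustration of the API access mentioned above, the sketch below builds the HTTP request that starts an Actor run through the Apify API's "run Actor" endpoint. It only constructs the request and sends nothing. buildRunRequest is a hypothetical helper, and the Actor ID and token in the usage note are placeholders, not values confirmed by this page.

```javascript
// Build (but do not send) the request that starts an Actor run via the Apify API.
// buildRunRequest is a hypothetical helper for this sketch.
function buildRunRequest(actorId, token, input) {
    return {
        url: `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`,
        options: {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${token}`,
            },
            body: JSON.stringify(input), // the JSON body becomes the Actor input
        },
    };
}

// Assumed usage on Node.js 18+ (global fetch), with placeholder values:
//   const { url, options } = buildRunRequest('username~actor-name', APIFY_TOKEN, {
//       startUrls: [{ url: 'https://example.com' }],
//       maxPagesPerCrawl: 10,
//   });
//   const response = await fetch(url, options);
```

Keeping request construction separate from sending makes the call easy to inspect and to reuse from a scheduler or webhook handler.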

How to Use

1. Click "Try This Actor" to open it on Apify.
2. Create a free Apify account if you don't have one.
3. Configure the input parameters as needed.
4. Run the Actor and download your results.

Documentation

My Actor

A web scraping Actor built with Crawlee's CheerioCrawler, designed to extract data from static websites. It is categorized under AI, E-commerce, and Social Media.

Overview

This Actor is a JavaScript template that uses Cheerio to parse HTML and scrape structured data from web pages. It starts from a list of URLs, crawls pages, extracts information (like page titles and URLs), and saves the results to a dataset. It's built on the Apify SDK and is suitable for scraping static HTML content.

Key Features

CheerioCrawler: Uses the fast Cheerio library for parsing HTML, ideal for static websites.
Structured Data Output: Saves scraped items into an Apify dataset with a defined schema.
Configurable Input: Control the crawl via startUrls and maxPagesPerCrawl parameters.
Proxy Support: Includes configuration for proxy rotation to help avoid blocks.
Local & Cloud Development: Full local development workflow with easy deployment to the Apify platform.

How to Use

Local Development & Run

Install dependencies: npm install
Start the Actor locally: apify run
Deploy to Apify Console:
    apify login
    apify push

Project Structure

.actor/
├── actor.json          # Actor configuration and settings
├── input_schema.json   # Defines and validates Actor input
└── dataset_schema.json # Defines the structure of output data
src/
└── main.js            # Main Actor entry point and logic

How It Works

The crawler begins with URLs from the input's startUrls.
It fetches each page and uses Cheerio to parse the HTML.
The requestHandler function extracts data (e.g., page title and URL) from each page.
Extracted items are saved to the dataset and logged.
The crawl respects the maxPagesPerCrawl limit set in the input.
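The steps above can be sketched as a single handler function. This is an assumed reconstruction based on the description, not the Actor's actual src/main.js. handlePage is a hypothetical name, and pushData and log are taken from the handler context (recent Crawlee versions provide both there), which also lets the function be exercised without Crawlee installed.

```javascript
// Sketch of a requestHandler in the shape CheerioCrawler passes to it:
// `request` describes the page and `$` is the Cheerio handle for the parsed HTML.
async function handlePage({ request, $, pushData, log }) {
    const title = $('title').text().trim();
    // loadedUrl is the final URL after redirects; fall back to the requested URL.
    const item = { url: request.loadedUrl ?? request.url, title };

    log.info(`Scraped "${title}"`, { url: item.url });
    await pushData(item); // save one item to the dataset
    return item;
}

// Assumed wiring inside src/main.js (requires the `crawlee` package):
//   const crawler = new CheerioCrawler({
//       maxRequestsPerCrawl: maxPagesPerCrawl, // Crawlee's option for the page limit
//       requestHandler: handlePage,
//   });
//   await crawler.run(startUrls);
```

Mapping maxPagesPerCrawl onto Crawlee's maxRequestsPerCrawl option is an assumption about how the limit is enforced; the source only states that the limit is respected.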

Input / Output

Input (via input_schema.json):
* startUrls (array): List of URLs to start crawling from.
* maxPagesPerCrawl (number, optional): Limit for the total number of pages to scrape.

Output:
* Dataset: Contains saved items, typically including url and title for each scraped page, following the structure in dataset_schema.json.
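To make the input contract above concrete, here is a minimal sketch of validating and normalizing the two parameters before a crawl starts. normalizeInput is a hypothetical helper, and the default of 100 pages is an assumption; the source does not state a default.

```javascript
// Validate raw Actor input and reduce it to the two documented parameters.
// The default page limit is an assumption made for this sketch.
function normalizeInput(rawInput, defaultMaxPages = 100) {
    const { startUrls, maxPagesPerCrawl = defaultMaxPages } = rawInput ?? {};

    if (!Array.isArray(startUrls) || startUrls.length === 0) {
        throw new Error('startUrls must be a non-empty array');
    }
    if (!Number.isInteger(maxPagesPerCrawl) || maxPagesPerCrawl < 1) {
        throw new Error('maxPagesPerCrawl must be a positive integer');
    }

    // Accept both plain strings and { url } objects.
    const urls = startUrls.map((entry) => (typeof entry === 'string' ? entry : entry?.url));
    for (const url of urls) new URL(url); // throws on a missing or malformed URL

    return { startUrls: urls, maxPagesPerCrawl };
}
```

Each dataset item would then be a plain object such as { url, title }, matching the fields named in the Output description.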

Included Tools & Resources

Apify SDK - Toolkit for building Actors.
Crawlee - Web scraping and browser automation library.
Cheerio - Library for parsing and manipulating HTML/XML.
Input schema - For defining and validating Actor input.
Dataset - For storing structured output data.
Proxy configuration - Support for IP rotation.

Helpful Links:
* Quick Start for building Actors.
* Video tutorial on building a scraper with CheerioCrawler.
* Written tutorial on using CheerioCrawler.
* Web scraping with Cheerio in 2023
* How to create Actors from templates

Getting Started for Local Development

To pull an existing Actor from the Apify Console for local work:
1. Install the Apify CLI:
* Homebrew: brew install apify-cli
* NPM: npm -g install apify-cli
2. Pull the Actor using the CLI: apify pull <ActorId>

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Actor Information

Developer: mellow_fuel
Pricing: Paid
Total Runs: 4
Active Users: 2
