My Actor
by mellow_fuel
My Actor is a flexible web scraper for AI, e-commerce, and social media data. It's powerful, so use it carefully and ethically.
About My Actor
Need to pull data from AI platforms, e-commerce sites, or social media? My Actor is the scraper I keep coming back to. I've used it to gather training data from AI tools, track competitor pricing, and monitor social trends, all from a single setup. The key is its flexibility: it handles the structure of these very different sites without my having to rewrite everything each time. A quick heads-up, though: because it's so effective, use it carefully. Respect robots.txt files, check each site's terms of service, and add polite crawling delays. Do that, and it becomes a reliable part of your data pipeline. It saves me hours of manual work each week, so I can focus on analyzing the data instead of fighting to collect it.
What does this actor do?
My Actor is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
My Actor
A web scraping Actor built with Crawlee and CheerioCrawler, designed to extract data from websites. It's categorized for AI, E-commerce, and Social Media use cases.
Overview
This Actor is a JavaScript template that uses Cheerio to parse HTML and scrape structured data from web pages. It starts from a list of URLs, crawls pages, extracts information (like page titles and URLs), and saves the results to a dataset. It's built on the Apify SDK and is suitable for scraping static HTML content.
Key Features
- CheerioCrawler: Uses the fast Cheerio library for parsing HTML, ideal for static websites.
- Structured Data Output: Saves scraped items into an Apify dataset with a defined schema.
- Configurable Input: Control the crawl via the startUrls and maxPagesPerCrawl parameters.
- Proxy Support: Includes configuration for proxy rotation to help avoid blocks.
- Local & Cloud Development: Full local development workflow with easy deployment to the Apify platform.
How to Use
Local Development & Run
- Install dependencies: npm install
- Start the Actor locally: apify run
- Deploy to Apify Console: apify login, then apify push
Project Structure
.actor/
├── actor.json # Actor configuration and settings
├── input_schema.json # Defines and validates Actor input
└── dataset_schema.json # Defines the structure of output data
src/
└── main.js # Main Actor entry point and logic
How It Works
- The crawler begins with URLs from the input's startUrls.
- It fetches each page and uses Cheerio to parse the HTML.
- The requestHandler function extracts data (e.g., page title and URL) from each page.
- Extracted items are saved to the dataset and logged.
- The crawl respects the maxPagesPerCrawl limit set in the input.
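The steps above can be sketched as a small request handler. This is a minimal illustration, not the Actor's actual src/main.js: the helper name extractPageItem is hypothetical, and the context fields (request, $, log, pushData) are assumed to follow the shape of Crawlee's CheerioCrawler crawling context. It has no imports, so it can be tried with a stubbed context.

```javascript
// Sketch of a request handler in the shape CheerioCrawler expects.
// Assumption: the real src/main.js may extract different fields.

/**
 * Builds one dataset item from a parsed page.
 * `$` is a Cheerio-style selector function; `url` is the page URL.
 * (Hypothetical helper, introduced only for this example.)
 */
function extractPageItem($, url) {
    // Collapse whitespace so titles split across lines become one clean string.
    const title = $('title').text().replace(/\s+/g, ' ').trim();
    return { url, title };
}

/**
 * Handles a single crawled page: extract, log, save.
 * `pushData` is assumed to append the item to the default dataset.
 */
async function requestHandler({ request, $, log, pushData }) {
    // loadedUrl reflects redirects; fall back to the requested URL.
    const url = request.loadedUrl ?? request.url;
    const item = extractPageItem($, url);
    log.info(`Scraped "${item.title}"`, { url });
    await pushData(item);
    return item;
}

// In an Actor, the handler would be wired into the crawler roughly like this
// (requires the crawlee package; shown as a comment to keep the sketch standalone):
//   const crawler = new CheerioCrawler({
//       requestHandler,
//       maxRequestsPerCrawl: maxPagesPerCrawl,
//   });
//   await crawler.run(startUrls);
```

Keeping the extraction in its own function makes it possible to exercise the parsing logic with a stubbed `$`, without starting a crawl.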
Input / Output
Input (via input_schema.json):
* startUrls (array): List of URLs to start crawling from.
* maxPagesPerCrawl (number, optional): Limit for the total number of pages to scrape.
Output:
* Dataset: Contains saved items, typically including url and title for each scraped page, following the structure in dataset_schema.json.
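As a rough illustration of the input described above, the following sketch normalizes a raw input object before a crawl starts. The function name normalizeInput is hypothetical, and the assumption that startUrls entries may arrive either as plain strings or as objects with a url field is mine; the real rules live in input_schema.json.

```javascript
// Hypothetical helper; the Actor's real input handling may differ.
function normalizeInput(rawInput) {
    const { startUrls = [], maxPagesPerCrawl } = rawInput ?? {};

    if (!Array.isArray(startUrls) || startUrls.length === 0) {
        throw new Error('startUrls must be a non-empty array');
    }

    // Accept both plain strings and { url } objects (assumed shapes).
    const urls = startUrls.map((entry) => {
        const value = typeof entry === 'string' ? entry : entry?.url;
        return new URL(value).href; // throws on malformed or missing URLs
    });

    // maxPagesPerCrawl is optional; undefined means "no explicit limit".
    if (maxPagesPerCrawl !== undefined
        && (!Number.isInteger(maxPagesPerCrawl) || maxPagesPerCrawl < 1)) {
        throw new Error('maxPagesPerCrawl must be a positive integer');
    }

    // Drop duplicate URLs while keeping the original order.
    return { startUrls: [...new Set(urls)], maxPagesPerCrawl };
}

// Example input with a duplicate start URL in two different forms.
const example = normalizeInput({
    startUrls: [
        { url: 'https://example.com' },
        'https://example.com/',
        'https://example.org/docs',
    ],
    maxPagesPerCrawl: 50,
});
console.log(example);
```

Validating up front means a malformed URL or a non-positive limit fails before any page is fetched, rather than partway through a run.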
Included Tools & Resources
- Apify SDK - Toolkit for building Actors.
- Crawlee - Web scraping and browser automation library.
- Cheerio - Library for parsing and manipulating HTML/XML.
- Input schema - For defining and validating Actor input.
- Dataset - For storing structured output data.
- Proxy configuration - Support for IP rotation.
Helpful Links:
* Quick Start for building Actors.
* Video tutorial on building a scraper with CheerioCrawler.
* Written tutorial on using CheerioCrawler.
* Web scraping with Cheerio in 2023
* How to create Actors from templates
Getting Started for Local Development
To pull an existing Actor from the Apify Console for local work:
1. Install the Apify CLI:
* Homebrew: brew install apify-cli
* NPM: npm -g install apify-cli
2. Pull the Actor using the CLI.
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Actor Information
- Developer: mellow_fuel
- Pricing: Paid
- Total Runs: 4
- Active Users: 2