My Actor
by david15999
An open-source HTML scraper for developers. Use it as a reliable foundation to extract data from any website for research, monitoring, or building datasets.
About My Actor
Need to pull clean data from a website? This open-source HTML scraper is the straightforward tool I keep coming back to. You give it a URL and some configuration, and it fetches the raw HTML for you to parse and extract exactly what you need. It copes with the messy reality of web scraping, such as page structures that differ from site to site, but because it works on the HTML the server returns, content rendered client-side by JavaScript needs a browser-based tool instead. It’s a good fit for developers who want a reliable, no-fuss foundation for their data projects without being locked into a specific data extraction service.
I’ve used it for everything from monitoring competitor prices and gathering research data to building datasets for machine learning. Because it’s open-source, you can inspect the code, tweak it for your specific case, and even contribute improvements. It runs on the Apify platform, which handles things like proxy rotation and request queues so you can focus on the data. If you're comfortable with tools like Cheerio or Beautiful Soup and need a dependable scraper to feed them, this actor is a great starting point.
What does this actor do?
My Actor is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
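As a rough illustration of the API access mentioned above, the sketch below calls the Apify API endpoint that runs an actor synchronously and returns its dataset items. The helper names (`buildRunSyncUrl`, `runActor`) and the actor ID passed in are placeholders, not values taken from this listing. It assumes Node.js 18 or later for the global `fetch`.

```javascript
// Builds the Apify API URL that runs an actor and returns its dataset items
// in one call. `actorId` is expected in the "username~actor-name" form.
function buildRunSyncUrl(actorId) {
    return `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/run-sync-get-dataset-items`;
}

// Hypothetical wrapper: posts the actor input as JSON and returns the items.
async function runActor(actorId, token, input) {
    const response = await fetch(buildRunSyncUrl(actorId), {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${token}`, // API token from your Apify account
        },
        body: JSON.stringify(input),
    });
    if (!response.ok) {
        throw new Error(`Apify API responded with status ${response.status}`);
    }
    return response.json();
}

// Example call (placeholder actor ID and token):
//   const items = await runActor("someuser~some-actor", process.env.APIFY_TOKEN, { url: "https://example.com" });
```

Keeping the URL construction in its own function makes it easy to check without sending a request.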
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
My Actor
A JavaScript (Node.js) template for scraping data from a single web page. You provide a URL via the input, and the actor fetches the page, parses it, and stores the extracted data in an Apify dataset. The template is pre-configured to extract page headings but is designed to be easily modified for any scraping task.
Key Features
- Apify SDK: The core toolkit for building and running the actor.
- Input Schema: A defined schema for validating the actor's input (primarily the target URL).
- Structured Storage: Output is saved to an Apify Dataset for easy access and export.
- Axios Client: Used for reliable HTTP requests to fetch page HTML.
- Cheerio: A fast, jQuery-like library for parsing and extracting data from HTML.
Input / Output
Input: The actor expects an input object containing the url of the page to scrape, as defined by its input schema.
Output: The scraped data is stored as individual items in the actor's default dataset. The default template stores an array of page headings (h1 through h6), but you will modify this to match your needs.
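To make these shapes concrete, here is a minimal sketch of an input object and one dataset item. Only the `url` field comes from the description above. The `validateInput` helper and the `level`/`text` fields of the item are assumptions for illustration, not the actor's confirmed schema.

```javascript
// Example input: `url` is the only field the description above confirms.
const exampleInput = { url: "https://example.com" };

// Hypothetical helper: rejects missing or non-HTTP(S) URLs before any request is made.
function validateInput(input) {
    if (!input || typeof input.url !== "string") {
        throw new Error("Input must be an object with a string `url` field");
    }
    const parsed = new URL(input.url); // throws a TypeError on malformed URLs
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
        throw new Error(`Unsupported protocol: ${parsed.protocol}`);
    }
    return parsed.href; // normalized form of the URL
}

// Assumed shape of one dataset item produced by the default heading extraction.
const exampleItem = { level: "h1", text: "Example Domain" };

console.log(validateInput(exampleInput), exampleItem);
```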
How to Use
Basic Operation
- Provide the target page URL in the actor's input.
- Run the actor. It will:
  - Fetch the page HTML using axios.get(url).
  - Load the HTML into Cheerio for parsing (cheerio.load(response.data)).
  - Execute the extraction logic (by default, selecting all heading elements).
  - Save the results to the dataset via Actor.pushData().
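The steps above can be sketched roughly as follows. This is an approximation under stated assumptions, not the actor's actual source: the `{ level, text }` record shape and the `extractHeadings` helper name are illustrative. CommonJS `require` is used here; in an ES module project you would use `import` statements at the top of the file instead.

```javascript
// Default extraction: one record per heading element.
// The { level, text } record shape is an assumption, not confirmed by the source.
function extractHeadings($) {
    const headings = [];
    $("h1, h2, h3, h4, h5, h6").each((_i, element) => {
        headings.push({
            level: $(element).prop("tagName").toLowerCase(),
            text: $(element).text().trim(),
        });
    });
    return headings;
}

async function main() {
    // Dependencies are loaded here so extractHeadings can be tested in isolation.
    const { Actor } = require("apify");
    const axios = require("axios");
    const cheerio = require("cheerio");

    await Actor.init();
    const input = await Actor.getInput();
    if (!input || !input.url) {
        throw new Error("Input must contain a `url` field");
    }

    const response = await axios.get(input.url); // 1. fetch the page HTML
    const $ = cheerio.load(response.data);       // 2. load it into Cheerio
    const headings = extractHeadings($);         // 3. run the extraction logic
    await Actor.pushData(headings);              // 4. save results to the dataset
    await Actor.exit();
}

// The actor's entry file would end by calling main():
//   main();
```

Separating the extraction from the fetch and storage steps means the selector logic can be changed or tested without touching the rest of the run.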
Customization
The main scraping logic is in the Cheerio parsing step. To scrape different data, edit the selector and data extraction code. For example, the default code is:
$("h1, h2, h3, h4, h5, h6").each((_i, element) => {...});
Change the selector (e.g., $(".product-name")) and the extracted properties within the loop to match your target data.
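As one possible customization, the sketch below swaps the heading selector for `.product-name` and collects a different set of properties. The `extractProducts` name, the `data-sku` attribute, and the output fields are hypothetical. Adjust them to the markup of the page you are scraping.

```javascript
// Hypothetical replacement for the default heading loop.
// `.product-name` and `data-sku` are illustrative; use your page's own markup.
function extractProducts($) {
    const products = [];
    $(".product-name").each((_i, element) => {
        products.push({
            name: $(element).text().trim(),
            // attr() returns undefined when the attribute is absent
            sku: $(element).attr("data-sku") ?? null,
        });
    });
    return products;
}

// Usage inside the actor, after `const $ = cheerio.load(response.data)`:
//   await Actor.pushData(extractProducts($));
```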
Local Development
To modify the actor locally, use the Apify CLI to pull the source code:
- Install the Apify CLI:
    npm -g install apify-cli
  or
    brew install apify-cli
- Pull the actor using its unique name or ID (found in the Apify console):
    apify pull <ActorId>
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try My Actor now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: david15999
- Pricing: Paid
- Total Runs: 766
- Active Users: 17
Related Actors
Similarweb scraper
by curious_coder
Google Ads Scraper
by silva95gustavo
Cheap Google Search Results Scraper
by tuningsearch
G2 Explorer
by jupri
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.