Article Content Extractor 📄

Author: easyapi

1,700 runs

76 users

About Article Content Extractor 📄

Extract clean article content, metadata and structured information from any web page. Supports multiple URLs and returns well-formatted JSON with title, description, content, author, publish date and more. 🔍📄

What does this actor do?

Article Content Extractor 📄 is a web scraping actor available on the Apify platform. It runs in the cloud and extracts article content and metadata from the URLs you provide.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation
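
As a rough illustration of the API access listed above, the sketch below builds (but does not send) a request to Apify's synchronous run endpoint using only the Python standard library. The actor ID `easyapi~article-content-extractor` is an assumption inferred from the author and actor names, so copy the real ID from the actor's API tab. The `urls` field comes from the documented input example; `build_run_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumption: hypothetical actor ID; replace with the value shown on the actor's page.
ACTOR_ID = "easyapi~article-content-extractor"


def build_run_request(urls, token, actor_id=ACTOR_ID):
    """Build a POST request that runs the actor and asks for its dataset items."""
    endpoint = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would start a run and, if it finishes within the endpoint's time limit, should return the dataset items as a JSON array. Building the request separately keeps the example testable without network access or a real token.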

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
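
For the input step, the documented input is a JSON object with a single `urls` array. The following is a minimal sketch of preparing that object before pasting it into the input editor. It assumes only the `urls` field exists; `build_input` is a hypothetical helper, and the checks it performs (http/https scheme, non-empty host, de-duplication) are local conveniences rather than the actor's own validation.

```python
import json
from urllib.parse import urlparse


def build_input(urls):
    """Return the actor input dict, rejecting malformed URLs and dropping duplicates."""
    cleaned = []
    seen = set()
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        # Only absolute http(s) URLs with a host can be fetched.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not a valid http(s) URL: {raw!r}")
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return {"urls": cleaned}


if __name__ == "__main__":
    payload = build_input([
        "https://cleartax.in/s/gst-hsn-lookup",
        "https://www.fancode.com/pickleball/schedule",
    ])
    print(json.dumps(payload, indent=2))
```

A syntactically valid URL can still be unreachable, so the run logs remain the place to check whether each page was actually fetched.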

Documentation

Article Content Extractor 📄

Extract clean article content and metadata from any web page automatically. This actor helps you get structured content from news sites, blogs, and other article-based websites.

## Features ✨

- Extract article content and metadata from any URL
- Support batch processing of multiple URLs
- Clean and structured JSON output
- Built-in rate limiting to avoid overloading target sites
- Robust error handling and validation
- Fast and efficient processing

## Output Data Structure 📊

The actor extracts the following information from each article:

- Title
- Description
- Main content (both HTML and plain text)
- Author
- Publication date
- Source domain
- Featured image URL
- Related links
- Tags
- Scraping timestamp

## Use Cases 💡

- Content aggregation and syndication
- News monitoring and analysis
- Research and data collection
- Content migration
- SEO analysis
- Digital archiving

## Limitations ⚠️

- Respects robots.txt and implements polite scraping
- 2-second delay between requests to avoid overwhelming target servers
- URLs must be valid and accessible
- Content extraction quality depends on page structure

## Tips for Best Results 💪

1. Provide valid, accessible URLs
2. Use for public content only
3. Consider the target website's terms of service
4. Monitor execution logs for any issues

Need help or have questions? Feel free to reach out!

### Input Example

An example input in JSON:

```json
{
  "urls": [
    "https://cleartax.in/s/gst-hsn-lookup",
    "https://www.fancode.com/pickleball/schedule"
  ]
}
```

### Output sample

The results will be wrapped into a dataset which you can always find in the Storage tab. Here's an excerpt, in JSON, from the data you'd get if you apply the input parameters above. You can choose in which format to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

```json
[
  {
    "url": "https://www.fancode.com/pickleball/schedule",
    "title": "Pickleball Schedule - Check International and Domestic matches on FanCode",
    "description": "ABOUT FANCODEIndia's Premium Live Streaming, Live Scores & Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years....",
    "content": "<div><p><label>ABOUT FANCODE</label><label>India's Premium Live Streaming, Live Scores & Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years. The FanCode app has been downloaded by more than 3+ crore users. It offers interactive live streaming of all major sporting events, premier cricket tournaments, women's cricket, live football, basketball, baseball, wrestling, badminton, and other major sports. It also offer real-time match highlights, match videos, cricket videos, India cricket highlights, highlights of today's match, highlights of yesterday's match, cricket data, statistics, cricket analysis, fantasy insights, cricket updates, breaking news from India cricket and world of sports. It also offers sports merchandise for all major sporting leagues and teams from across the world.</label></p></div>",
    "author": "",
    "publishedDate": "",
    "source": "fancode.com",
    "image": "https://www.fancode.com/skillup-uploads/fc-web/home-page-new-arc/hero-image/v1/hero-image-dweb-v4.png",
    "links": [
      "https://www.fancode.com/pickleball/schedule"
    ],
    "tags": [],
    "scrapedAt": "2025-02-05T07:19:26.119Z"
  },
  ...
]
```

## Related Actors

- 📄 URL Metadata Crawler - Extract comprehensive metadata from web pages including meta tags, favicons, and Open Graph tags.
- 🔍 Google News Scraper - Collect up to 5000 news articles with flexible search options and language support.
- 📚 arXiv Search Scraper - Extract comprehensive research paper data including titles, authors, and abstracts.
- 🔬 Nature Search Results Scraper - Extract research article data from Nature.com with detailed metadata.
- 📚 Medium Posts Search Scraper - Get detailed information about articles, authors, and engagement metrics from Medium.
- 📚 Substack Posts Scraper - Extract comprehensive post data including title, author, and publication details.
- 🔍 PubMed Search Scraper - Scrape research papers and academic articles with comprehensive metadata.
- 📄 WikiHow Article Scraper - Extract article titles, dates, views, and detailed step-by-step content.
- 🔍 Cointelegraph Search Scraper - Extract comprehensive article data including titles, authors, and publish dates.
- 📚 Medium User Posts Scraper - Extract detailed post data including engagement metrics and publication details.
- 🎯 Keyword Discovery Tool - Discover new keyword ideas and uncover valuable search insights.
- 🔍 Keyword Density Checker - Analyze webpage content to calculate keyword density and frequency.
- 🔍 AI-powered Search - Transform search queries into structured, AI-powered summaries with references.
- 📝 Text Summarization - Automatically generate concise summaries of documents while preserving original content.
- 🌐 Website Content to Markdown for LLM Training - Transform web content into clean, LLM-ready Markdown format.
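
### Working with the output (sketch)

The output sample in this documentation shows `author` and `publishedDate` as empty strings, `content` as HTML, and `scrapedAt` as an ISO-style UTC timestamp. The sketch below is one possible way to normalise such a record after download, using only the Python standard library. The field names are taken from that sample; the helper names `html_to_text` and `summarize` are hypothetical, and deriving plain text from the HTML is an assumption made because the sample does not show a separate plain-text field.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the fragment's text with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def summarize(item):
    """Map one dataset item to a compact record; empty strings become None."""
    scraped = datetime.strptime(
        item["scrapedAt"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "url": item["url"],
        "title": item["title"],
        "author": item.get("author") or None,
        "published": item.get("publishedDate") or None,
        "words": len(html_to_text(item.get("content", "")).split()),
        "scraped_at": scraped,
    }
```

Treating empty strings as missing values makes it easier to filter out pages where the extractor found no author or date, which the limitations above suggest can happen depending on page structure.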

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try Article Content Extractor 📄 now on Apify. Free tier available with no credit card required.

Actor Information

Developer: easyapi
Pricing: Paid
Total Runs: 1,700
Active Users: 76

Related Actors

Web Scraper

by apify

Cheerio Scraper

by apify

Website Content Crawler

by apify

Legacy PhantomJS Crawler

by apify

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
