Cnn Top Headlines

by runtime

An Apify actor that scrapes top headlines from CNN's homepage and articles. Get structured news data for aggregation, research, or feeds without managing your own scraper.

140 runs
6 users

About Cnn Top Headlines

Need a reliable way to pull the latest news from CNN without parsing complex HTML yourself? This Apify actor is a straightforward scraper I've used to consistently extract top headlines from CNN's homepage and, optionally, details from the individual article pages. You get clean, structured data (headline text, article URLs, and timestamps) that you can download as JSON, CSV, or Excel and drop into your database, spreadsheet, or application. Because it runs on Apify's platform, you can schedule it to run daily or hourly so your dataset stays current.

I typically use it for building news aggregators, tracking media trends, or feeding a live news ticker on a website. It also spares you from maintaining your own scraper every time CNN tweaks its site layout. If you work in media monitoring or research, or just need a steady stream of headline data, this actor handles the heavy lifting.

What does this actor do?

Cnn Top Headlines is a web scraper that runs on the Apify platform. It collects headlines (and, optionally, full article details) from CNN and stores them as structured records in the cloud, so nothing needs to be installed locally.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

CNN Top Headlines Scraper

This Apify actor scrapes the latest top news headlines from CNN (https://www.cnn.com/) or CNN International (https://edition.cnn.com/). It can optionally extract full article details.

Key Features

  • Extracts the current headlines from the CNN homepage or specific section pages at the time of each run.
  • Optionally follows links to scrape full article content, author, and publish date.
  • Outputs structured JSON data for easy processing or analysis.

How to Use

Configure your actor run using the input fields below. The main options are:

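  • startUrls (array): The pages to start scraping from, given as objects with a url field (e.g., the homepage or a section page such as World or Business). Defaults to [{ "url": "https://www.cnn.com/" }].
  • maxHeadlines (integer): The maximum number of headlines to extract (and, when includeArticleDetails is true, the maximum number of article pages to visit). Default is 20.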
  • includeArticleDetails (boolean): When set to true, the actor will visit each headline link and scrape the full article text, author, and published date. Default is false.
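If you generate run inputs programmatically, a small helper keeps the field names and defaults in one place. The sketch below uses the three fields and defaults listed above and the {"url": ...} object form shown in the Example Input. The positive-integer check on maxHeadlines is my own assumption, not a documented rule; .actor/input_schema.json remains the authoritative specification.

```python
from __future__ import annotations

from typing import Any

# Documented defaults. The {"url": ...} object form for startUrls mirrors the
# Example Input; .actor/input_schema.json remains the authoritative source.
DEFAULT_START_URL = "https://www.cnn.com/"
DEFAULT_MAX_HEADLINES = 20


def build_input(
    start_urls: list[str] | None = None,
    max_headlines: int = DEFAULT_MAX_HEADLINES,
    include_article_details: bool = False,
) -> dict[str, Any]:
    """Build a run-input dict using the field names described above."""
    # Assumption: rejecting non-positive limits is this sketch's own check,
    # not a documented rule of the actor.
    if max_headlines < 1:
        raise ValueError("maxHeadlines must be a positive integer")
    urls = start_urls or [DEFAULT_START_URL]
    return {
        "startUrls": [{"url": u} for u in urls],
        "maxHeadlines": max_headlines,
        "includeArticleDetails": include_article_details,
    }


# Reproduces the Example Input shown below in this documentation.
example = build_input(["https://edition.cnn.com/"], 10, True)
```

Calling build_input() with no arguments yields the documented defaults (CNN homepage, 20 headlines, no article details).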

Example Input

{
  "startUrls": [
    { "url": "https://edition.cnn.com/" }
  ],
  "maxHeadlines": 10,
  "includeArticleDetails": true
}
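The same input can be sent through the Apify REST API instead of the web console. This is a minimal sketch using only the standard library and Apify's synchronous "run Actor and get dataset items" endpoint, which suits short runs like this one. The actor ID runtime~cnn-top-headlines is hypothetical (guessed from the developer name; copy the real ID from the actor's page), and the token is assumed to be in an APIFY_TOKEN environment variable.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST that runs the actor and returns its dataset items."""
    # Apify uses the "username~actor-name" form for actor IDs in URLs.
    path_id = urllib.parse.quote(actor_id, safe="~")
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict[str, Any], timeout: float = 300) -> list[dict]:
    """Send the request and decode the JSON array of result items."""
    req = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:  # only hit the network when a token is configured
        example_input = {
            "startUrls": [{"url": "https://edition.cnn.com/"}],
            "maxHeadlines": 10,
            "includeArticleDetails": True,
        }
        # Hypothetical actor ID: replace with the one shown on Apify.
        for item in run_actor("runtime~cnn-top-headlines", token, example_input):
            print(item["title"], item["url"])
```

Sending the token in an Authorization header rather than the query string keeps it out of URLs that may end up in logs.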

Input & Output

Input Schema

Refer to .actor/input_schema.json for the complete specification.

Output Format

Results are saved to the default Apify dataset, available for download as JSON, CSV, or Excel.
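Each run's dataset can also be downloaded directly via the Apify API, whose dataset-items endpoint takes a format query parameter (Excel corresponds to xlsx). The sketch below only builds the download URL; the dataset ID is a placeholder you would take from the run's details (its default dataset), and the request must carry your API token when you fetch it.

```python
import urllib.parse

API_BASE = "https://api.apify.com/v2"

# Formats mentioned above, mapped to the Apify API's `format` values.
FORMATS = {"json": "json", "csv": "csv", "excel": "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the URL for downloading a dataset's items in the given format."""
    try:
        api_format = FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    query = urllib.parse.urlencode({"format": api_format})
    return f"{API_BASE}/datasets/{urllib.parse.quote(dataset_id)}/items?{query}"
```

For example, dataset_items_url("YOUR_DATASET_ID", "csv") gives a CSV download link for a spreadsheet import.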

Headline-only output:

{
  "title": "Superman smashes box office expectations",
  "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl",
  "source": "CNN",
  "scrapedAt": "2025-07-13T12:56:40.535Z"
}

Output with article details (when includeArticleDetails is true):

{
  "title": "Superman smashes box office expectations",
  "content": "Full article text ...",
  "author": "CNN Staff",
  "publishedDate": "2025-07-13T10:00:00Z",
  "url": "https://www.cnn.com/2025/07/13/entertainment/superman-box-office-intl",
  "source": "CNN",
  "scrapedAt": "2025-07-13T12:56:40.535Z"
}
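Because the detail fields appear only when includeArticleDetails is true, downstream code should treat them as optional. A minimal sketch that loads either output shape into one typed record, using the field names from the two examples above (the Headline class and its snake_case attribute names are my own, not part of the actor):

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    # Timestamps end in "Z"; datetime.fromisoformat() accepts that suffix only
    # from Python 3.11, so normalise it to an explicit UTC offset first.
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Headline:
    title: str
    url: str
    source: str
    scraped_at: datetime
    # Present only when includeArticleDetails is true.
    content: Optional[str] = None
    author: Optional[str] = None
    published_date: Optional[datetime] = None

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> Headline:
        return cls(
            title=item["title"],
            url=item["url"],
            source=item.get("source", "CNN"),
            scraped_at=_parse_ts(item["scrapedAt"]),
            content=item.get("content"),
            author=item.get("author"),
            published_date=_parse_ts(item.get("publishedDate")),
        )

    @property
    def has_details(self) -> bool:
        return self.content is not None
```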

How It Works

  1. The actor loads the provided startUrls.
  2. It extracts up to maxHeadlines news headlines from those pages.
  3. If includeArticleDetails is enabled, it navigates to each headline's URL to scrape the article body, author, and publication date.
  4. All results are stored in the dataset.
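The four steps can be sketched offline with Python's built-in HTML parser. This is an illustration, not the actor's actual code: it assumes start pages are already downloaded (step 1), treats any link whose path contains a /YYYY/MM/DD/ segment as a headline (a heuristic based on the example URL, not a confirmed CNN or actor rule), and takes a caller-supplied fetch_article function for step 3. The run and fetch_article names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin

# Heuristic (assumption): article URLs contain a date segment like /2025/07/13/.
ARTICLE_PATH = re.compile(r"/\d{4}/\d{2}/\d{2}/")


class _LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def extract_headlines(html: str, base_url: str, limit: int) -> list[dict]:
    """Step 2: pull up to `limit` unique article links with non-empty text."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    seen: set[str] = set()
    out: list[dict] = []
    for href, text in parser.links:
        url = urljoin(base_url, href)
        if not text or not ARTICLE_PATH.search(url) or url in seen:
            continue
        seen.add(url)
        out.append({"title": text, "url": url, "source": "CNN"})
        if len(out) >= limit:
            break
    return out


def _now_iso() -> str:
    # Millisecond precision with a "Z" suffix, matching the output examples.
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def run(
    pages: dict[str, str],
    max_headlines: int,
    include_article_details: bool,
    fetch_article: Callable[[str], dict] | None = None,
) -> list[dict]:
    results: list[dict] = []
    seen: set[str] = set()
    for start_url, html in pages.items():  # step 1: pages assumed already loaded
        for item in extract_headlines(html, start_url, max_headlines):  # step 2
            if len(results) >= max_headlines:  # limit applies across all pages
                return results
            if item["url"] in seen:
                continue
            seen.add(item["url"])
            if include_article_details and fetch_article is not None:  # step 3
                item.update(fetch_article(item["url"]))
            item["scrapedAt"] = _now_iso()
            results.append(item)
    return results  # step 4: a real actor would push these to its dataset
```

Injecting fetch_article keeps the sketch testable without network access; in a real scraper it would download the article page and return content, author, and publishedDate.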

Important Notes

  • Terms of Service: Use this actor responsibly and in compliance with CNN's terms and robots.txt.
  • Rate Limiting: Avoid aggressive request rates to prevent overloading servers.
  • Proxies: Consider using proxies for large-scale scraping to avoid IP blocking.
  • Data Usage: Ensure you have the right to use scraped data for your intended purpose.
  • Scope: The actor only scrapes publicly accessible CNN articles.
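If you adapt this approach for your own crawling, the robots.txt and rate-limiting notes can be enforced in code. A minimal sketch using the standard library's urllib.robotparser plus a simple minimum-interval throttle; the user-agent string, the sample rules, and the interval are illustrative assumptions, not CNN's actual robots.txt or a recommended rate.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser


def robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    """Return a predicate telling whether user_agent may fetch a given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Call before each request; sleeps only if the previous one was too recent."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In practice you would load the live file with RobotFileParser.set_url() and read() instead of passing text, and call Throttle.wait() before every request.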

Legal Disclaimer

This tool is for educational and research purposes. You are responsible for using it in a legal and responsible manner that does not harm CNN's infrastructure.

Related Actors

  • Booking.com Hotel Scraper: Scrape hotel data, prices, ratings, and more from Booking.com with advanced anti-detection and flexible extraction limits.
  • Product Hunt Scraper: Extract product listings, launch details, votes, and comments from Product Hunt for market research and trend analysis.

Common Use Cases

Media Monitoring

Track which stories CNN features and how coverage shifts over time

Research

Build headline datasets for media and trend analysis

Content Aggregation

Feed headlines into news aggregators or a live news ticker

Actor Information

Developer: runtime
Pricing: Paid
Total Runs: 140
Active Users: 6
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
