WordPress Posts Scraper - Extract Articles & Metadata

by devnaz

54 runs

9 users

About WordPress Posts Scraper - Extract Articles & Metadata

Extract posts, articles, and metadata from any WordPress site using the REST API. 20+ filters: date ranges, categories, tags, authors, search keywords. Get title, content, author bio, featured images & more. No WordPress account needed. Fast, reliable data extraction for content aggregation & research.

What does this actor do?

WordPress Posts Scraper - Extract Articles & Metadata is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# WordPress Posts Scraper

The WordPress Posts Scraper is an Apify actor that extracts posts and metadata from any WordPress website using the WordPress REST API. It automatically handles pagination and fetches additional information like author details, categories, tags, and featured images. This actor is perfect for researchers, content aggregators, and developers who need structured data from WordPress sites.

## How It Works

1. You provide one or more WordPress site URLs.
2. The actor checks if the WordPress REST API is available.
3. It fetches posts with your specified filters (dates, categories, keywords, etc.).
4. Handles pagination automatically until all posts are retrieved.
5. Extracts metadata such as author name, categories, tags, and featured images.
6. Returns structured JSON output with all relevant post details.

## Features

- ✅ Fetches posts from any WordPress site using REST API
- ✅ Supports pagination until all posts are retrieved
- ✅ 20+ advanced filters: date ranges, categories, tags, author, search keywords, status, and more
- ✅ Extracts metadata like author bio, categories, tags, and featured images
- ✅ Configurable sorting (by date, modified, title, author, relevance)
- ✅ Optional proxy support (not required for most sites)
- ✅ Clean and structured JSON output
- ✅ No WordPress account required

## Getting Started

### 1. Input Parameters

To use the scraper, provide the following inputs:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `startUrls` | Array | ✅ | List of WordPress site URLs to scrape (e.g., `[{"url": "https://techcrunch.com"}]`) |
| `maxPosts` | Integer | ❌ | Maximum total posts to extract per site (default: 5, max: 10000) |
| `perPage` | Integer | ❌ | [Advanced] Posts per API request (default: 50, max: 100). Higher = fewer requests = lower cost. Reduce to 10-20 if timeouts occur. |
| `searchKeyword` | String | ❌ | Filter posts by keyword search |
| `after` | String | ❌ | Posts published after this date (ISO8601: `2025-01-01T00:00:00`) |
| `before` | String | ❌ | Posts published before this date (ISO8601: `2025-12-31T23:59:59`) |
| `modifiedAfter` | String | ❌ | Posts modified after this date (ISO8601) |
| `modifiedBefore` | String | ❌ | Posts modified before this date (ISO8601) |
| `categories` | Array | ❌ | Filter by category IDs (e.g., `["1", "5", "12"]`) |
| `categoriesExclude` | Array | ❌ | Exclude specific category IDs |
| `tags` | Array | ❌ | Filter by tag IDs |
| `tagsExclude` | Array | ❌ | Exclude specific tag IDs |
| `author` | Array | ❌ | Filter by author IDs |
| `authorExclude` | Array | ❌ | Exclude specific author IDs |
| `status` | String | ❌ | Post status: `publish`, `draft`, `pending`, `private`, `future` (default: `publish`) |
| `orderBy` | String | ❌ | Sort by: `date`, `modified`, `title`, `author`, `id`, `relevance` (default: `date`) |
| `order` | String | ❌ | Sort order: `asc` or `desc` (default: `desc`) |
| `sticky` | Boolean | ❌ | Include only sticky posts (default: false) |
| `slug` | String | ❌ | Filter by specific post slug |
| `offset` | Integer | ❌ | Skip a specific number of posts (default: 0) |
| `proxyConfiguration` | Object | ❌ | Proxy settings (optional - not needed for most WordPress sites) |

### 2. Running the Actor

#### Using Apify Interface

1. Navigate to the actor's Apify page.
2. Enter the required parameters.
3. Click Run and wait for the data to be scraped.
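
Whichever way the actor is run, the input is a single JSON object built from the parameters listed in the table. The sketch below shows one way to assemble and sanity-check that object before submitting it. The helper name `build_input` and the lower bound of 1 for `maxPosts` and `perPage` are assumptions made for illustration; the upper bounds, date format, and string-array form of the ID filters come from the table.

```python
import json
from datetime import datetime

DATE_KEYS = ("after", "before", "modifiedAfter", "modifiedBefore")
ID_KEYS = ("categories", "categoriesExclude", "tags", "tagsExclude",
           "author", "authorExclude")


def build_input(site_urls, max_posts=5, per_page=50, **filters):
    """Assemble an actor input dict (hypothetical helper, not part of the actor)."""
    if not site_urls:
        raise ValueError("startUrls needs at least one site URL")
    # Upper bounds are documented; the lower bound of 1 is an assumption.
    if not 1 <= max_posts <= 10000:
        raise ValueError("maxPosts must be between 1 and 10000")
    if not 1 <= per_page <= 100:
        raise ValueError("perPage must be between 1 and 100")

    # Date filters are documented as ISO8601 strings, e.g. 2025-01-01T00:00:00.
    for key in DATE_KEYS:
        if key in filters:
            datetime.fromisoformat(filters[key])  # raises ValueError if malformed

    # ID filters are shown as arrays of numeric strings, e.g. ["1", "5", "12"].
    for key in ID_KEYS:
        if key in filters:
            filters[key] = [str(int(value)) for value in filters[key]]

    actor_input = {
        "startUrls": [{"url": url} for url in site_urls],
        "maxPosts": max_posts,
        "perPage": per_page,
    }
    actor_input.update(filters)
    return actor_input


if __name__ == "__main__":
    payload = build_input(
        ["https://techcrunch.com"],
        max_posts=50,
        after="2025-01-01T00:00:00",
        categories=[1, "5"],
    )
    print(json.dumps(payload, indent=2))
```

Validating locally catches a category name passed where a numeric ID is expected, or a malformed date, before a run is started. The printed object can be pasted into the actor's JSON input editor or sent as a request body.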
#### Using Apify API

```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{
    "startUrls": [{"url": "https://techcrunch.com"}],
    "maxPosts": 50,
    "after": "2025-01-01T00:00:00",
    "orderBy": "date",
    "order": "desc"
  }' \
  "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_API_TOKEN"
```

## Output Format

The output is a JSON dataset containing structured post details:

```json
[
  {
    "id": 19263,
    "date": "2025-11-04T15:34:27",
    "modified": "2025-11-04T16:08:02",
    "slug": "wordpress-6-9-beta-3",
    "link": "https://wordpress.org/news/2025/11/wordpress-6-9-beta-3/",
    "title": "WordPress 6.9 Beta 3",
    "content": "<p>WordPress 6.9 Beta 3 is available for download and testing!</p>...",
    "excerpt": "<p>WordPress 6.9 Beta 3 is available for download and testing!...</p>",
    "author": "Amy Kamala",
    "categories": ["Development", "General", "Releases"],
    "tags": ["6.9", "development", "release"],
    "featured_image": "https://wordpress.org/wp-content/uploads/featured.jpg",
    "extra_metadata": {
      "author_bio": "Full Stack Dev, Artist, Masters from UCLA",
      "author_url": "https://kittenkamala.com/",
      "category_description": "Development news and updates"
    }
  }
]
```

## Use Cases

- Content Aggregation: Collect and analyze posts from different WordPress sites.
- SEO Research: Extract content and metadata for SEO analysis.
- Data Science: Gather datasets for NLP or sentiment analysis.
- Backup and Archiving: Store blog content for future reference.
- Competitor Monitoring: Track competitor blog posts and content strategies.
- Research & Analysis: Extract posts by date range, category, or keyword for academic or business research.
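
For the NLP and analysis use cases, note that `content` and `excerpt` in the output arrive as HTML. The sketch below, using only the Python standard library, reduces one output record to plain text plus a few fields. The names `to_plain_text` and `summarize` are hypothetical helpers, and the choice of fields to keep is an assumption for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(html_fragment):
    """Strip tags, decode entities, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html_fragment)
    parser.close()  # flush any text still buffered by the parser
    # Joining with a space keeps words in adjacent block tags apart.
    return " ".join(" ".join(parser.parts).split())


def summarize(post):
    """Reduce one output record to the fields used for analysis."""
    return {
        "id": post["id"],
        "title": post["title"],
        "author": post.get("author"),
        "categories": post.get("categories", []),
        "text": to_plain_text(post.get("content", "")),
    }
```

Joining text nodes with a space is a deliberate trade-off: it avoids gluing together the last word of one paragraph and the first word of the next, at the cost of occasionally splitting a word that contains inline markup.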
## Performance & Cost Optimization

### Speed & Reliability

- Speed: ~2-5 seconds per 50 posts (using REST API)
- Success rate: 99%+ on WordPress sites with REST API enabled
- Concurrency: Supports multiple sites simultaneously
- No proxy required: The WordPress REST API is public and doesn't require proxies in most cases

### Cost Optimization with `perPage` Parameter

The `perPage` parameter controls how many posts are fetched per API request, which directly affects cost and speed.

Example: Extracting 100 posts

| perPage | API Requests | Cost | Speed | Notes |
|---------|--------------|------|-------|-------|
| 10 | 10 requests | Higher cost | Slower | Use if large sites time out |
| 50 (default) | 2 requests | Lower cost | Faster | Recommended - best balance |
| 100 | 1 request | Lowest cost | Fastest | May time out on large sites (TechCrunch, etc.) |

Recommendation:

- Default (50): Works on most sites, good balance between cost and reliability
- Large sites (TechCrunch, Wired, etc.): If timeouts occur, reduce to `perPage: 20-30`
- Small sites: Increase to `perPage: 100` for maximum speed and lowest cost

## Notes

- WordPress REST API required: This actor only works with sites that have the WordPress REST API enabled (enabled by default on most WordPress sites).
- API not available: If a site has disabled the REST API, the actor will return an error message.
- Category/Tag IDs: To filter by categories or tags, you need the numeric IDs (not names). You can find these in the WordPress admin or via the API endpoints:
  - Categories: `https://yoursite.com/wp-json/wp/v2/categories`
  - Tags: `https://yoursite.com/wp-json/wp/v2/tags`
- Date format: Use ISO8601 format for date filters (e.g., `2025-01-01T00:00:00`)

## Support & Troubleshooting

Having issues? Check these common solutions:

1. Timeout errors (large sites like TechCrunch): Reduce the `perPage` parameter to 20-30. This makes more API requests but prevents timeouts.
2. WordPress REST API not available: The site may have disabled the REST API. Verify by visiting `https://yoursite.com/wp-json/wp/v2/posts` in your browser.
3. No posts returned: Check your filters - they may be too restrictive (e.g., date range with no matching posts).
4. Missing author data: Some WordPress sites may not include author information in the `_embedded` response.
5. Category/Tag filtering not working: Ensure you're using numeric IDs, not names.
6. High costs: Increase `perPage` to 80-100 for small/fast sites to reduce API requests and compute units.

For bugs or feature requests, feel free to contact support.

Happy scraping! 🚀

---

No WordPress account or subscription required. Get started analyzing WordPress content today!
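
Two points from the notes and troubleshooting list can be sketched in a few lines of Python: how `perPage` translates into a request count, and how to turn category or tag names into the numeric IDs the filters expect. The function names are hypothetical, and the sample terms in the usage section are made up. The sketch assumes the listing endpoints return a JSON array of objects with `id` and `name` fields, which is the standard WordPress REST API shape.

```python
import math


def requests_needed(max_posts, per_page):
    """Number of paginated requests needed to fetch max_posts posts."""
    return math.ceil(max_posts / per_page)


def taxonomy_endpoint(site_url, taxonomy="categories"):
    """Build the public listing URL for 'categories' or 'tags'."""
    return f"{site_url.rstrip('/')}/wp-json/wp/v2/{taxonomy}"


def ids_for_names(terms, names):
    """Map term names to numeric-string IDs, matching case-insensitively.

    `terms` is the parsed JSON array returned by a listing endpoint.
    """
    by_name = {term["name"].lower(): str(term["id"]) for term in terms}
    missing = [name for name in names if name.lower() not in by_name]
    if missing:
        raise KeyError(f"no matching term for: {missing}")
    return [by_name[name.lower()] for name in names]


if __name__ == "__main__":
    # Made-up sample data standing in for a real endpoint response.
    sample_terms = [
        {"id": 1, "name": "Development"},
        {"id": 5, "name": "Releases"},
    ]
    print(requests_needed(100, 50))
    print(taxonomy_endpoint("https://yoursite.com"))
    print(ids_for_names(sample_terms, ["releases", "Development"]))
```

The listing endpoints are themselves paginated, so a site with many categories or tags may need more than one request to collect every term before the lookup is complete.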

Categories

AUTOMATION SEO_TOOLS NEWS

Actor Information

Developer: devnaz
Pricing: Paid
Total Runs: 54
Active Users: 9
