WordPress Posts Scraper - Extract Articles & Metadata
by devnaz
About WordPress Posts Scraper - Extract Articles & Metadata
Extract posts, articles, and metadata from any WordPress site using the REST API. 20+ filters: date ranges, categories, tags, authors, search keywords. Get title, content, author bio, featured images & more. No WordPress account needed. Fast, reliable data extraction for content aggregation & research.
What does this actor do?
WordPress Posts Scraper - Extract Articles & Metadata is an actor on the Apify platform. It extracts posts and their metadata from WordPress sites through the WordPress REST API and runs in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# WordPress Posts Scraper

The WordPress Posts Scraper is an Apify actor that extracts posts and metadata from any WordPress website using the WordPress REST API. It automatically handles pagination and fetches additional information like author details, categories, tags, and featured images.

This actor is perfect for researchers, content aggregators, and developers who need structured data from WordPress sites.

## How It Works

1. You provide one or more WordPress site URLs.
2. The actor checks if the WordPress REST API is available.
3. It fetches posts with your specified filters (dates, categories, keywords, etc.).
4. Handles pagination automatically until all posts are retrieved.
5. Extracts metadata such as author name, categories, tags, and featured images.
6. Returns structured JSON output with all relevant post details.

## Features

- ✅ Fetches posts from any WordPress site using REST API
- ✅ Supports pagination until all posts are retrieved
- ✅ 20+ advanced filters: date ranges, categories, tags, author, search keywords, status, and more
- ✅ Extracts metadata like author bio, categories, tags, and featured images
- ✅ Configurable sorting (by date, modified, title, author, relevance)
- ✅ Optional proxy support (not required for most sites)
- ✅ Clean and structured JSON output
- ✅ No WordPress account required

## Getting Started

### 1. Input Parameters

To use the scraper, provide the following inputs:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `startUrls` | Array | ✅ | List of WordPress site URLs to scrape (e.g., `[{"url": "https://techcrunch.com"}]`) |
| `maxPosts` | Integer | ❌ | Maximum total posts to extract per site (default: 5, max: 10000) |
| `perPage` | Integer | ❌ | [Advanced] Posts per API request (default: 50, max: 100). Higher = fewer requests = lower cost. Reduce to 10-20 if timeouts occur. |
| `searchKeyword` | String | ❌ | Filter posts by keyword search |
| `after` | String | ❌ | Posts published after this date (ISO8601: `2025-01-01T00:00:00`) |
| `before` | String | ❌ | Posts published before this date (ISO8601: `2025-12-31T23:59:59`) |
| `modifiedAfter` | String | ❌ | Posts modified after this date (ISO8601) |
| `modifiedBefore` | String | ❌ | Posts modified before this date (ISO8601) |
| `categories` | Array | ❌ | Filter by category IDs (e.g., `["1", "5", "12"]`) |
| `categoriesExclude` | Array | ❌ | Exclude specific category IDs |
| `tags` | Array | ❌ | Filter by tag IDs |
| `tagsExclude` | Array | ❌ | Exclude specific tag IDs |
| `author` | Array | ❌ | Filter by author IDs |
| `authorExclude` | Array | ❌ | Exclude specific author IDs |
| `status` | String | ❌ | Post status: publish, draft, pending, private, future (default: publish) |
| `orderBy` | String | ❌ | Sort by: date, modified, title, author, id, relevance (default: date) |
| `order` | String | ❌ | Sort order: asc or desc (default: desc) |
| `sticky` | Boolean | ❌ | Include only sticky posts (default: false) |
| `slug` | String | ❌ | Filter by specific post slug |
| `offset` | Integer | ❌ | Skip a specific number of posts (default: 0) |
| `proxyConfiguration` | Object | ❌ | Proxy settings (optional - not needed for most WordPress sites) |

### 2. Running the Actor

#### Using Apify Interface

1. Navigate to the actor's Apify page.
2. Enter the required parameters.
3. Click Run and wait for the data to be scraped.
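The input parameters listed under "Input Parameters" closely resemble the query arguments of the WordPress REST API `posts` endpoint. The sketch below shows how such an input might translate into the URL of one paginated request. The mapping table and the helper name `build_posts_url` are assumptions made for illustration; the actor's actual implementation is not published here and may differ.

```python
from urllib.parse import urlencode

# Assumed mapping from the actor's input names to WordPress REST API
# query arguments. This is an illustration, not the actor's real code.
PARAM_MAP = {
    "searchKeyword": "search",
    "after": "after",
    "before": "before",
    "modifiedAfter": "modified_after",
    "modifiedBefore": "modified_before",
    "categories": "categories",
    "categoriesExclude": "categories_exclude",
    "tags": "tags",
    "tagsExclude": "tags_exclude",
    "author": "author",
    "authorExclude": "author_exclude",
    "status": "status",
    "orderBy": "orderby",
    "order": "order",
    "slug": "slug",
}


def build_posts_url(site_url, actor_input, page=1):
    """Return a /wp-json/wp/v2/posts URL for one page of results."""
    # The documented upper bound for perPage is 100.
    per_page = min(int(actor_input.get("perPage", 50)), 100)
    # _embed asks WordPress to inline linked resources such as the author.
    query = {"per_page": per_page, "page": page, "_embed": "1"}
    for input_name, api_name in PARAM_MAP.items():
        value = actor_input.get(input_name)
        if value in (None, "", []):
            continue  # skip filters the user did not set
        if isinstance(value, list):
            value = ",".join(str(v) for v in value)  # IDs are comma-separated
        query[api_name] = value
    if actor_input.get("sticky"):
        query["sticky"] = "true"
    return site_url.rstrip("/") + "/wp-json/wp/v2/posts?" + urlencode(query)


example_url = build_posts_url(
    "https://techcrunch.com",
    {"perPage": 20, "after": "2025-01-01T00:00:00", "categories": ["1", "5"]},
)
print(example_url)
```

Requesting `page=2`, `page=3`, and so on with the same filters is one plausible way the automatic pagination could proceed; WordPress reports the total page count in the `X-WP-TotalPages` response header.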
#### Using Apify API

```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{
    "startUrls": [{"url": "https://techcrunch.com"}],
    "maxPosts": 50,
    "after": "2025-01-01T00:00:00",
    "orderBy": "date",
    "order": "desc"
  }' \
  "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_API_TOKEN"
```

## Output Format

The output is a JSON dataset containing structured post details:

```json
[
  {
    "id": 19263,
    "date": "2025-11-04T15:34:27",
    "modified": "2025-11-04T16:08:02",
    "slug": "wordpress-6-9-beta-3",
    "link": "https://wordpress.org/news/2025/11/wordpress-6-9-beta-3/",
    "title": "WordPress 6.9 Beta 3",
    "content": "<p>WordPress 6.9 Beta 3 is available for download and testing!</p>...",
    "excerpt": "<p>WordPress 6.9 Beta 3 is available for download and testing!...</p>",
    "author": "Amy Kamala",
    "categories": ["Development", "General", "Releases"],
    "tags": ["6.9", "development", "release"],
    "featured_image": "https://wordpress.org/wp-content/uploads/featured.jpg",
    "extra_metadata": {
      "author_bio": "Full Stack Dev, Artist, Masters from UCLA",
      "author_url": "https://kittenkamala.com/",
      "category_description": "Development news and updates"
    }
  }
]
```

## Use Cases

- Content Aggregation – Collect and analyze posts from different WordPress sites.
- SEO Research – Extract content and metadata for SEO analysis.
- Data Science – Gather datasets for NLP or sentiment analysis.
- Backup and Archiving – Store blog content for future reference.
- Competitor Monitoring – Track competitor blog posts and content strategies.
- Research & Analysis – Extract posts by date range, category, or keyword for academic or business research.
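For aggregation or NLP work, the HTML in `content` and `excerpt` usually needs to be reduced to plain text first. The following is a minimal sketch of that step, assuming dataset items shaped like the sample in "Output Format". The helper names `html_to_text` and `flatten_post` are hypothetical and not part of the actor, and the sample record is a trimmed copy of the one documented there.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags from an HTML fragment and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


def flatten_post(post):
    """Reduce one dataset item to a flat record for analysis or CSV export."""
    extra = post.get("extra_metadata") or {}
    return {
        "id": post["id"],
        "date": post["date"],
        "title": post["title"],
        "author": post.get("author"),
        "author_url": extra.get("author_url"),
        "categories": ", ".join(post.get("categories", [])),
        "text": html_to_text(post.get("excerpt", "")),
    }


# Trimmed copy of the sample item from the Output Format section.
sample = {
    "id": 19263,
    "date": "2025-11-04T15:34:27",
    "title": "WordPress 6.9 Beta 3",
    "excerpt": "<p>WordPress 6.9 Beta 3 is available for download and testing!...</p>",
    "author": "Amy Kamala",
    "categories": ["Development", "General", "Releases"],
    "extra_metadata": {"author_url": "https://kittenkamala.com/"},
}

flat = flatten_post(sample)
print(flat)
```

Using `.get()` for the optional fields keeps the sketch tolerant of sites that omit author or category data.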
## Performance & Cost Optimization

### Speed & Reliability

- Speed: ~2-5 seconds per 50 posts (using REST API)
- Success rate: 99%+ on WordPress sites with REST API enabled
- Concurrency: Supports multiple sites simultaneously
- No proxy required: WordPress REST API is public and doesn't require proxies in most cases

### Cost Optimization with perPage Parameter

The `perPage` parameter controls how many posts are fetched per API request, directly impacting cost and speed.

Example: Extracting 100 posts

| perPage | API Requests | Compute Units | Speed | Notes |
|---------|--------------|---------------|-------|-------|
| 10 | 10 requests | Higher cost | Slower | Use if large sites time out |
| 50 (default) | 2 requests | Lower cost | Faster | Recommended - best balance |
| 100 | 1 request | Lowest cost | Fastest | May time out on large sites (TechCrunch, etc.) |

Recommendation:

- Default (50): Works on most sites, good balance between cost and reliability
- Large sites (TechCrunch, Wired, etc.): If timeouts occur, reduce to `perPage: 20-30`
- Small sites: Increase to `perPage: 100` for maximum speed and lowest cost

## Notes

- WordPress REST API required: This actor only works with sites that have the WordPress REST API enabled (enabled by default on most WordPress sites).
- API not available: If a site has disabled the REST API, the actor returns an error message.
- Category/Tag IDs: To filter by categories or tags, you need the numeric IDs (not names). You can find these in the WordPress admin or via the API endpoints:
  - Categories: https://yoursite.com/wp-json/wp/v2/categories
  - Tags: https://yoursite.com/wp-json/wp/v2/tags
- Date format: Use ISO8601 format for date filters (e.g., `2025-01-01T00:00:00`)

## Support & Troubleshooting

Having issues? Check these common solutions:

1. Timeout errors (large sites like TechCrunch): Reduce the `perPage` parameter to 20-30. This makes more API requests but prevents timeouts.
2. WordPress REST API not available: The site may have disabled the REST API. Verify by visiting https://yoursite.com/wp-json/wp/v2/posts in your browser.
3. No posts returned: Check your filters - they may be too restrictive (e.g., date range with no matching posts).
4. Missing author data: Some WordPress sites may not include author information in the `_embedded` response.
5. Category/Tag filtering not working: Ensure you're using numeric IDs, not names.
6. High costs: Increase `perPage` to 80-100 for small/fast sites to reduce API requests and compute units.

For bugs or feature requests, feel free to contact support.

Happy scraping! 🚀

---

No WordPress account or subscription required. Get started analyzing WordPress content today!
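The request counts in the `perPage` table under "Cost Optimization with perPage Parameter" follow from a ceiling division of the number of posts by the page size. A small sketch of that arithmetic is shown here; the function name `estimate_requests` is hypothetical, and the estimate ignores any extra requests the actor may make, such as an initial API availability check.

```python
import math


def estimate_requests(max_posts, per_page=50):
    """Estimate how many paginated API requests are needed for max_posts posts."""
    if not 1 <= per_page <= 100:
        raise ValueError("perPage must be between 1 and 100")
    if max_posts < 0:
        raise ValueError("max_posts must not be negative")
    # A final, partially filled page still costs one request.
    return math.ceil(max_posts / per_page)


for per_page in (10, 50, 100):
    print(f"perPage={per_page}: {estimate_requests(100, per_page)} request(s)")
```

For 100 posts this reproduces the table: 10, 2, and 1 requests respectively.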
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Actor Information
- Developer: devnaz
- Pricing: Paid
- Total Runs: 54
- Active Users: 9