OpenAlex Scraper

Name: OpenAlex Scraper
Author: shahidirfan

31 runs

5 users

About OpenAlex Scraper

Extract scholarly data from OpenAlex—titles, authors, institutions, venues, concepts—using this fast Apify actor. Get academic research in bulk via API, and export results as CSV, Excel, or HTML datasets for research, analytics, or discovery.

What does this actor do?

OpenAlex Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
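Because the actor also offers API access, the same run can be started from code instead of the web console. The following is a minimal sketch, assuming Apify's public REST endpoint `run-sync-get-dataset-items` and a personal API token. The actor ID shown is a placeholder (this page does not state it), and the helper names `build_run_request` and `run_actor` are hypothetical. The input fields come from the actor's documentation.

```python
import json
import urllib.request

APIFY_API = "https://api.apify.com/v2"
# Placeholder: copy the real actor ID from the actor's page on Apify.
ACTOR_ID = "USERNAME~ACTOR-NAME"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST request that runs an actor and returns its dataset items."""
    url = f"{APIFY_API}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    """Send the request and parse the JSON list of result records (needs network access)."""
    with urllib.request.urlopen(build_run_request(actor_id, token, run_input)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    request = build_run_request(
        ACTOR_ID,
        "YOUR_APIFY_TOKEN",  # placeholder token
        {"search": "machine learning healthcare", "entity": "works", "results_wanted": 50},
    )
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload before spending any platform credits. The synchronous endpoint suits short runs; long runs are probably better started asynchronously and polled.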

Documentation

# OpenAlex Scraper

Extract comprehensive academic data from OpenAlex, the largest open database of scholarly works, authors, institutions, venues, and concepts. This powerful scraper enables researchers, analysts, and developers to access millions of records for bibliometric analysis, literature reviews, and data-driven insights.

## 🚀 Key Features

- Multi-Entity Support: Scrape works, authors, institutions, venues, and concepts from OpenAlex
- Advanced Search & Filtering: Use powerful search queries with custom filters and sorting options
- High-Volume Data Collection: Retrieve thousands of records with automatic pagination
- Rate Limit Optimization: Polite pool access for maximum API throughput (up to 100,000 requests/day)
- Automatic Error Handling: Built-in retries and rate limit management
- Structured Data Output: Clean, consistent JSON output ready for analysis

## 📊 What You Can Scrape

- Works: Research papers, articles, books with full metadata, abstracts, and citations
- Authors: Researcher profiles with publication counts and institutional affiliations
- Institutions: University and research organization data with country information
- Venues: Journals, conferences, and publishers with impact metrics
- Concepts: Research topics and keywords with hierarchical relationships

## 🔧 Input Configuration

Configure your scraping job with these parameters:

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `search` | string | Search query (title, author, institution name, etc.) | "" |
| `entity` | select | Entity type to scrape | "works" |
| `results_wanted` | integer | Maximum results to collect | 100 |
| `max_pages` | integer | Maximum API pages to fetch | 10 |
| `email` | string | Email for polite pool (higher rate limits) | "" |
| `filters` | object | Additional API filters | {} |
| `sort` | string | Sort order | "relevance_score:desc" |

### Entity Options

- `works` - Scholarly publications
- `authors` - Researcher profiles
- `institutions` - Academic organizations
- `venues` - Publication outlets
- `concepts` - Research topics

### Example Filters

```json
{
  "publication_year": "2023",
  "cited_by_count": ">100",
  "country_code": "US"
}
```

## 📤 Output Data Structure

### Works Entity Example

```json
{
  "id": "https://openalex.org/W123456789",
  "title": "Machine Learning in Healthcare: A Comprehensive Review",
  "authors": ["Dr. Jane Smith", "Prof. John Doe"],
  "institutions": ["Harvard University", "MIT"],
  "publication_year": 2023,
  "doi": "10.1234/health-ml-2023",
  "url": "https://openalex.org/W123456789",
  "abstract": "This paper explores the applications of machine learning...",
  "concepts": ["Machine Learning", "Healthcare", "Artificial Intelligence"],
  "cited_by_count": 245,
  "type": "journal-article",
  "source": "openalex.org"
}
```

### Authors Entity Example

```json
{
  "id": "https://openalex.org/A123456789",
  "display_name": "Dr. Jane Smith",
  "works_count": 87,
  "cited_by_count": 1250,
  "last_known_institution": "Harvard University",
  "orcid": "0000-0001-2345-6789",
  "source": "openalex.org"
}
```

## 🎯 Usage Examples

### Basic Research Paper Search

```json
{
  "search": "machine learning healthcare",
  "entity": "works",
  "results_wanted": 500,
  "email": "your-email@example.com"
}
```

### Top Cited Authors in AI

```json
{
  "entity": "authors",
  "search": "artificial intelligence",
  "sort": "cited_by_count:desc",
  "results_wanted": 100,
  "filters": {
    "works_count": ">50"
  }
}
```

### University Research Output

```json
{
  "entity": "institutions",
  "search": "Stanford University",
  "results_wanted": 1,
  "email": "researcher@university.edu"
}
```

### Trending Research Topics

```json
{
  "entity": "concepts",
  "sort": "works_count:desc",
  "results_wanted": 50,
  "filters": {
    "level": "1"
  }
}
```

## ⚙️ Advanced Configuration

### Optimizing for Large Datasets

- Use email parameter for polite pool access
- Set appropriate `max_pages` to control API usage
- Apply filters to narrow results before pagination

### Rate Limiting

- Free tier: 100,000 requests/day
- Polite pool (with email): Higher priority access
- Automatic handling of rate limits with retry logic

### Data Filtering Tips

- Use `publication_year` for time-based analysis
- Filter by `cited_by_count` for impact studies
- Country codes for geographical research
- Concept IDs for topic-specific queries

## 📈 Use Cases

- Bibliometric Analysis: Track citation patterns and research impact
- Literature Reviews: Systematic collection of papers on specific topics
- Researcher Profiling: Build comprehensive author databases
- Institutional Rankings: Compare research output across organizations
- Trend Analysis: Identify emerging research areas and concepts
- Academic Network Mapping: Discover collaborations and affiliations

## 🔍 API Integration

This scraper uses the official OpenAlex REST API:

- Base URL: `https://api.openalex.org`
- Documentation: OpenAlex API Guide
- Rate Limits: 100,000 requests/day per user
- No authentication required (email optional for polite pool)

## 📋 Limits & Considerations

- Rate Limits: 100,000 API calls per day (higher with polite pool)
- Result Limits: Up to 10,000 results per entity type
- Data Freshness: OpenAlex updates data regularly
- Data Coverage: Over 200 million works, 15 million authors, 100,000 institutions

## 🤝 Contributing

Found a bug or have a feature request? Open an issue on our GitHub repository.

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

---

Keywords: OpenAlex scraper, academic data extraction, scholarly works API, bibliometric data, research papers scraper, author profiles, institution data, academic analytics, citation analysis, research trends
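To make the API Integration notes above more concrete, here is a rough sketch of how the documented input parameters could translate into a request against `https://api.openalex.org`. The mapping is an assumption, not the actor's actual code: the function names `build_openalex_url` and `pages_needed` are hypothetical, and the page size of 100 is chosen only for illustration. The query parameters `search`, `filter`, `sort`, `per-page`, and `mailto` are those of the public OpenAlex API.

```python
import math
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://api.openalex.org"  # base URL stated in the documentation


def build_openalex_url(actor_input: dict, per_page: int = 100) -> str:
    """Map actor-style input onto an OpenAlex list endpoint URL (assumed mapping)."""
    entity = actor_input.get("entity", "works")
    params = {}
    if actor_input.get("search"):
        params["search"] = actor_input["search"]
    filters = actor_input.get("filters") or {}
    if filters:
        # OpenAlex takes comma-separated key:value pairs in one `filter` parameter.
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if actor_input.get("sort"):
        params["sort"] = actor_input["sort"]
    params["per-page"] = per_page
    if actor_input.get("email"):
        # Supplying an email opts the request into the polite pool.
        params["mailto"] = actor_input["email"]
    return f"{BASE_URL}/{entity}?{urlencode(params)}"


def pages_needed(results_wanted: int = 100, max_pages: int = 10, per_page: int = 100) -> int:
    """Pages to fetch: enough to cover results_wanted, capped by max_pages."""
    return min(max_pages, math.ceil(results_wanted / per_page))


if __name__ == "__main__":
    url = build_openalex_url({
        "search": "machine learning healthcare",
        "entity": "works",
        "filters": {"publication_year": "2023", "cited_by_count": ">100"},
        "sort": "cited_by_count:desc",
        "email": "your-email@example.com",
    })
    print(url)
    # Decode the query string again to check what the API would receive.
    print(parse_qs(urlparse(url).query))
    print(pages_needed(results_wanted=500, max_pages=10))
```

Under these assumptions, `results_wanted` and `max_pages` bound the run together: whichever limit is reached first stops pagination, which is why narrowing the query with filters is cheaper than raising `max_pages`.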

Actor Information

Developer: shahidirfan
Pricing: Paid
Total Runs: 31
Active Users: 5
