OpenAlex Scraper

by shahidirfan

31 runs
5 users
About OpenAlex Scraper

Extract scholarly data from OpenAlex—titles, authors, institutions, venues, concepts—using this fast Apify actor. Get academic research in bulk via API, and export results as CSV, Excel, or HTML datasets for research, analytics, or discovery.

What does this actor do?

OpenAlex Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
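
The same run can also be started programmatically rather than through the web console. The sketch below only builds the HTTP request for Apify's synchronous run endpoint; it does not send it. The actor ID `shahidirfan~openalex-scraper` and the helper name `build_run_request` are hypothetical placeholders, so check the actor's page for the real ID before using it.

```python
import json
import urllib.request

APIFY_API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs an actor and
    returns its dataset items in the response body."""
    url = f"{APIFY_API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # your personal Apify API token
        },
        method="POST",
    )


# Hypothetical actor ID; replace with the ID shown on the actor's page.
request = build_run_request(
    "shahidirfan~openalex-scraper",
    "YOUR_APIFY_TOKEN",
    {"search": "machine learning healthcare", "entity": "works", "results_wanted": 50},
)
# To actually run it: urllib.request.urlopen(request).read()
```

Sending the request with `urllib.request.urlopen` starts a billable run, so it is left as a comment here.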

Documentation

# OpenAlex Scraper

Extract comprehensive academic data from OpenAlex, the largest open database of scholarly works, authors, institutions, venues, and concepts. This powerful scraper enables researchers, analysts, and developers to access millions of records for bibliometric analysis, literature reviews, and data-driven insights.

## 🚀 Key Features

- Multi-Entity Support: Scrape works, authors, institutions, venues, and concepts from OpenAlex
- Advanced Search & Filtering: Use powerful search queries with custom filters and sorting options
- High-Volume Data Collection: Retrieve thousands of records with automatic pagination
- Rate Limit Optimization: Polite pool access for maximum API throughput (up to 100,000 requests/day)
- Automatic Error Handling: Built-in retries and rate limit management
- Structured Data Output: Clean, consistent JSON output ready for analysis

## 📊 What You Can Scrape

- Works: Research papers, articles, books with full metadata, abstracts, and citations
- Authors: Researcher profiles with publication counts and institutional affiliations
- Institutions: University and research organization data with country information
- Venues: Journals, conferences, and publishers with impact metrics
- Concepts: Research topics and keywords with hierarchical relationships

## 🔧 Input Configuration

Configure your scraping job with these parameters:

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `search` | string | Search query (title, author, institution name, etc.) | `""` |
| `entity` | select | Entity type to scrape | `"works"` |
| `results_wanted` | integer | Maximum results to collect | `100` |
| `max_pages` | integer | Maximum API pages to fetch | `10` |
| `email` | string | Email for polite pool (higher rate limits) | `""` |
| `filters` | object | Additional API filters | `{}` |
| `sort` | string | Sort order | `"relevance_score:desc"` |

### Entity Options

- `works` - Scholarly publications
- `authors` - Researcher profiles
- `institutions` - Academic organizations
- `venues` - Publication outlets
- `concepts` - Research topics

### Example Filters

```json
{
  "publication_year": "2023",
  "cited_by_count": ">100",
  "country_code": "US"
}
```

## 📤 Output Data Structure

### Works Entity Example

```json
{
  "id": "https://openalex.org/W123456789",
  "title": "Machine Learning in Healthcare: A Comprehensive Review",
  "authors": ["Dr. Jane Smith", "Prof. John Doe"],
  "institutions": ["Harvard University", "MIT"],
  "publication_year": 2023,
  "doi": "10.1234/health-ml-2023",
  "url": "https://openalex.org/W123456789",
  "abstract": "This paper explores the applications of machine learning...",
  "concepts": ["Machine Learning", "Healthcare", "Artificial Intelligence"],
  "cited_by_count": 245,
  "type": "journal-article",
  "source": "openalex.org"
}
```

### Authors Entity Example

```json
{
  "id": "https://openalex.org/A123456789",
  "display_name": "Dr. Jane Smith",
  "works_count": 87,
  "cited_by_count": 1250,
  "last_known_institution": "Harvard University",
  "orcid": "0000-0001-2345-6789",
  "source": "openalex.org"
}
```

## 🎯 Usage Examples

### Basic Research Paper Search

```json
{
  "search": "machine learning healthcare",
  "entity": "works",
  "results_wanted": 500,
  "email": "your-email@example.com"
}
```

### Top Cited Authors in AI

```json
{
  "entity": "authors",
  "search": "artificial intelligence",
  "sort": "cited_by_count:desc",
  "results_wanted": 100,
  "filters": {
    "works_count": ">50"
  }
}
```

### University Research Output

```json
{
  "entity": "institutions",
  "search": "Stanford University",
  "results_wanted": 1,
  "email": "researcher@university.edu"
}
```

### Trending Research Topics

```json
{
  "entity": "concepts",
  "sort": "works_count:desc",
  "results_wanted": 50,
  "filters": {
    "level": "1"
  }
}
```

## ⚙️ Advanced Configuration

### Optimizing for Large Datasets

- Use the `email` parameter for polite pool access
- Set an appropriate `max_pages` to control API usage
- Apply filters to narrow results before pagination

### Rate Limiting

- Free tier: 100,000 requests/day
- Polite pool (with email): Higher priority access
- Automatic handling of rate limits with retry logic

### Data Filtering Tips

- Use `publication_year` for time-based analysis
- Filter by `cited_by_count` for impact studies
- Country codes for geographical research
- Concept IDs for topic-specific queries

## 📈 Use Cases

- Bibliometric Analysis: Track citation patterns and research impact
- Literature Reviews: Systematic collection of papers on specific topics
- Researcher Profiling: Build comprehensive author databases
- Institutional Rankings: Compare research output across organizations
- Trend Analysis: Identify emerging research areas and concepts
- Academic Network Mapping: Discover collaborations and affiliations

## 🔍 API Integration

This scraper uses the official OpenAlex REST API:

- Base URL: https://api.openalex.org
- Documentation: OpenAlex API Guide
- Rate Limits: 100,000 requests/day per user
- No authentication required (email optional for polite pool)

## 📋 Limits & Considerations

- Rate Limits: 100,000 API calls per day (higher with polite pool)
- Result Limits: Up to 10,000 results per entity type
- Data Freshness: OpenAlex updates data regularly
- Data Coverage: Over 200 million works, 15 million authors, 100,000 institutions

## 🤝 Contributing

Found a bug or have a feature request? Open an issue on our GitHub repository.

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

---

Keywords: OpenAlex scraper, academic data extraction, scholarly works API, bibliometric data, research papers scraper, author profiles, institution data, academic analytics, citation analysis, research trends
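
As a rough illustration of how the input parameters documented above might translate into a single OpenAlex API request, here is a minimal Python sketch. The helper name `build_openalex_url` and the mapping itself are assumptions, since the actor's internal implementation is not described here. The query parameters it emits (`search`, `filter`, `sort`, `per-page`, `page`, `mailto`) are ones the OpenAlex REST API accepts.

```python
from urllib.parse import urlencode

OPENALEX_BASE = "https://api.openalex.org"  # base URL given in the documentation


def build_openalex_url(actor_input: dict, page: int = 1, per_page: int = 200) -> str:
    """Hypothetical helper: map an actor-style input dict to one OpenAlex list URL."""
    entity = actor_input.get("entity", "works")
    params = {}
    if actor_input.get("search"):
        params["search"] = actor_input["search"]
    filters = actor_input.get("filters") or {}
    if filters:
        # OpenAlex takes all filters in one parameter: key:value pairs joined by commas.
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if actor_input.get("sort"):
        params["sort"] = actor_input["sort"]
    if actor_input.get("email"):
        # Supplying an email via `mailto` opts the request into the polite pool.
        params["mailto"] = actor_input["email"]
    params["per-page"] = per_page
    params["page"] = page
    # Keep ':', ',' and '>' unescaped so the filter string stays readable.
    return f"{OPENALEX_BASE}/{entity}?{urlencode(params, safe=':,>')}"


example_url = build_openalex_url({
    "entity": "works",
    "search": "machine learning",
    "filters": {"publication_year": "2023", "cited_by_count": ">100"},
    "sort": "cited_by_count:desc",
})
print(example_url)
```

Looping `page` from 1 up to `max_pages` and stopping once `results_wanted` records have been collected would approximate the pagination behaviour the documentation describes.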

Actor Information

Developer
shahidirfan
Pricing
Paid
Total Runs
31
Active Users
5
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
