Google Scholar Search Scraper

by ecomscrape

Automate your Google Scholar data collection. This scraper extracts research papers, citations, author info, and PDF links, saving hours on literature reviews and academic projects.

131 runs
16 users

About Google Scholar Search Scraper

If you've ever spent hours manually collecting academic papers from Google Scholar, you know the pain. This scraper was built to end that. It pulls the data you'd want from a Scholar search: paper details, citation counts, author information, and direct PDF links when available. I use it to automate the grunt work of literature reviews so I can focus on the actual research.

Give it the search URLs for your topic, and it extracts the results into a structured dataset that you can export as JSON or CSV. It handles the pagination and the parsing, so you don't have to.

This suits academics building a bibliography, data scientists gathering papers for a meta-analysis, and developers creating academic databases. It saves time and reduces the chance of missing papers buried deep in the search results. You can run it once for a specific project or schedule it to keep tabs on new publications in your field, turning a days-long manual task into one that takes a few minutes to set up.

What does this actor do?

Google Scholar Search Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Open the actor's page on Apify.com
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

This actor automates data extraction from Google Scholar search results. It's designed for researchers, data scientists, and developers who need to programmatically collect academic literature, citations, and publication details at scale.

Overview

The scraper navigates Google Scholar, handles pagination, and extracts structured data from search result pages. You provide search URLs, and it returns clean, organized data, eliminating the need for manual copying and pasting. It's useful for literature reviews, bibliometric analysis, and building academic datasets.

Key Features

  • Structured Data Extraction: Pulls titles, authors, publication years, citations, summaries, and links.
  • Handles Complex Queries: Accepts direct Google Scholar URLs with filters (year, language, etc.).
  • Automatic Pagination: Can navigate through multiple pages of results.
  • Proxy Support: Uses Apify Proxy (including residential proxies) to help avoid bot detection and manage request rates.
  • Configurable Limits: Set maximum items per search and retry attempts for failed requests.

How to Use

Configure the actor with a JSON input. The only required field is urls, an array of Google Scholar search result page URLs. Run the actor on the Apify platform, and it writes the scraped data to a dataset.
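One way to run the actor from your own code is the Apify REST API. The sketch below only prepares and sends the HTTP request with the Python standard library. It assumes the API's run-sync-get-dataset-items endpoint; the actor ID shown is a placeholder that this page does not confirm, and the function names are illustrative.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical actor ID: replace it with the ID shown on the actor's Apify page.
ACTOR_ID = "ecomscrape~google-scholar-search-scraper"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST request that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    """Send the request and decode the JSON array of scraped items."""
    request = build_run_request(actor_id, token, run_input)
    # Synchronous runs are time-limited, so long scrapes may need the async API instead.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a valid API token, a call such as run_actor(ACTOR_ID, token, {"urls": [...]}) should return the scraped items as a list of dictionaries. Keep the token out of source control, for example by reading it from an environment variable.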

Input/Output

Input Format

Configure the actor with a JSON object. Here's the structure with explanations:

{
  "max_retries_per_url": 2,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "SG"
  },
  "max_items_per_url": 20,
  "urls": [
    "https://scholar.google.com/scholar?hl=vi&as_sdt=0%2C5&as_ylo=2025&q=AI&btnG=",
    "https://scholar.google.com/scholar?start=80&q=webscraping&hl=vi&as_sdt=0,5&as_ylo=2025"
  ]
}

  • urls (Required): An array of Google Scholar search URLs. You can generate these by performing a search in your browser and copying the address.
  • max_items_per_url: Limits the total number of items scraped per URL.
  • max_retries_per_url: Number of retry attempts if a request fails.
  • proxy: Configuration for Apify Proxy. Using residential proxies (RESIDENTIAL) is recommended to reduce the chance of being blocked.
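Instead of copying addresses from the browser, the input can also be assembled in code. The following is a minimal sketch that reuses only the query parameters visible in the sample URLs above (q, hl, as_sdt, as_ylo, start). The helper names and the URL check are assumptions for illustration, not part of the actor.

```python
from urllib.parse import urlencode, urlparse


def build_scholar_url(query, year_from=None, start=0, lang="en"):
    """Build a Google Scholar search URL from the parameters seen in the sample input."""
    params = {"q": query, "hl": lang, "as_sdt": "0,5"}
    if year_from is not None:
        params["as_ylo"] = year_from  # earliest publication year
    if start:
        params["start"] = start  # result offset, e.g. 80 in the sample input
    return "https://scholar.google.com/scholar?" + urlencode(params)


def build_actor_input(urls, max_items=20, max_retries=2, proxy_country=None):
    """Assemble the input object, checking the one required field."""
    if not urls:
        raise ValueError("'urls' must contain at least one Google Scholar search URL")
    for url in urls:
        parts = urlparse(url)
        # Assumption: a valid search URL is on a scholar.google.* host with path /scholar.
        if not (parts.hostname or "").startswith("scholar.google.") or parts.path != "/scholar":
            raise ValueError(f"Not a Google Scholar search URL: {url}")
    proxy = {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
    if proxy_country:
        proxy["apifyProxyCountry"] = proxy_country
    return {
        "urls": list(urls),
        "max_items_per_url": max_items,
        "max_retries_per_url": max_retries,
        "proxy": proxy,
    }


actor_input = build_actor_input(
    [build_scholar_url("AI", year_from=2025)],
    proxy_country="SG",
)
```

Validating the URLs before a run is a cheap way to avoid spending retries on input that could never succeed.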

Output Format

The actor stores results in a dataset. Each item typically includes fields like:

  • title
  • authors
  • year
  • citationCount
  • summary (snippet)
  • publication (source/journal)
  • url (link to the paper)
  • scholarUrl (link to the Google Scholar entry)
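Once the items are downloaded, a few lines of post-processing turn them into a ranked reading list. This sketch assumes the field names listed above, which the actor may not always return exactly as shown. The sample records are placeholders for illustration, not real scraper output, and the handling of authors as a list is a guess.

```python
import csv
import io

# Columns to export; summary is left out to keep the CSV compact.
FIELDS = ["title", "authors", "year", "citationCount", "publication", "url", "scholarUrl"]


def citation_count(item):
    """Return the citation count as an int, treating missing or malformed values as 0."""
    try:
        return int(item.get("citationCount") or 0)
    except (TypeError, ValueError):
        return 0


def to_csv(items):
    """Sort items by citation count (descending) and render them as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in sorted(items, key=citation_count, reverse=True):
        row = dict(item)
        # Assumption: authors may arrive as a list; join it so the cell stays readable.
        if isinstance(row.get("authors"), list):
            row["authors"] = "; ".join(row["authors"])
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder records for illustration only, not real scraper output.
sample_items = [
    {"title": "Placeholder paper A", "authors": ["A. Author"], "year": 2025, "citationCount": 3},
    {"title": "Placeholder paper B", "authors": ["B. Author", "C. Author"], "year": 2025, "citationCount": "12"},
    {"title": "Placeholder paper C", "authors": [], "year": 2025},
]
print(to_csv(sample_items))
```

Missing fields are written as empty cells, so partially filled items do not break the export.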

Contact

If you encounter issues, contact the developer (ecomscrape) through their Apify profile.

Actor Information

Developer: ecomscrape
Pricing: Paid
Total Runs: 131
Active Users: 16