Google Scholar Search Scraper
by ecomscrape
Automate your Google Scholar data collection. This scraper extracts research papers, citations, author info, and PDF links, saving hours on literature reviews and academic projects.
About Google Scholar Search Scraper
If you've ever spent hours manually collecting academic papers from Google Scholar, you know the pain. This scraper was built to end that. It pulls the key details from a Scholar search: paper titles and summaries, citation counts, author details, and direct PDF links when available. I use it to automate the grunt work of literature reviews, so I can focus on the actual research.

Think of it as a research assistant that never sleeps. Run a search for your topic in Google Scholar, give the resulting search URL to the scraper, and it will extract the results into a clean, structured format such as JSON or CSV. It handles the pagination and the parsing, so you don't have to.

This is useful for academics building a bibliography, data scientists gathering papers for meta-analysis, and developers creating academic databases. It saves a large amount of time and helps ensure you don't miss important papers buried deep in the search results. You can run it once for a specific project or schedule it to keep tabs on new publications in your field, turning a days-long manual task into something that takes a few minutes to set up.
What does this actor do?
Google Scholar Search Scraper is a web scraping tool available on the Apify platform. It runs in the cloud and extracts structured data (titles, authors, citation counts, links) from Google Scholar search result pages.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
This actor automates data extraction from Google Scholar search results. It's designed for researchers, data scientists, and developers who need to programmatically collect academic literature, citations, and publication details at scale.
Overview
The scraper navigates Google Scholar, handles pagination, and extracts structured data from search result pages. You provide search URLs, and it returns clean, organized data, eliminating the need for manual copying and pasting. It's useful for literature reviews, bibliometric analysis, and building academic datasets.
Key Features
- Structured Data Extraction: Pulls titles, authors, publication years, citations, summaries, and links.
- Handles Complex Queries: Accepts direct Google Scholar URLs with filters (year, language, etc.).
- Automatic Pagination: Can navigate through multiple pages of results.
- Proxy Support: Uses Apify Proxy (including residential proxies) to help avoid bot detection and manage request rates.
- Configurable Limits: Set maximum items per search and retry attempts for failed requests.
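The actor's own pagination logic isn't documented here. As a rough illustration of how a limit like max_items_per_url maps onto result pages, the sketch below lists the page URLs needed to cover a given number of results. It assumes Google Scholar shows 10 results per page and treats the start query parameter as a zero-based result offset (as in the start=80 example URL further down); both are assumptions, and the helper name page_urls is hypothetical.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

# Assumption: Google Scholar shows 10 results per page and uses the
# "start" query parameter as a zero-based result offset.
RESULTS_PER_PAGE = 10


def page_urls(search_url: str, max_items: int) -> list[str]:
    """Return the result-page URLs needed to cover max_items results,
    beginning at whatever offset the input URL already has."""
    parts = urlparse(search_url)
    # keep_blank_values preserves empty parameters such as "btnG="
    params = parse_qs(parts.query, keep_blank_values=True)
    first = int(params.get("start", ["0"])[0])
    pages = -(-max_items // RESULTS_PER_PAGE)  # ceiling division
    urls = []
    for i in range(pages):
        # Overwrite (or add) the offset for this page, keep everything else.
        params["start"] = [str(first + i * RESULTS_PER_PAGE)]
        query = urlencode(params, doseq=True)
        urls.append(urlunparse(parts._replace(query=query)))
    return urls
```

For example, a limit of 20 on a URL that already contains start=80 yields two pages, at offsets 80 and 90.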
How to Use
Configure the actor using a JSON input. The only required field is urls, a list of Google Scholar search result page URLs. Run the actor on the Apify platform, and it will output the scraped data into a dataset.
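For API use, here is a minimal sketch using the apify-client Python package. It assembles an input with the same shape as the example under Input Format, rejects an empty or non-Scholar urls list, then starts the actor, waits for it to finish, and reads back its dataset. The actor ID and the helper names build_run_input and run_and_fetch are hypothetical; copy the real actor ID from the actor's Apify page.

```python
# Hypothetical actor ID: check the actor's page on Apify for the real one.
ACTOR_ID = "ecomscrape/google-scholar-search-scraper"


def build_run_input(urls, max_items_per_url=20, max_retries_per_url=2,
                    proxy_country="SG"):
    """Assemble an input object shaped like the documented example."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    for url in urls:
        if not url.startswith("https://scholar.google.com/scholar"):
            raise ValueError(f"not a Google Scholar search URL: {url}")
    return {
        "urls": list(urls),
        "max_items_per_url": max_items_per_url,
        "max_retries_per_url": max_retries_per_url,
        "proxy": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
            "apifyProxyCountry": proxy_country,
        },
    }


def run_and_fetch(run_input, token):
    """Start the actor, wait for the run to finish, return its dataset items."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, run_and_fetch(build_run_input([url]), token) returns the scraped items as a list of dicts, where token is your Apify API token.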
Input/Output
Input Format
Configure the actor with a JSON object. Here's the structure with explanations:
{
  "max_retries_per_url": 2,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "SG"
  },
  "max_items_per_url": 20,
  "urls": [
    "https://scholar.google.com/scholar?hl=vi&as_sdt=0%2C5&as_ylo=2025&q=AI&btnG=",
    "https://scholar.google.com/scholar?start=80&q=webscraping&hl=vi&as_sdt=0,5&as_ylo=2025"
  ]
}
- urls (required): An array of Google Scholar search URLs. You can generate these by running a search in your browser and copying the address from the address bar.
- max_items_per_url: The maximum number of items scraped per URL.
- max_retries_per_url: The number of retry attempts if a request fails.
- proxy: Apify Proxy configuration. useApifyProxy turns the proxy on, apifyProxyGroups selects the proxy group, and apifyProxyCountry sets the proxy's country (SG, Singapore, in the example). Using residential proxies (RESIDENTIAL) is recommended to reduce the chance of being blocked.
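If you'd rather generate search URLs in code than copy them from the browser, the sketch below assembles one from query parameters visible in the example URLs: q (the search terms), as_ylo (earliest publication year), hl (interface language), and start (result offset). The function name scholar_search_url is hypothetical, and the sketch leaves out as_sdt because its meaning isn't documented here.

```python
from urllib.parse import urlencode


def scholar_search_url(query, year_from=None, lang=None, start=0):
    """Build a Google Scholar search URL from a few common parameters.

    q      -> search terms
    as_ylo -> earliest publication year
    hl     -> interface language (e.g. "en", "vi")
    start  -> zero-based result offset, only added when non-zero
    """
    params = {"q": query}
    if year_from is not None:
        params["as_ylo"] = str(year_from)
    if lang is not None:
        params["hl"] = lang
    if start:
        params["start"] = str(start)
    # urlencode escapes spaces as "+" and other special characters as %XX.
    return "https://scholar.google.com/scholar?" + urlencode(params)
```

The resulting strings can go straight into the urls array of the input.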
Output Format
The actor stores results in a dataset. Each item typically includes fields like:
* title
* authors
* year
* citationCount
* summary (snippet)
* publication (source/journal)
* url (link to the paper)
* scholarUrl (link to the Google Scholar entry)
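Once the results are exported, a common next step in a literature review is ranking papers by citations. The sketch below does that with the standard library only, using the field names listed above. The exact value types aren't documented, so it assumes citationCount may arrive as a number or a string (or be missing) and that authors may be either a string or a list; the names top_cited_csv and to_int are hypothetical.

```python
import csv
import io

# Column order for the CSV; taken from the output fields listed above.
FIELDS = ["title", "authors", "year", "citationCount", "publication",
          "url", "scholarUrl"]


def to_int(value):
    """Best-effort conversion; missing or non-numeric counts become 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def top_cited_csv(items, limit=10):
    """Sort dataset items by citationCount (highest first) and render
    the first `limit` of them as CSV text."""
    ranked = sorted(items, key=lambda it: to_int(it.get("citationCount")),
                    reverse=True)[:limit]
    buf = io.StringIO()
    # extrasaction="ignore" drops fields not in FIELDS (e.g. summary);
    # missing fields are written as empty cells.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in ranked:
        row = dict(item)
        if isinstance(row.get("authors"), list):
            row["authors"] = "; ".join(row["authors"])
        writer.writerow(row)
    return buf.getvalue()
```

The items can come from a JSON export of the dataset loaded with json.load, or from any Apify API client that returns dataset items as dicts.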
Contact
If you encounter issues, contact the developer (ecomscrape) through their Apify profile page.
Actor Information
- Developer: ecomscrape
- Pricing: Paid
- Total Runs: 131
- Active Users: 16
Related Actors
- Web Scraper by apify
- Cheerio Scraper by apify
- Website Content Crawler by apify
- Legacy PhantomJS Crawler by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.