arXiv Search Scraper 📚

Name: arXiv Search Scraper 📚
Author: easyapi

508 runs

24 users

About arXiv Search Scraper 📚

Extract comprehensive research paper data from arXiv search results. Get detailed metadata including titles, authors, abstracts, categories and more. Perfect for academic research monitoring, trend analysis and building paper databases. 🎓📚

What does this actor do?

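arXiv Search Scraper 📚 is an actor (a cloud-hosted scraping tool) on the Apify platform. You give it one or more arXiv search URLs, and it collects metadata for the matching papers (IDs, PDF links, titles, authors, abstracts, categories, submission dates and comments) into a dataset that you can download or read through the API.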

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
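Besides the web console, an actor can be started through Apify's REST API. The sketch below uses only Python's standard library to prepare (but not send) a request to the `run-sync-get-dataset-items` endpoint, which runs an actor and returns its dataset items in one call. It assumes the actor ID `easyapi~arxiv-search-scraper`, which is hypothetical: copy the real ID from the actor's Apify page. The helper names are also illustrative; the two URL builders only reproduce the URL shapes that appear in the Documentation section's input and output examples.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: hypothetical actor ID in Apify's "username~actor-name" form.
ACTOR_ID = "easyapi~arxiv-search-scraper"
API_BASE = "https://api.apify.com/v2"


def arxiv_search_url(query: str, searchtype: str = "all") -> str:
    """Build an arXiv search URL shaped like the documented input example."""
    params = {"query": query, "searchtype": searchtype, "source": "header"}
    return "https://arxiv.org/search/?" + urlencode(params)


def arxiv_author_url(author: str) -> str:
    """Build an author search URL shaped like the `url` field in the output sample."""
    return "https://arxiv.org/search/?" + urlencode({"searchtype": "author", "query": author})


def build_run_request(search_urls, token, max_items=None):
    """Prepare (without sending) a POST that runs the actor and returns dataset items."""
    if not search_urls:
        raise ValueError("searchUrls must contain at least one URL")
    run_input = {"searchUrls": list(search_urls)}
    if max_items is not None:
        if max_items < 1:
            raise ValueError("maxItems must be a positive integer")
        run_input["maxItems"] = max_items
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_run_request([arxiv_search_url("ai")], token="YOUR_APIFY_TOKEN", max_items=60)
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it would be: items = json.load(urllib.request.urlopen(req))
```

Passing the token in an `Authorization: Bearer` header rather than the query string keeps it out of URLs that may end up in logs.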

Documentation

# arXiv Search Scraper 📚

Scrape research papers, authors, and metadata from arXiv search results. Get detailed information about academic papers, including titles, authors, abstracts, categories, submission dates and more.

## Features ✨

- 🔍 Scrape papers from any arXiv search URL
- 📄 Extract comprehensive paper metadata, including:
  - Paper ID and PDF links
  - Title and abstract
  - Author names and profile URLs
  - Research categories and classifications
  - Submission dates and comments
- ⚡ Fast and efficient pagination handling
- 🔄 Support for multiple search URLs
- ⚙️ Configurable maximum items limit
- 🌐 Proxy support for reliable scraping

## Use Cases 💡

- Research trend analysis
- Academic paper monitoring
- Building paper databases
- Author tracking
- Category-based paper collection
- Literature review automation

## Input Parameters 🎛️

The actor accepts the following input parameters:

| Field | Type | Description |
|-------|------|-------------|
| searchUrls | Array | List of arXiv search URLs to scrape |
| maxItems | Integer | Maximum number of items to scrape (optional) |
| proxyConfiguration | Object | Proxy settings (optional) |

## Output 📊

The actor stores results in a dataset with the following fields for each paper:

- `searchUrl`: Source search URL
- `arxivId`: Unique arXiv paper ID
- `pdfUrl`: Direct link to PDF
- `categories`: Research categories with codes and names
- `title`: Paper title
- `authors`: Author details including names and profile URLs
- `abstract`: Full paper abstract
- `submissionDate`: Paper submission date
- `comments`: Additional paper comments

## Example Usage 💻

### Input Example

Here is an example input in JSON:

```json
{
  "searchUrls": [
    "https://arxiv.org/search/?query=ai&searchtype=all&source=header"
  ],
  "maxItems": 60
}
```

### Output sample

The results will be wrapped into a dataset, which you can always find in the Storage tab.
Here's an excerpt, in JSON, of the data you'd get if you apply the input parameters above. You can choose in which format to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

```json
[
  {
    "searchUrl": "https://arxiv.org/search/?query=ai&searchtype=all&source=header",
    "arxivId": "arXiv:2502.21286",
    "pdfUrl": "https://arxiv.org/pdf/2502.21286",
    "categories": [
      { "code": "cs.CR", "name": "Cryptography and Security" },
      { "code": "cs.LG", "name": "Machine Learning" },
      { "code": "cs.NI", "name": "Networking and Internet Architecture" },
      { "code": "doi" },
      { "code": "10.1109/TNSM.2024.3376631" }
    ],
    "title": "Enabling AutoML for Zero-Touch Network Security: Use-Case Driven Analysis",
    "authors": [
      { "name": "Li Yang", "url": "https://arxiv.org/search/?searchtype=author&query=Yang%2C+L" },
      { "name": "Mirna El Rajab", "url": "https://arxiv.org/search/?searchtype=author&query=Rajab%2C+M+E" },
      { "name": "Abdallah Shami", "url": "https://arxiv.org/search/?searchtype=author&query=Shami%2C+A" },
      { "name": "Sami Muhaidat", "url": "https://arxiv.org/search/?searchtype=author&query=Muhaidat%2C+S" }
    ],
    "abstract": "Zero-Touch Networks (ZTNs) represent a state-of-the-art paradigm shift towards fully automated and intelligent network management, enabling the automation and intelligence required to manage the complexity, scale, and dynamic nature of next-generation (6G) networks. ZTNs leverage Artificial Intelligence (AI) and Machine Learning (ML) to enhance operational efficiency, support intelligent decision-making, and ensure effective resource allocation. However, the implementation of ZTNs is subject to security challenges that need to be resolved to achieve their full potential. In particular, two critical challenges arise: the need for human expertise in developing AI/ML-based security mechanisms, and the threat of adversarial attacks targeting AI/ML models. In this survey paper, we provide a comprehensive review of current security issues in ZTNs, emphasizing the need for advanced AI/ML-based security mechanisms that require minimal human intervention and protect AI/ML models themselves. Furthermore, we explore the potential of Automated ML (AutoML) technologies in developing robust security solutions for ZTNs. Through case studies, we illustrate practical approaches to securing ZTNs against both conventional and AI/ML-specific threats, including the development of autonomous intrusion detection systems and strategies to combat Adversarial ML (AML) attacks. The paper concludes with a discussion of the future research directions for the development of ZTN security approaches.\n △ Less",
    "submissionDate": "28 February, 2025",
    "comments": "Published in IEEE Transactions on Network and Service Management (TNSM); Code is available at Github link: https://github.com/Western-OC2-Lab/AutoML-and-Adversarial-Attack-Defense-for-Zero-Touch-Network-Security"
  },
  ...
]
```

## Related Actors

- 🔬 Nature Search Results Scraper - Extract comprehensive research article data from Nature.com search results
- 📚 Goodreads Book Scraper - Extract comprehensive book data for literature research and analysis
- 📚 Goodreads Review Scraper - Extract detailed book reviews and ratings for academic literature analysis
- 📚 Udemy Course Scraper - Extract detailed course information for educational content research
- 📚 Udemy Course Reviews Scraper - Collect comprehensive course review data for educational analysis
- 📄 Article Content Extractor - Extract clean article content and metadata from any web page
- 🔍 Google Scholar Scraper - Collect scholarly results with flexible search options and citation filtering
- 🔍 AI-powered Search - Get AI-enhanced search summaries with references and optimization tips
- 📊 Text Sentiment Analysis - Analyze sentiment in research abstracts and academic content
- 📝 Text Summarization - Generate concise summaries of research papers and documents
- 🔍 PubMed Search Scraper - Extract research papers and academic articles from PubMed
- 📚 Substack Publications Scraper - Collect detailed academic newsletter and publication data
- 📚 Substack Posts Scraper - Extract comprehensive academic post and article content
- 🔍 Keyword Discovery Tool - Discover relevant academic keywords and research topics
- 🔍 Keyword Density Checker - Analyze keyword frequency in academic content
- 📚 Medium Posts Search Scraper - Extract detailed article data for research content analysis
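The output sample in the Documentation shows a few quirks that a downstream pipeline may want to normalize: the DOI shows up as two extra `categories` entries without a `name` (`doi` and `10.1109/TNSM.2024.3376631`), the abstract ends with the page's `△ Less` toggle text, and `submissionDate` is a human-readable string. The following is a minimal post-processing sketch, assuming every dataset item is shaped like that sample. `clean_item` and the added `doi` field are hypothetical, and the DOI regex is a simplified check, not a full DOI validator.

```python
import re
from datetime import datetime

# Simplified DOI check, e.g. "10.1109/TNSM.2024.3376631" (not a full validator).
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def clean_item(item: dict) -> dict:
    """Return a normalized copy of one dataset item; the input is not modified."""
    cleaned = dict(item)

    # Keep only real subject categories (they carry a "name"); pull out the DOI.
    categories, doi = [], None
    for cat in item.get("categories", []):
        code = cat.get("code", "")
        if "name" in cat:
            categories.append(cat)
        elif DOI_RE.match(code):
            doi = code
        # Anything else, such as the bare "doi" label, is dropped.
    cleaned["categories"] = categories
    cleaned["doi"] = doi  # hypothetical extra field

    # Strip the trailing "△ Less" toggle text captured from the results page.
    abstract = item.get("abstract", "")
    cleaned["abstract"] = re.sub(r"\s*△\s*Less\s*$", "", abstract).strip()

    # "28 February, 2025" -> "2025-02-28" (English month names, default C locale).
    raw_date = item.get("submissionDate")
    if raw_date:
        parsed = datetime.strptime(raw_date, "%d %B, %Y")
        cleaned["submissionDate"] = parsed.date().isoformat()
    return cleaned
```

Returning a copy rather than editing in place keeps the raw dataset item available if a later scraper version changes these fields.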

Common Use Cases

Research Trend Analysis

Track how publication activity shifts across topics and arXiv categories

Author Tracking

Follow new papers from specific authors through their arXiv author search URLs

Academic Paper Monitoring

Re-run saved searches on a schedule to catch newly submitted papers

Content Aggregation

Collect and organize paper metadata from multiple search URLs into one dataset
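When results from several search URLs are combined, the same paper can appear more than once, since overlapping queries often match it. The sketch below, assuming dataset items shaped like the Documentation's output sample, keeps the first record per `arxivId` and then counts papers per category code as a starting point for trend analysis. Both function names are hypothetical, and entries without a `name` (such as DOI residue in `categories`) are ignored.

```python
from collections import Counter


def merge_items(items):
    """Keep the first record per arxivId; records without an ID are all kept."""
    seen, merged = set(), []
    for item in items:
        key = item.get("arxivId")
        if key is not None:
            if key in seen:
                continue  # duplicate from another search URL
            seen.add(key)
        merged.append(item)
    return merged


def category_counts(items):
    """Count papers per category code, counting each paper once per category."""
    counts = Counter()
    for item in items:
        codes = {c["code"] for c in item.get("categories", []) if "name" in c}
        counts.update(codes)
    return counts
```

Deduplicating before counting matters: otherwise a paper matched by two searches would be counted twice in every one of its categories.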

Ready to Get Started?

Try arXiv Search Scraper 📚 now on Apify. Free tier available with no credit card required.

Actor Information

Developer: easyapi
Pricing: Paid

Related Actors

Tecdoc Car Parts

by making-data-meaningful

OpenRouter - Unified LLM Interface for ChatGPT, Claude, Gemini

by xyzzy

Google Sheets Import & Export

by lukaskrivka

Send Email

by apify

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.