Goodreads Books Scraper
by shahidirfan
About Goodreads Books Scraper
Efficiently extract detailed book data with the Goodreads Books Scraper. Ideal for building reading lists or analyzing metadata. Note: For bulk scraping of more than 50 books, providing JSON cookies is essential to ensure seamless access and reliable results.
What does this actor do?
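Goodreads Books Scraper is a web scraping tool on the Apify platform. It runs in the cloud and extracts book data from Goodreads shelves: titles, authors, ratings, and review counts, plus descriptions, ISBNs, publishers, publication dates, and genres when detailed scraping is enabled.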
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# Goodreads Book Scraper

Extract comprehensive book data from Goodreads shelves, including titles, authors, ratings, review counts, descriptions, ISBNs, genres, and publication details. Useful for book analysis, market research, reading list creation, and literary data collection.

## What does the Goodreads Book Scraper do?

The Goodreads Book Scraper extracts detailed book information from any Goodreads shelf or category. Whether you're building a reading recommendation system, conducting market research, or creating a personal book database, it collects the data you need.

### Key capabilities:

- 📚 Extract book details - Titles, authors, ratings, review counts, descriptions, and more
- 🔄 Automatic pagination - Navigates through multiple pages of results (pages beyond the first require cookies; see the pagination notes below)
- ⚡ Fast & efficient - Lightweight design optimized for speed and reliability
- 📊 Structured data - Clean JSON output ready for analysis or integration
- 🎯 Flexible targeting - Scrape any Goodreads shelf by name or URL
- 🔍 Two scraping modes - Quick overview or detailed book information

## Why scrape Goodreads?

Goodreads is the world's largest community of book lovers, with over 90 million members and data on millions of books. Access to this data enables:

- Market research - Analyze book trends, popular genres, and reader preferences
- Recommendation systems - Build personalized book recommendation engines
- Content curation - Create reading lists and book collections
- Demand tracking - Track book popularity to inform inventory decisions
- Academic research - Study reading patterns and literary trends
- Personal libraries - Organize and manage your reading lists

## How much does it cost to scrape Goodreads?

The cost depends on the number of books you scrape and whether you enable detailed scraping.
Typical usage estimates:

- 100 books (basic) - ~0.01-0.02 Apify compute units
- 100 books (detailed) - ~0.03-0.05 Apify compute units
- 1,000 books (detailed) - ~0.30-0.50 Apify compute units

Apify provides 5 USD of free credits monthly, enough to scrape thousands of books. For larger projects, paid plans start at $49/month.

## Input configuration

Configure the scraper using these parameters:

### Basic settings

| Parameter | Description |
|---|---|
| Start URL | Direct URL to a Goodreads shelf (e.g., https://www.goodreads.com/shelf/show/fantasy) |
| Shelf Name | Name of the shelf to scrape (e.g., fantasy, science-fiction, bestsellers) |
| Maximum Books | Number of books to scrape (default: 100) |
| Maximum Pages | Safety limit on pages to visit (default: 10) |
| Collect Details | Enable to extract full book information including descriptions, ISBNs, and genres (default: enabled) |
| Cookies | Authentication cookies for accessing paginated results (required for pages beyond the first) |
| Proxy Configuration | Proxy settings (residential proxies recommended) |
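The table lists both a Start URL and a Shelf Name. As a minimal sketch (not part of the actor), the hypothetical helpers below show how a shelf URL of the form `https://www.goodreads.com/shelf/show/<shelf>` maps to a shelf name and back, which can help check that the two inputs agree before a run. How the actor resolves a conflict when both are set is not documented here.

```javascript
// Hypothetical helpers (not part of the actor). Assumes shelf URLs look like
// https://www.goodreads.com/shelf/show/<shelf>, as in the table above.
const SHELF_BASE = 'https://www.goodreads.com/shelf/show/';

// Extracts the shelf name from a Goodreads shelf URL; query strings such as
// ?page=2 are ignored because URL#pathname excludes them.
function shelfFromUrl(startUrl) {
  const url = new URL(startUrl);
  if (!/(^|\.)goodreads\.com$/.test(url.hostname)) {
    throw new Error(`Not a Goodreads URL: ${startUrl}`);
  }
  const match = url.pathname.match(/^\/shelf\/show\/([^/]+)\/?$/);
  if (!match) {
    throw new Error(`Not a shelf URL: ${startUrl}`);
  }
  return decodeURIComponent(match[1]);
}

// Builds the shelf URL for a shelf name (the inverse of shelfFromUrl).
function shelfUrlFromName(shelf) {
  return SHELF_BASE + encodeURIComponent(shelf);
}

console.log(shelfFromUrl('https://www.goodreads.com/shelf/show/fantasy')); // fantasy
console.log(shelfUrlFromName('science-fiction')); // https://www.goodreads.com/shelf/show/science-fiction
```

The helpers use the built-in WHATWG `URL` class, so no dependencies are needed.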
### Example input

In the JSON input, the Shelf Name, Maximum Books, Maximum Pages, Collect Details, and Proxy Configuration fields appear as `shelf`, `results_wanted`, `max_pages`, `collectDetails`, and `proxyConfiguration`.

```json
{
  "shelf": "fantasy",
  "results_wanted": 100,
  "max_pages": 5,
  "collectDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```

## Output format

The scraper provides structured JSON data for each book:

### Basic output (without detailed scraping)

```json
{
  "title": "The Name of the Wind",
  "author": "Patrick Rothfuss",
  "rating": 4.52,
  "ratingCount": 985432,
  "reviewCount": 45678,
  "image": "https://i.gr-assets.com/images/S/...",
  "url": "https://www.goodreads.com/book/show/186074"
}
```

### Detailed output (with detailed scraping enabled)

```json
{
  "title": "The Name of the Wind",
  "author": "Patrick Rothfuss",
  "rating": 4.52,
  "ratingCount": 985432,
  "reviewCount": 45678,
  "description": "Told in Kvothe's own voice, this is the tale of the magically gifted young man...",
  "image": "https://i.gr-assets.com/images/S/...",
  "isbn": "0756404746",
  "publisher": "DAW Books",
  "publishDate": "March 27, 2007",
  "genres": ["Fantasy", "Fiction", "Magic", "Adventure"],
  "url": "https://www.goodreads.com/book/show/186074"
}
```

### Output fields

| Field | Type | Description |
|---|---|---|
| title | string | Book title |
| author | string | Primary author name(s) |
| rating | number | Average rating (0-5 scale) |
| ratingCount | number | Total number of ratings |
| reviewCount | number | Total number of reviews |
| description | string | Book description/synopsis (detailed mode only) |
| image | string | URL to book cover image |
| isbn | string | ISBN identifier (detailed mode only) |
| publisher | string | Publisher name (detailed mode only) |
| publishDate | string | Publication date (detailed mode only) |
| genres | array | List of book genres/categories (detailed mode only) |
| url | string | Goodreads book URL |
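Because some books may have incomplete information, it can help to check each dataset item against the table above. The sketch below is a hypothetical client-side check, not part of the actor. `validateBook`, `BASIC_FIELDS`, and `DETAILED_FIELDS` are illustrative names, and treating `null` the same as a missing field is an assumption.

```javascript
// Hypothetical client-side check (not part of the actor). Field names and types
// come from the "Output fields" table; "array" means a JSON array.
const BASIC_FIELDS = {
  title: 'string',
  author: 'string',
  rating: 'number',
  ratingCount: 'number',
  reviewCount: 'number',
  image: 'string',
  url: 'string',
};

// Fields the table marks as "detailed mode only".
const DETAILED_FIELDS = {
  description: 'string',
  isbn: 'string',
  publisher: 'string',
  publishDate: 'string',
  genres: 'array',
};

// Returns a list of problems for one dataset item; an empty list means it passed.
// Assumption: null is treated the same as a missing field.
function validateBook(item, detailed = false) {
  const expected = detailed ? { ...BASIC_FIELDS, ...DETAILED_FIELDS } : BASIC_FIELDS;
  const problems = [];
  for (const [field, type] of Object.entries(expected)) {
    const value = item[field];
    if (value === undefined || value === null) {
      problems.push(`missing ${field}`);
    } else if (type === 'array' ? !Array.isArray(value) : typeof value !== type) {
      problems.push(`${field} should be ${type}`);
    }
  }
  // The table describes rating as an average on a 0-5 scale.
  if (typeof item.rating === 'number' && (item.rating < 0 || item.rating > 5)) {
    problems.push('rating outside 0-5');
  }
  return problems;
}

// Usage with the basic output example shown above.
const book = {
  title: 'The Name of the Wind',
  author: 'Patrick Rothfuss',
  rating: 4.52,
  ratingCount: 985432,
  reviewCount: 45678,
  image: 'https://i.gr-assets.com/images/S/...',
  url: 'https://www.goodreads.com/book/show/186074',
};
console.log(validateBook(book)); // []
console.log(validateBook(book, true).length); // 5 (the detailed-only fields are absent)
```

Returning a list of problems, rather than a single boolean, makes it easier to log which fields are incomplete across a whole dataset.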
## Usage

### Using the Apify API client

```javascript
// Requires: npm install apify-client
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
  token: 'YOUR_API_TOKEN',
});

// CommonJS modules do not support top-level await, so wrap the calls in an async function.
(async () => {
  // Start the actor and wait for the run to finish.
  const run = await client.actor('YOUR_USERNAME/goodreads-book-scraper').call({
    shelf: 'fantasy',
    results_wanted: 100,
    collectDetails: true,
  });

  // Fetch the scraped books from the run's default dataset.
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  console.log(items);
})().catch(console.error);
```

### Using as a standalone script

1. Clone this repository.
2. Run `npm install`.
3. Configure `INPUT.json` with your parameters.
4. Run `npm start`.

## Important notes on pagination

⚠️ Authentication requirement: Goodreads restricts pagination to authenticated users. Visitors who are not logged in can only access the first page (approximately 50 books).

### To access multiple pages:

1. Log in to Goodreads in your browser.
2. Open DevTools (F12) → Network tab.
3. Reload the page and find a request to goodreads.com.
4. Copy the Cookie header from the request headers.
5. Paste the cookie value into the "Authentication cookies" field.

The scraper will use your cookies to access paginated results. Pagination URLs follow this pattern: `https://www.goodreads.com/shelf/show/fantasy?page=2`

## Popular Goodreads shelves to scrape

Get started quickly with these popular shelves:

- fantasy - Fantasy fiction and magic
- science-fiction - Sci-fi and speculative fiction
- romance - Romance novels
- mystery - Mystery and thriller books
- young-adult - YA fiction
- classics - Classic literature
- non-fiction - Non-fiction works
- biography - Biographies and memoirs
- history - Historical works
- self-help - Self-improvement books
- business - Business books
- philosophy - Philosophy texts

You can find more shelves by browsing Goodreads Shelves.
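The pagination pattern and cookie steps above can be sketched in code. This is a hypothetical illustration, not the actor's implementation. It assumes about 50 books per page (the first-page figure quoted above). The About section asks for JSON cookies while the steps above copy a raw Cookie header, so the sketch also converts one into the other; the `{ name, value }` shape is an assumption, since the exact cookie format the actor expects is not specified here.

```javascript
// Hypothetical sketch (not the actor's implementation).
// Assumption: about 50 books per shelf page, the first-page figure quoted above.
const BOOKS_PER_PAGE = 50;

// Lists the shelf page URLs needed for resultsWanted books, capped at maxPages.
// Page 1 is the bare shelf URL; later pages use the ?page=N pattern. Pages 2+
// only return books when the request carries authentication cookies.
function shelfPageUrls(shelf, resultsWanted, maxPages) {
  const needed = Math.ceil(resultsWanted / BOOKS_PER_PAGE);
  const pages = Math.max(1, Math.min(needed, maxPages));
  const base = `https://www.goodreads.com/shelf/show/${encodeURIComponent(shelf)}`;
  const urls = [];
  for (let page = 1; page <= pages; page++) {
    urls.push(page === 1 ? base : `${base}?page=${page}`);
  }
  return urls;
}

// Converts a copied Cookie header ("name1=value1; name2=value2") into a JSON
// array of { name, value } objects. Assumption: the actor accepts this shape;
// check its input schema before relying on it.
function cookieHeaderToJson(header) {
  return header
    .split(';')
    .map((part) => part.trim())
    .filter((part) => part.includes('='))
    .map((part) => {
      const eq = part.indexOf('='); // split on the first "=" only; values may contain "="
      return { name: part.slice(0, eq), value: part.slice(eq + 1) };
    });
}

console.log(shelfPageUrls('fantasy', 120, 5)); // 3 URLs: the shelf, ?page=2, ?page=3
// Example cookie names and values below are placeholders.
console.log(JSON.stringify(cookieHeaderToJson('session_id=abc123; locale=en')));
// [{"name":"session_id","value":"abc123"},{"name":"locale","value":"en"}]
```

Capping by `maxPages` mirrors the role of Maximum Pages as a safety limit: a large `results_wanted` never produces more page requests than that limit allows.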
## Scraping best practices

### Performance optimization

- Set reasonable limits - Use results_wanted to control scraping volume
- Enable detailed scraping selectively - Disable it if you only need basic information
- Use residential proxies - Recommended to reduce blocking, especially on multi-page runs (which also require cookies)
- Rate limiting - The scraper includes built-in concurrency controls

### Data quality

- Validate output - Check that all expected fields are populated
- Handle missing data - Some books may have incomplete information
- Monitor for changes - Goodreads may update its HTML structure

### Compliance

- Respect robots.txt - The scraper follows Goodreads guidelines
- Don't overload servers - Use appropriate concurrency settings
- Review the Terms of Service - Ensure your use case complies with Goodreads policies
- Personal use recommended - Commercial use may require additional consideration

## Troubleshooting

### No books found on page 2+

Solution: Provide authentication cookies. See the pagination section above.

### Scraper returns incomplete data

Solution: Enable "Collect Details" to fetch full book information (descriptions, ISBNs, publishers, publication dates, and genres).

### Rate limiting or blocked requests

Solution: Use residential proxies and reduce concurrency if needed.

### Outdated selectors

Solution: Goodreads occasionally updates its website. Contact support if selectors need updating.

## Use cases

### Market Research

Analyze book trends, identify popular genres, and understand reader preferences to make data-driven publishing decisions.

### Recommendation Systems

Build book recommendation engines using ratings, review counts, and genres.

### Academic Research

Study literary trends, analyze reading patterns, and conduct research on book popularity and cultural impact.

### Content Creation

Create curated reading lists, book blogs, and literary content based on comprehensive book data.
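As one illustration of the Content Creation use case, a curated list can be derived from the dataset items by filtering and sorting on the output fields. The `curateReadingList` function, its default threshold of 1,000 ratings, and the sample items below are hypothetical choices, not part of the actor.

```javascript
// Hypothetical post-processing (not part of the actor): builds a curated list
// from dataset items shaped like the output format above.
function curateReadingList(items, { minRatings = 1000, genre = null, limit = 10 } = {}) {
  return items
    // Keep rated books with enough ratings for the average to be meaningful.
    .filter((b) => typeof b.rating === 'number' && (b.ratingCount || 0) >= minRatings)
    // Genre filtering needs "genres", which is only present in detailed mode.
    .filter((b) => !genre || (Array.isArray(b.genres) && b.genres.includes(genre)))
    // Highest average rating first; ties go to the book with more ratings.
    .sort((a, b) => b.rating - a.rating || b.ratingCount - a.ratingCount)
    .slice(0, limit)
    .map((b) => ({ title: b.title, author: b.author, rating: b.rating, url: b.url }));
}

// Hypothetical sample items for illustration only.
const sampleItems = [
  { title: 'Book A', author: 'Author A', rating: 4.1, ratingCount: 5000, genres: ['Fantasy'] },
  { title: 'Book B', author: 'Author B', rating: 4.5, ratingCount: 200, genres: ['Fantasy'] },
  { title: 'Book C', author: 'Author C', rating: 4.3, ratingCount: 9000, genres: ['Romance'] },
  { title: 'Book D', author: 'Author D', rating: 4.3, ratingCount: 12000, genres: ['Fantasy'] },
];

console.log(curateReadingList(sampleItems).map((b) => b.title)); // Book D, Book C, Book A
console.log(curateReadingList(sampleItems, { genre: 'Fantasy', limit: 1 }).map((b) => b.title)); // Book D
```

The minimum-ratings filter keeps a book with only a few hundred ratings (Book B) from outranking widely rated titles on a high average alone.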
### Personal Library Management

Organize your reading lists, track books to read, and manage your personal book collection.

## Support

Need help? Have questions?

- Documentation: Check out the detailed Apify documentation
- Community: Join the Apify Discord
- Issues: Report bugs or request features on the GitHub repository

## Related actors

Explore similar scrapers:

- Amazon Book Scraper - Extract book data from Amazon
- Barnes & Noble Scraper - Scrape B&N book listings
- Google Books Scraper - Extract data from Google Books
- Book Price Monitor - Track book prices across platforms

---

Built with ❤️ for the reading community. Happy scraping!
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try Goodreads Books Scraper now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: shahidirfan
- Pricing: Paid
- Total Runs: 14
- Active Users: 2
Related Actors
Web Scraper
by apify
Cheerio Scraper
by apify
Website Content Crawler
by apify
Legacy PhantomJS Crawler
by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.