💎 ESG Scraper: Sustainability Reports & PDF Disclosures

by primeparse

40 runs
3 users

About 💎 ESG Scraper: Sustainability Reports & PDF Disclosures

Powerful ESG (Environmental, Social, and Governance) scraper that automatically extracts sustainability reports, PDF disclosures, articles, and other content from any website. Get clean, AI-ready datasets with keyword filtering, metadata extraction, images, links, and full PDF support.

What does this actor do?

💎 ESG Scraper: Sustainability Reports & PDF Disclosures is a web scraping tool on the Apify platform. It crawls the pages and PDF links you supply, keeps content that matches your ESG keywords, and saves the extracted text and metadata as structured datasets, all running in the Apify cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation
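The API access mentioned above lets you start runs from your own code. The sketch below assumes Apify API v2's "run actor" endpoint (POST /v2/acts/{actorId}/runs, with the run input sent as the JSON body) and only builds the request using Python's standard library, so nothing is sent until you call `urlopen`. The actor ID `primeparse~esg-scraper` is a hypothetical placeholder; copy the real ID from the actor's page on Apify.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a request that starts an actor run via Apify API v2."""
    # Actor IDs may be written as "username~actor-name"; keep "~" unescaped in the path.
    url = f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}/runs"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Token sent as a bearer header instead of a ?token= query parameter.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical actor ID and placeholder token; substitute your own values.
    req = build_run_request(
        "primeparse~esg-scraper",
        "YOUR_APIFY_TOKEN",
        {"startUrls": [{"url": "https://www.reuters.com/sustainability/"}], "maxRequestsPerCrawl": 50},
    )
    print(req.get_method(), req.full_url)
    # To actually start the run: urllib.request.urlopen(req), then json.load() the response.
```

Building the request separately from sending it keeps the input easy to inspect or log before any compute is spent.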

How to Use

  1. Open the actor's page on Apify.com
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results
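Before step 4, it can help to check the input against the options listed under Key Options in the documentation. The helper below is hypothetical and not part of the actor: it requires `startUrls`, fills an empty `allowedDomains` from the start URL hosts (the documented fallback), and checks `maxRequestsPerCrawl`. Stripping a leading `www.` and falling back to 500 (the value used in the example input) are assumptions, not documented defaults.

```python
from urllib.parse import urlparse


def normalize_input(run_input: dict) -> dict:
    """Hypothetical pre-run check mirroring the documented ESG Scraper options."""
    start_urls = run_input.get("startUrls") or []
    if not start_urls:
        raise ValueError("startUrls is required and must contain at least one URL")

    normalized = dict(run_input)
    # Documented behavior: an empty allowedDomains falls back to the startUrls domains.
    if not normalized.get("allowedDomains"):
        domains = []
        for entry in start_urls:
            host = urlparse(entry["url"]).hostname or ""
            host = host.removeprefix("www.")  # assumption: "www." is not significant
            if host and host not in domains:
                domains.append(host)
        normalized["allowedDomains"] = domains

    # Assumed fallback of 500, taken from the example input rather than a documented default.
    max_requests = normalized.get("maxRequestsPerCrawl", 500)
    if not isinstance(max_requests, int) or max_requests < 1:
        raise ValueError("maxRequestsPerCrawl must be a positive integer")
    normalized["maxRequestsPerCrawl"] = max_requests
    return normalized
```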

Documentation

# 🌱 ESG Scraper: Sustainability Reports, Articles & PDF Disclosures Extractor

**High-quality ESG & sustainability web scraper for investors, analysts, and AI teams**

Enterprise-grade ESG web scraper that automatically extracts sustainability articles, corporate reports, climate news, and PDF disclosures from any website, delivering clean, structured data ready for investors, compliance teams, analysis, or AI training.

Built for:

- Sustainable investors & analysts
- Compliance and risk teams
- AI/ML engineers building ESG models
- Researchers and NGOs tracking climate & governance trends

- ✅ Smart ESG keyword filtering
- ✅ Full clean article text extraction
- ✅ PDF sustainability report parsing
- ✅ Rich metadata (date, author, description)
- ✅ ESG-relevant images and related links
- ✅ AI-ready dataset splitting (overview / full text / images)

👉 Runs on Apify • No code required • Pay only for compute used

---

## 🚀 Why This Scraper

- ✔ **Purpose-Built for ESG Data:** Filters pages against a customizable list of ESG keywords (climate, emissions, governance, CSR, net zero, etc.).
- ✔ **Excellent PDF Handling:** Extracts the full text of sustainability and ESG reports in PDF format, plus metadata where available.
- ✔ **Clean & Noise-Free Output:** Strips ads, navigation, and scripts so that only meaningful content remains.
- ✔ **Rich Structured Data:** Title, publication date, author, description, ESG keywords, internal links, and relevant images.
- ✔ **AI & ML Ready:** Optional splitting into specialized datasets for RAG, LLM fine-tuning, or model training.
- ✔ **Fast & Efficient:** Built on Crawlee + Cheerio, which parses the served HTML without running a browser. This suits static and content-heavy sites (news, corporate pages, PDFs); on sites that render their content with JavaScript, results may vary.
- ✔ **Safe & Controlled Crawling:** Automatic domain restriction, a depth limit (maximum 3 levels), and request limits.
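The documentation does not say exactly how keyword filtering decides relevance. As a minimal sketch, assuming case-insensitive whole-word matching in which multi-word terms such as "net zero" also match "net-zero", the matched keywords for a page (the `esgKeywords` output field) could be computed as below. The one-match relevance threshold is also an assumption.

```python
import re
from typing import Iterable


def match_esg_keywords(text: str, keywords: Iterable[str]) -> list[str]:
    """Return the keywords found in text, in the given order (case-insensitive, whole words)."""
    found = []
    for kw in keywords:
        parts = [re.escape(p) for p in kw.split()]
        if not parts:
            continue
        # Multi-word terms such as "net zero" also match "net-zero" or "net  zero".
        pattern = r"(?<!\w)" + r"[\s\-]+".join(parts) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def is_esg_relevant(text: str, keywords: Iterable[str], min_matches: int = 1) -> bool:
    """Treat a page as relevant when at least min_matches distinct keywords appear (assumed rule)."""
    return len(match_esg_keywords(text, keywords)) >= min_matches
```

Whole-word matching avoids false positives such as "climate" inside an unrelated compound word, at the cost of missing inflected forms unless you list them as keywords.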
---

## 💼 Use Cases

- ESG portfolio screening and risk monitoring
- Training ESG-focused LLMs or RAG systems
- Regulatory compliance and disclosure tracking
- Competitive intelligence on corporate sustainability
- Academic research on climate and governance trends

---

## 📊 Supported Sources

- ESG news sections (Reuters, Bloomberg, FT, Guardian, etc.)
- Corporate sustainability / ESG pages
- Annual sustainability reports (PDF)
- Climate, emissions, and governance disclosures

---

## ⚙️ How It Works

1. Provide start URLs (news sections, corporate pages, PDF links).
2. Set custom ESG keywords and limits.
3. Run the Actor.
4. Download clean, structured ESG datasets.

---

## 🧩 Input Configuration

### Example JSON Input

```json
{
  "startUrls": [
    { "url": "https://www.reuters.com/sustainability/" },
    { "url": "https://www.weforum.org/stories/technological-innovation/" }
  ],
  "allowedDomains": ["reuters.com"],
  "useApifyProxy": false,
  "maxRequestsPerCrawl": 500,
  "esgKeywords": [
    "ESG",
    "sustainability",
    "climate",
    "emissions",
    "net zero",
    "governance"
  ],
  "extractContent": true,
  "extractMetadata": true,
  "followLinks": true,
  "useSeparateDatasets": true,
  "cleanDefaultDataset": true,
  "proxyUrls": [
    { "url": "http://user:pass@host:port" }
  ]
}
```

### Key Options

- `startUrls`: one or more starting pages or direct PDF links (required)
- `allowedDomains`: restricts crawling to specific domains. If empty, crawling is automatically limited to the domains of the `startUrls`.
- `maxRequestsPerCrawl`: caps the number of requests to control cost and crawl size
- `esgKeywords`: custom list of terms for relevance filtering (the default includes common ESG terms)
- `extractContent` / `extractMetadata`: toggle full-text or metadata extraction
- `followLinks`: enables internal crawling (limited to depth 3 for safety)
- `useSeparateDatasets`: recommended for large runs and AI workflows
- `cleanDefaultDataset`: clears data from previous runs

---

## 📂 Output Datasets

When `useSeparateDatasets: true` (recommended):

- `esg-overview` (primary): lightweight metadata for fast analysis
- `esg-full-content`: long articles (more than 5,000 characters)
- `esg-images`: ESG-relevant images with context
- Default dataset: minimal preview records, so results stay visible in the Apify UI

When `useSeparateDatasets: false`:

- A single dataset with full detailed records

---

### Example Output Record (Full Mode)

```json
{
  "url": "https://www.reuters.com/sustainability/example",
  "title": "Companies strengthen climate commitments",
  "scrapedAt": "2025-12-15T10:30:45Z",
  "publishedDate": "2025-12-10",
  "author": "Jane Doe",
  "description": "Major firms enhance ESG targets...",
  "content": "Full clean article text...\n\nParagraphs preserved...",
  "esgKeywords": ["climate", "emissions", "sustainability"],
  "relatedLinks": [
    { "url": "https://www.reuters.com/sustainability/esg-guide", "text": "ESG Explained" }
  ],
  "images": [
    { "url": "https://reuters.com/chart-netzero.jpg", "alt": "Net zero emissions progress" }
  ]
}
```

### PDF Example

```json
{
  "url": "https://company.com/sustainability-2024.pdf",
  "title": "Annual Sustainability Report 2024",
  "content": "Full extracted report text...",
  "esgKeywords": ["sustainability", "carbon", "governance"],
  "type": "PDF",
  "author": "Corporate Sustainability Team",
  "publishedDate": "2024-03-15"
}
```

---

## 🏁 Getting Started

1. Click "Try for free" on Apify.
2. Paste ESG/sustainability URLs or direct PDF links.
3. Customize keywords and limits.
4. Run the Actor and download your dataset.

---

## 📧 Support

- Email: kidaxxb@gmail.com
- Response within 24 hours
- Issues: use the Issues tab on Apify

Tags: ESG, sustainability, web scraping, PDF extraction, climate data, corporate governance, RAG, LLM training, sustainable investing, compliance monitoring

Built with ❤️ on Apify
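The dataset split described under Output Datasets can be pictured with the hypothetical routing function below. Only the dataset names and the 5,000-character threshold come from the documentation; the fields kept in each dataset, the shape of the preview record, and the `pageUrl` field added to image records are assumptions for illustration.

```python
FULL_CONTENT_MIN_CHARS = 5000  # "long articles (>5000 characters)" per Output Datasets

# Assumed "lightweight metadata" fields for the overview dataset.
OVERVIEW_FIELDS = ("url", "title", "scrapedAt", "publishedDate", "author", "description", "esgKeywords")


def route_record(record: dict, use_separate_datasets: bool = True) -> dict[str, list[dict]]:
    """Hypothetical sketch: split one scraped record into per-dataset payloads."""
    if not use_separate_datasets:
        # Single-dataset mode: keep the full detailed record as-is.
        return {"default": [record]}

    routed: dict[str, list[dict]] = {
        "esg-overview": [{k: record[k] for k in OVERVIEW_FIELDS if k in record}],
        # Minimal preview so results still appear in the default dataset.
        "default": [{"url": record.get("url"), "title": record.get("title")}],
    }
    content = record.get("content", "")
    if len(content) > FULL_CONTENT_MIN_CHARS:
        routed["esg-full-content"] = [{"url": record.get("url"), "content": content}]
    # Attach the source page URL so each image keeps its context.
    images = [{**img, "pageUrl": record.get("url")} for img in record.get("images", [])]
    if images:
        routed["esg-images"] = images
    return routed
```

Keeping the overview records small is what makes them fast to load for screening, while full text is only paid for (in storage and transfer) when it is actually long.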

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try 💎 ESG Scraper: Sustainability Reports & PDF Disclosures now on Apify. Free tier available with no credit card required.

Actor Information

Developer: primeparse
Pricing: Paid
Total Runs: 40
Active Users: 3

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
