Proff.no Lead Scraper (Beta)

Name: Proff.no Lead Scraper (Beta)
Author: odaudlegur

Automatically scrape business leads from Proff.no. Extract company names, addresses, emails, phone numbers, and social links directly into a usable format.

79 runs

13 users

About Proff.no Lead Scraper (Beta)

Need fresh leads from Norway's business directory without the manual headache? I built this Proff.no scraper because I got tired of copying and pasting. It automates the tedious work, visiting Proff.no pages to pull structured data for you. You'll get the business name, physical address, any listed email addresses, phone numbers, and links to their social media profiles, all neatly organized and ready for your CRM or outreach list. It's perfect for sales teams looking to build targeted lists in Norway, marketers researching a local market, or recruiters sourcing companies. I run it myself to find potential partners and clients, and it saves hours I'd otherwise spend on repetitive searches. Just configure your search parameters and let it run; it handles the data extraction so you can focus on the conversations that matter. This is the beta version, so I'm actively improving it based on real user feedback to make it even more reliable.

What does this actor do?

Proff.no Lead Scraper (Beta) is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Open the actor's page on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

Proff.no Lead Scraper (Beta)

An Apify Actor for scraping company contact data from Proff.no business listings. It extracts details from listing pages, visits company profiles, and optionally crawls company websites and social media to gather emails and social links.

Categories: LEAD_GENERATION, AUTOMATION

Overview

This actor starts from Proff.no search/listing URLs. It scrapes company profile links, paginates through results, and visits each company detail page. For each company, it extracts core details and then performs optional, bounded crawls of the company's website and social profiles to find additional contact data. It includes validation, sanitization, and caching to improve data quality and performance.

Key Features

Proff.no Scraping: Extracts company links from listing pages and follows pagination automatically.
Structured Data Extraction: Parses company name, categories, phone, address, and website from detail pages. Uses JSON-LD, microdata, and heuristics for reliable address parsing (optimized for Norwegian postcodes).
Website Validation & Crawling: Normalizes website URLs, checks their HTTP status, and can crawl the site (same domain only) to find more emails and social links.
Layered Email Collection: Gathers emails from:
- The Proff detail page (text, mailto: links).
- The company website (prioritizing contact-related pages).
- Social profiles (Facebook, Instagram, LinkedIn) as an optional fallback.
Data Sanitization: Deduplicates, filters out tracking/garbage patterns, and prioritizes emails matching the company's domain.
Performance Optimizations:
- Per-domain website crawl cache to avoid re-crawling the same site for multiple companies.
- Async HTTP with configurable concurrency limits and timeouts.
- Strict bounds on the number of pages crawled per website.
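The per-domain cache and the concurrency limit can be pictured with a small asyncio sketch. This is not the actor's actual code: DomainCrawlCache and the crawl_site callable are hypothetical names, and the real implementation may differ.

```python
import asyncio
from urllib.parse import urlparse


class DomainCrawlCache:
    """Hypothetical helper: crawl each website domain at most once per run."""

    def __init__(self, crawl_site, max_concurrency=10):
        # crawl_site is an assumed async callable: domain -> crawl result.
        self._crawl_site = crawl_site
        self._tasks = {}  # domain -> asyncio task holding the crawl result
        # Create the cache inside a running event loop so the semaphore
        # is bound to that loop on older Python versions.
        self._semaphore = asyncio.Semaphore(max_concurrency)

    async def _bounded_crawl(self, domain):
        # The semaphore caps how many site crawls run at the same time.
        async with self._semaphore:
            return await self._crawl_site(domain)

    async def get(self, website_url):
        domain = urlparse(website_url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain not in self._tasks:
            # First company on this domain starts the crawl; later ones reuse it.
            self._tasks[domain] = asyncio.ensure_future(self._bounded_crawl(domain))
        return await self._tasks[domain]


async def demo():
    calls = []

    async def fake_crawl(domain):
        # Stand-in for a real bounded website crawl.
        calls.append(domain)
        await asyncio.sleep(0)
        return {"domain": domain, "emails": ["post@" + domain]}

    cache = DomainCrawlCache(fake_crawl, max_concurrency=2)
    results = await asyncio.gather(
        cache.get("https://www.example.com/"),
        cache.get("https://example.com/kontakt"),
        cache.get("https://other.example.org"),
    )
    return calls, results
```

Storing the task rather than the finished result means two companies that share a website and are processed at the same moment still trigger only one crawl.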

How to Use

Input: Provide one or more starting Proff.no listing URLs (e.g., from a bransjesøk search) in the actor input.
Configure: Set limits like maxResults and siteEmailMaxPages (for website crawling), and enable or disable social profile crawling with crawlSocialProfiles.
Run: Execute the actor. It will process listing pages, company details, and optional website/social crawls based on your configuration.
Output: Retrieve the structured dataset of company leads from Apify's storage.

Input

The actor accepts the following main configuration via input JSON:

{
  "startUrls": [
    "https://www.proff.no/bransjesøk?q=...",
    "https://www.proff.no/bransjesøk?q=..."
  ],
  "maxResults": 100,
  "siteEmailMaxPages": 10,
  "crawlSocialProfiles": false,
  "maxConcurrency": 10
}

startUrls: (Required) Array of Proff.no listing page URLs.
maxResults: Maximum number of company profiles to scrape.
siteEmailMaxPages: Maximum number of pages to crawl per website for emails.
crawlSocialProfiles: If true, the actor will attempt to crawl social profile pages for emails when website emails are not found.
maxConcurrency: Controls the number of parallel HTTP requests.
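If you build the input programmatically, a small helper can catch obvious mistakes before a run starts. The sketch below is an assumption-laden convenience, not part of the actor: build_run_input is a hypothetical name, the default values simply mirror the sample input above (the actor's real defaults are not documented here), and the host check assumes only proff.no URLs are valid start URLs.

```python
from urllib.parse import urlparse


def build_run_input(start_urls, max_results=100, site_email_max_pages=10,
                    crawl_social_profiles=False, max_concurrency=10):
    """Hypothetical helper that assembles the actor input as a dict."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    for url in start_urls:
        host = urlparse(url).netloc.lower()
        # Assumption: every start URL should point at proff.no.
        if host != "proff.no" and not host.endswith(".proff.no"):
            raise ValueError("not a Proff.no URL: " + url)
    limits = {
        "maxResults": max_results,
        "siteEmailMaxPages": site_email_max_pages,
        "maxConcurrency": max_concurrency,
    }
    for name, value in limits.items():
        if not isinstance(value, int) or value < 1:
            raise ValueError(name + " must be a positive integer")
    run_input = {"startUrls": list(start_urls)}
    run_input.update(limits)
    run_input["crawlSocialProfiles"] = bool(crawl_social_profiles)
    return run_input
```

The returned dict can be serialized with json.dumps and pasted into the actor's input editor, or passed to whichever Apify client you use.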

Output

The actor outputs a dataset where each item represents a scraped company. Each record has the following shape:

{
  "name": "Company Name AS",
  "categories": ["Consulting", "IT"],
  "phone": "+47 123 45 678",
  "address": {
    "streetAddress": "Gateveien 1",
    "postalCode": "1234",
    "addressLocality": "Oslo"
  },
  "website": "https://example.com",
  "website_details": "ok",
  "emails": ["post@example.com", "contact@example.com"],
  "socialLinks": {
    "facebook": "https://facebook.com/example",
    "linkedin": "https://linkedin.com/company/example"
  },
  "sourceUrl": "https://www.proff.no/selskap/..."
}

website_details: Indicates the website status (ok, 404, unavailable, banned, n/a).
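To make the status labels concrete, here is one way the mapping could work. The rules below are guesses for illustration only: normalize_website and classify_website are hypothetical names, and the actor's real thresholds (for example, which status codes count as banned) are not documented.

```python
from urllib.parse import urlparse


def normalize_website(raw):
    """Return a cleaned http(s) URL, or None if nothing usable was listed."""
    if not raw or not raw.strip():
        return None
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url  # assume HTTPS when the scheme is missing
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
        return None
    return parsed._replace(netloc=parsed.netloc.lower(), fragment="").geturl()


def classify_website(url, status_code):
    """Map a normalized URL and an HTTP status to a website_details label.

    status_code is None when the request failed (timeout, DNS error, ...).
    """
    if url is None:
        return "n/a"
    if status_code is None:
        return "unavailable"
    if status_code == 404:
        return "404"
    if status_code in (401, 403, 429):
        return "banned"  # assumption: access-denied and rate-limit responses
    if 200 <= status_code < 400:
        return "ok"
    return "unavailable"
```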
emails: An array of sanitized and deduplicated email addresses, filtered to prioritize company-domain emails.
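The sanitization step can be sketched roughly as follows. Treat it as an illustration under assumptions: sanitize_emails, the regular expression, and the garbage lists are made up for this example and are not the actor's actual rules.

```python
import re
from urllib.parse import urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed examples of false positives: image file names and tracking domains.
GARBAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")
GARBAGE_DOMAINS = ("sentry.io", "wixpress.com")


def sanitize_emails(candidates, website=None):
    """Deduplicate, drop garbage, and list company-domain emails first."""
    company_domain = None
    if website:
        company_domain = urlparse(website).netloc.lower()
        if company_domain.startswith("www."):
            company_domain = company_domain[4:]

    seen = set()
    cleaned = []
    for raw in candidates:
        email = raw.strip().lower()
        if email.startswith("mailto:"):
            email = email[len("mailto:"):]
        email = email.split("?", 1)[0]  # drop "?subject=..." from mailto links
        if not EMAIL_RE.fullmatch(email):
            continue
        domain = email.rsplit("@", 1)[1]
        if email.endswith(GARBAGE_SUFFIXES):
            continue
        if any(domain == d or domain.endswith("." + d) for d in GARBAGE_DOMAINS):
            continue
        if email in seen:
            continue
        seen.add(email)
        cleaned.append(email)

    # Stable sort: emails on the company's own domain move to the front,
    # everything else keeps its discovery order.
    cleaned.sort(key=lambda e: e.rsplit("@", 1)[1] != company_domain)
    return cleaned
```

Sorting instead of discarding keeps addresses on other domains (a shared Gmail inbox, for instance) available as a fallback while still surfacing the most likely company contact first.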

Note on Terms of Service: This actor scrapes Proff.no's public HTML, which may violate their Terms of Service. Use it responsibly: keep request rates low, consider using proxies, and make sure your use complies with local laws and the website's policies.