SEO Keyword Extractor

Name: SEO Keyword Extractor
Author: wisteria_banjo

20 runs

4 users

About SEO Keyword Extractor

Finds keyword phrases from a list of websites 🌐, groups similar ones into clear themes 🧩, and ranks them. Also suggests good main keywords ⭐ and possible negative keywords 🚫 so you can plan SEO and ad campaigns in a smarter, more focused way 📈.

What does this actor do?

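SEO Keyword Extractor is an actor (a cloud-hosted automation tool) on the Apify platform. It fetches the pages you list, extracts multi-word keyphrases from each one, groups similar phrases across sites into ranked themes, and flags phrases that appear on only one site as possible negative keywords.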

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# 🔍 SEO Keyword Theme & Negative Keyword Analyzer 🚀

## 📘 Overview

This actor takes one or more URLs, extracts high-value multi-word SEO keyphrases, and then:

- Clusters common cross-site keyword families (close wording variants of the same phrase found across multiple domains).
- Computes n-gram stats (e.g. “real estate lawyer”, “fort lauderdale real estate lawyer”) only for phrases that appear on at least 3 different sites.
- Builds keyword themes (ranked topics with all their variants and the sites that use them).
- Suggests candidate negative keywords (phrases that appear on only a single site, often competitor names or one-off terms).

It’s built for serious competitive research, PPC planning, and semantic SEO clustering across your niche 🌐✨

## 🌟 Use Cases

| 💼 Scenario | 📈 Benefit |
|------------|------------|
| 🔎 Competitor keyword intelligence | See which phrases multiple competitors converge on (strong themes) vs. one-off phrases (weak or brand-specific). |
| 🧩 Local + practice-area SEO | Quickly surface geo + service combos like “fort lauderdale real estate lawyer” or “west palm beach probate attorney.” |
| 🧠 Semantic clustering & topic planning | Get keyword themes with a primary phrase, all variants, and the sites that use them. |
| 🎯 PPC campaign & ad group design | Use themes as ad groups and their variants as the keywords in each group; use single-site phrases as negative keyword candidates. |
| 🧹 Keyword cleanup & noise reduction | Filters out junk such as code-like phrases, numeric strings, and odd technical terms by default. |

## 🧪 Output Structure

Results are written as flat dataset rows, so they’re easy to export to CSV, Google Sheets, or BI tools. Each row has a `record_type` field that tells you what kind of entity it is.
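Every row is flat and tagged with `record_type`, so a downstream script can split an exported dataset by type before writing each kind to its own sheet or CSV. The Python sketch below assumes you already have the rows as a list of dicts, for example loaded from a JSON export of the dataset. The helper names `split_by_record_type` and `rows_to_csv` are hypothetical and are not part of the actor.

```python
import csv
import io
from collections import defaultdict


def split_by_record_type(rows):
    """Group flat dataset rows into lists keyed by their `record_type` field."""
    grouped = defaultdict(list)
    for row in rows:
        # Rows without a record_type are kept under "unknown" rather than dropped.
        grouped[row.get("record_type", "unknown")].append(row)
    return dict(grouped)


def rows_to_csv(rows):
    """Serialize rows of one record type to CSV text; list fields are joined with '; '."""
    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {k: "; ".join(map(str, v)) if isinstance(v, list) else v for k, v in row.items()}
        )
    return buf.getvalue()


# Example rows shaped like the records documented in this section (values shortened).
rows = [
    {"record_type": "page_keywords", "page_url": "https://example.com",
     "top_keywords": ["west palm beach real estate attorney", "florida real estate lawyers"]},
    {"record_type": "keyword_theme", "primary_keyword": "florida real estate attorney", "score": 0.95},
    {"record_type": "negative_keyword_candidate", "phrase": "ryan shipp", "site_count": 1},
]

by_type = split_by_record_type(rows)
print(sorted(by_type))  # ['keyword_theme', 'negative_keyword_candidate', 'page_keywords']
print(rows_to_csv(by_type["page_keywords"]))
```

Joining list fields such as `top_keywords` into one cell keeps each exported CSV at one line per row. That matches the flat layout the actor already uses.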
### 1️⃣ Per-page keywords

One row per URL:

```json
{
  "record_type": "page_keywords",
  "page_url": "https://example.com",
  "top_keywords": [
    "west palm beach real estate attorney",
    "florida real estate lawyers",
    "business litigation fort lauderdale"
  ]
}
```

### 2️⃣ Common cross-site keyword families

Clusters of similar phrases that show up on more than one site, with similarity metrics:

```json
{
  "record_type": "common_cross_site_keywords",
  "group_representative": "florida real estate attorney",
  "group_keywords": [
    "florida real estate attorney",
    "florida real estate lawyers",
    "florida real estate law",
    "law florida real estate",
    "real estate litigation attorneys"
  ],
  "keyword_count": 5,
  "site_count": 3,
  "sites": [
    "https://a.com",
    "https://b.com",
    "https://c.com"
  ],
  "levenshtein_avg_distance": 0.31,
  "levenshtein_max_distance": 0.53
}
```

Use these rows to see:

- Which concepts recur across domains (`site_count`).
- How tight the wording cluster is (lower Levenshtein distances = more similar).

### 3️⃣ N-gram stats (cross-site phrases)

For each n (2, 3, …), the actor aggregates n-grams that appear on at least 3 different sites (strong cross-site themes):

```json
{
  "record_type": "ngram_3",
  "ngram": "fort lauderdale real",
  "n": 3,
  "count": 8,
  "site_count": 4,
  "sites": [
    "https://a.com",
    "https://b.com",
    "https://c.com",
    "https://d.com"
  ],
  "sample_keywords": [
    "fort lauderdale real estate",
    "lauderdale real estate lawyer",
    "lauderdale real estate attorneys"
  ]
}
```

This is great for spotting the standard phrasing in your market (“real estate lawyer”, “west palm beach”, etc.).
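The cross-site n-gram filter described above can be sketched in Python. This is an illustration under stated assumptions, not the actor's code. It assumes:

- n-grams are counted over each site's extracted keyphrases;
- `count` means total occurrences;
- `site_count` means the number of distinct sites.

It uses the documented limits: n runs from `min_ngram_n` up to 6, and an n-gram must appear on at least 3 sites. The function `cross_site_ngrams` and its `keywords_by_site` input format are hypothetical.

```python
from collections import defaultdict

MAX_N = 6      # documented cap on n-gram length
MIN_SITES = 3  # documented threshold for "strong cross-site" n-grams


def ngrams(tokens, n):
    """All contiguous n-token windows of a token list, joined back into strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cross_site_ngrams(keywords_by_site, min_n=2, max_n=MAX_N, min_sites=MIN_SITES):
    """Count n-grams over each site's keyphrases; keep those seen on >= min_sites sites.

    Assumption: n-grams are taken from extracted keyphrases (not raw page text),
    `count` is total occurrences, and `site_count` is the number of distinct sites.
    """
    count = defaultdict(int)
    sites = defaultdict(set)
    samples = defaultdict(list)
    for site, phrases in keywords_by_site.items():
        for phrase in phrases:
            tokens = phrase.lower().split()
            for n in range(min_n, max_n + 1):
                for gram in ngrams(tokens, n):
                    key = (n, gram)
                    count[key] += 1
                    sites[key].add(site)
                    if phrase not in samples[key]:
                        samples[key].append(phrase)

    rows = []
    for (n, gram), c in count.items():
        if len(sites[(n, gram)]) >= min_sites:
            rows.append({
                "record_type": f"ngram_{n}",
                "ngram": gram,
                "n": n,
                "count": c,
                "site_count": len(sites[(n, gram)]),
                "sites": sorted(sites[(n, gram)]),
                "sample_keywords": samples[(n, gram)][:3],
            })
    # Strongest first: most sites, then most occurrences, then alphabetical.
    return sorted(rows, key=lambda r: (-r["site_count"], -r["count"], r["ngram"]))


keywords_by_site = {
    "https://a.com": ["fort lauderdale real estate lawyer"],
    "https://b.com": ["fort lauderdale real estate attorney"],
    "https://c.com": ["real estate lawyer fort lauderdale"],
}
for row in cross_site_ngrams(keywords_by_site):
    print(row["ngram"], row["site_count"])
# fort lauderdale 3
# real estate 3
```

If you filter the same counts for `site_count == 1` instead, you get the single-site phrases that the actor reports as negative keyword candidates.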
### 4️⃣ Group-to-group similarity (Jaccard)

When two cross-site keyword families heavily overlap in their token sets, they’re connected with a Jaccard score:

```json
{
  "record_type": "group_similarity",
  "group_a": "florida real estate attorney",
  "group_b": "real estate lawyer",
  "similarity": 0.63
}
```

These rows tell you which keyword families are essentially about the same thing and should probably be treated as one theme in your planning.

### 5️⃣ Keyword themes (the “use this in campaigns” layer)

Themes merge similar groups into higher-level topics and rank them:

```json
{
  "record_type": "keyword_theme",
  "primary_keyword": "florida real estate attorney",
  "score": 0.95,
  "site_count": 3,
  "groups_in_theme": 2,
  "all_variants": [
    "florida real estate attorney",
    "florida real estate law",
    "florida real estate lawyers",
    "law florida real estate",
    "real estate litigation attorneys"
  ],
  "all_sites": [
    "https://a.com",
    "https://b.com",
    "https://c.com"
  ]
}
```

How to use these:

- Treat each `keyword_theme` as either:
  - a core SEO topic / pillar page, or
  - a PPC ad group (primary = ad group name, variants = the ad group’s keywords / ad copy phrases).

Higher `score` = stronger candidate.

### 6️⃣ Candidate negative keywords

The actor also flags n-grams that appear on only one site as negative keyword candidates (often brand names or very specific, non-generic terms):

```json
{
  "record_type": "negative_keyword_candidate",
  "phrase": "ryan shipp",
  "n": 2,
  "count": 3,
  "site_count": 1,
  "sites": [
    "https://competitor.com"
  ],
  "reason": "single_site_ngram"
}
```

These are not auto-applied negatives. They’re suggestions that you should review manually before adding them to a PPC negative list, especially competitor names or hyper-specific phrases you don’t want to pay for.

## ⚙️ Input

### Input fields

```json
{
  "urls": [
    { "url": "https://example.com" },
    { "url": "https://another-site.com" }
  ],
  "min_ngram_n": 2
}
```

- `urls` (array, required)
  - Uses the `requestListSources` editor in Apify.
  - Accepts either `{ "url": "..." }` objects or plain strings `"https://..."`.
- `min_ngram_n` (integer, optional, default `2`)
  - The minimum n-gram length to analyze.
  - `2` = start at bigrams (“real estate”), `3` = only 3+ word phrases (“real estate lawyer”, “fort lauderdale real estate”).
  - Unigrams (single words) are never computed, to keep noise down.

Internally, the actor analyzes n-grams from `min_ngram_n` up to a safe cap (currently `6`) to avoid combinatorial blow-ups on very long phrases.

## 🔄 How it works (under the hood)

1. Fetch & clean
   - Fetches each URL via HTTP.
   - Strips scripts, styles, and other noise, and extracts the visible text.
2. Keyword extraction
   - Uses a transformer-based model (`all-MiniLM-L6-v2` via KeyBERT) to extract multi-word keyphrases from the page content.
   - Filters out:
     - Numeric strings
     - Code-like / technical junk
     - Blacklisted tokens (e.g., obvious non-SEO boilerplate)
   - Keeps the most relevant 2–4 word keyphrases per page.
3. Cross-site aggregation
   - Clusters similar phrases across sites using RapidFuzz (token-set similarity).
   - Keeps only clusters seen on multiple domains.
   - Computes Levenshtein distances inside each cluster to quantify how tight or loose the variants are.
4. N-gram analysis
   - Builds n-gram stats across pages:
     - Only n in [`min_ngram_n`, 6].
     - Only n-grams seen on ≥ 3 sites are kept as strong cross-site themes.
5. Theme building
   - Builds a graph of keyword groups connected by high Jaccard similarity.
   - Collapses connected components into themes.
   - Scores each theme by:
     - Cross-site importance (how many sites use it).
     - Cohesion (Levenshtein-based).
     - Phrase length (favoring 2–4 word phrases).
6. Negative keyword suggestions
   - Separately scans all phrases for n-grams that appear on exactly one site.
   - Emits them as `negative_keyword_candidate` rows for manual review.

## 💰 Monetization & Scaling

This actor is designed to work cleanly with Apify Pay-Per-Event (PPE):

- One event per run: `apify-actor-start`, charged once per actor start (each run).
- One event per result row: `apify-default-dataset-item`. Every row pushed with `Actor.push_data(...)` becomes a dataset item, which can be billed as a per-item event.

That means:

- Small runs with a few URLs → a handful of items → lower cost.
- Large competitive sweeps (many domains) → more items (pages, cross-site keywords, themes, negatives) → higher cost, but also richer insight.

You can control cost by:

- Limiting the number of input URLs.
- Truncating or filtering which record types you care about (e.g., only `page_keywords` + `keyword_theme`).

## 🔄 Workflow Examples

This actor is workflow-ready and plays nicely with other Apify tools:

| 🔗 Integration | 🔍 Description |
|----------------|----------------|
| `serp-scraper` | Scrape top-ranking Google results for a query, then feed the URLs here to see the shared themes across the SERP. |
| `map-scraper` | Collect local business websites from Google Maps, then compare cross-site phrasing for local SEO campaigns. |
| Other actors | Build end-to-end automations: harvest → extract → cluster → export to Sheets/Data Studio. |

## 🚀 Ready to Launch?

Use this actor when you want more than just a list of keywords:

- See which phrases truly define your niche (themes & n-grams).
- Separate generic market language from brand-specific noise.
- Build better SEO topics, tighter PPC ad groups, and smarter negative lists.

Perfect for:

- SEO agencies
- Performance marketers
- Local law firms & service businesses
- Content strategists and SERP analysts

Happy crawling & clustering! 🚀🌐
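Step 5 of “How it works” connects keyword groups by high Jaccard similarity and then collapses the connected components into themes. It can be sketched with token sets and a union-find (disjoint-set) structure. The sketch makes three assumptions:

- each group is a list of phrases;
- two groups are linked when the Jaccard similarity of their combined token sets reaches a threshold;
- the threshold defaults to `0.5`, a hypothetical value, since the docs only say “high” similarity.

Theme scoring (site count, Levenshtein cohesion, phrase length) is left out because its weights are not documented. The function names are hypothetical, not the actor's own.

```python
from itertools import combinations


def group_tokens(group):
    """Union of lowercase tokens across every phrase in one keyword family."""
    return {tok for phrase in group for tok in phrase.lower().split()}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_themes(groups, threshold=0.5):
    """Link groups whose token-set Jaccard >= threshold and return connected components.

    `threshold=0.5` is a hypothetical cutoff; the docs only say "high Jaccard similarity".
    Each component is returned as a sorted list of group indices.
    """
    parent = list(range(len(groups)))  # union-find forest, one node per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    tokens = [group_tokens(g) for g in groups]
    for i, j in combinations(range(len(groups)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    components = {}
    for i in range(len(groups)):
        components.setdefault(find(i), []).append(i)
    return [sorted(members) for members in components.values()]


groups = [
    ["florida real estate attorney", "florida real estate lawyers"],
    ["real estate lawyer", "real estate attorney"],
    ["probate attorney", "west palm beach probate attorney"],
]
print(build_themes(groups))  # [[0, 1], [2]]
```

The first two groups share 3 of their 6 distinct tokens (Jaccard 0.5), so they merge into one theme. The probate group shares only “attorney” with either of them and stays on its own. Union-find makes the merge transitive: if A links to B and B links to C, all three land in one theme even when A and C alone fall below the threshold.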

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try SEO Keyword Extractor now on Apify. Free tier available with no credit card required.

Actor Information

Developer: wisteria_banjo
Pricing: Paid
Total Runs: 20
Active Users: 4

Related Actors

Google Search Results Scraper

by apify

Google Search Results (SERP) Scraper

by scraperlink

Google Search

by devisty

Bing Search Scraper

by tri_angle

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
