Named Entity Extractor & Name Validator

by dominic-quaiser

48 runs
8 users

About Named Entity Extractor & Name Validator

Extract named entities from text using a NER API. Supports multilingual, English, and German text extraction with confidence scores for each detected name.

What does this actor do?

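Named Entity Extractor & Name Validator is an Apify Actor that extracts person names, along with organizations, locations, and other entities, from text through a Named Entity Recognition (NER) API. It runs on the Apify platform in Standby mode, so you call it over HTTP like a regular API instead of starting runs manually.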

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

# Named Entity Extractor & Name Validator

A high-performance Apify Actor running in Standby mode that extracts person names from text using a Named Entity Recognition (NER) API. This Actor provides real-time name extraction with confidence scoring, supporting multilingual, English, and German text processing.

## 💡 Features

- Real-Time API: On-demand HTTP server with sub-second response times
- Multilingual Support: Extract names from English, German, or multilingual text
- Confidence Filtering: Set minimum confidence thresholds (0-100%) to control result quality
- Automatic Data Storage: All results are automatically saved to your Apify dataset
- Comprehensive Output: Returns names with confidence scores, plus organizations, locations, and other entities
- Built for Scale: Automatic scaling handles concurrent requests efficiently

## 🚀 How to Use

This Actor runs in Standby mode, which means it operates like a standard API. You don't need to start the Actor manually - simply send HTTP requests to the Actor's endpoint and get instant results.

## 📍 Endpoint

Send a POST request to your Actor's Standby URL with the following format:

`https://dominic-quaiser--named-entity-extractor.apify.actor?token=YOUR_API_TOKEN`

## 📥 Request Format

You need an Apify account and API token to use this Actor. Get your token from Settings → Integrations in Apify Console.

Recommended method - Include the token in the Authorization header:

```bash
curl -X POST "https://dominic-quaiser--named-entity-extractor.apify.actor" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Maria Salomea Skłodowska-Curie; known as Marie Curie was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.\n She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields. Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes. She was, in 1906, the first woman to become a professor at the University of Paris.",
    "language": "en",
    "minConfidence": 70
  }'
```

Alternative method - Add token as a query parameter:

```bash
# The URL is quoted so the shell does not try to expand the "?" character.
curl -X POST "https://dominic-quaiser--named-entity-extractor.apify.actor?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Marie Skłodowska Curie war eine Physikerin und Chemikerin polnischer Herkunft, die in Frankreich lebte und wirkte. Sie untersuchte die 1896 von Henri Becquerel beobachtete Strahlung von Uranverbindungen und prägte für diese das Wort „radioaktiv“. Im Rahmen ihrer Forschungen, für die ihr 1903 ein anteiliger Nobelpreis für Physik und 1911 der Nobelpreis für Chemie zugesprochen wurde, entdeckte sie gemeinsam mit ihrem Ehemann Pierre Curie die chemischen Elemente Polonium und Radium. Marie Curie ist die einzige Frau unter den fünf Personen, denen bisher mehrfach ein Nobelpreis verliehen wurde, und neben Linus Pauling die einzige Person, die Nobelpreise auf zwei unterschiedlichen Fachgebieten erhielt.",
    "language": "de",
    "minConfidence": 95
  }'
```

### Request Parameters

| Parameter     | Type   | Description                   | Valid Values        | Default | Required |
| ------------- | ------ | ----------------------------- | ------------------- | ------- | -------- |
| text          | string | Text to extract names from    | 10-2000 characters  | -       | Yes      |
| language      | string | Language model to use         | "multi", "en", "de" | "multi" | No       |
| minConfidence | number | Minimum confidence percentage | 0-100               | 70      | No       |

### Language Options

Choose the appropriate language mode based on your text:

- multi (Multilingual): Best for mixed-language text or when language is unknown. Works with most European languages.
- en (English): Optimized for English text. Use when you know the text is primarily in English for better accuracy.
- de (German): Optimized for German text. Use for German-language documents, particularly helpful with German naming conventions.

### Confidence Threshold Guidelines

The minConfidence parameter controls result quality. Here's how to choose:

- 50-60%: Very permissive - includes many names but may have false positives
- 70-80%: Balanced - good mix of recall and precision (recommended for most use cases)
- 85-95%: Conservative - high precision but may miss some valid names
- 95-100%: Very strict - only highest confidence detections

Start with 70% and adjust based on your needs.

## 📤 Response Format

### Success Response (200 OK)

```json
{
  "language": "en",
  "total_names_found": 4,
  "processing_time": 4.2364819049835205,
  "names": [
    { "person": "Maria Salomea Skłodowska", "confidence": 0.929 },
    { "person": "Curie", "confidence": 0.99 },
    { "person": "Marie Curie", "confidence": 0.999 },
    { "person": "Pierre Curie", "confidence": 0.999 }
  ],
  "persons": ["Maria Salomea Skłodowska", "Curie", "Marie Curie", "Pierre Curie"],
  "organizations": ["University Of Paris"],
  "locations": [],
  "miscellaneous": ["Polish", "French", "Nobel Prize"],
  "raw_entities": [
    { "text": "Maria Salomea Skłodowska", "type": "person", "score": 0.929, "start": 0, "end": 24 },
    { "text": "Curie", "type": "person", "score": 0.99, "start": 25, "end": 30 },
    { "text": "Marie Curie", "type": "person", "score": 0.999, "start": 41, "end": 52 },
    { "text": "Polish", "type": "miscellaneous", "score": 0.997, "start": 59, "end": 65 },
    { "text": "French", "type": "miscellaneous", "score": 0.999, "start": 82, "end": 88 },
    { "text": "Nobel Prize", "type": "miscellaneous", "score": 0.996, "start": 196, "end": 207 },
    { "text": "Nobel Prize", "type": "miscellaneous", "score": 0.996, "start": 235, "end": 246 },
    { "text": "Nobel Prize", "type": "miscellaneous", "score": 0.996, "start": 283, "end": 294 },
    { "text": "Pierre Curie", "type": "person", "score": 0.999, "start": 334, "end": 346 },
    { "text": "Nobel Prize", "type": "miscellaneous", "score": 0.996, "start": 377, "end": 388 },
    { "text": "Nobel Prize", "type": "miscellaneous", "score": 0.996, "start": 438, "end": 449 },
    { "text": "Curie", "type": "person", "score": 0.922, "start": 468, "end": 473 },
    { "text": "Nobel Prize", "type": "miscellaneous", "score": 0.994, "start": 496, "end": 507 },
    { "text": "University of Paris", "type": "organization", "score": 0.969, "start": 573, "end": 592 }
  ],
  "confidence_scores": {
    "persons": [0.929, 0.99, 0.999, 0.999],
    "organizations": [0.5],
    "locations": [],
    "miscellaneous": [0.997, 0.999, 0.996]
  },
  "model_used": "dbmdz/bert-large-cased-finetuned-conll03-english",
  "text": "Maria Salomea Skłodowska-Curie; known as Marie Curie was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.\n She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields. Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes. She was, in 1906, the first woman to become a professor at the University of Paris."
}
```

### Response Fields

| Field             | Type   | Description                                    |
| ----------------- | ------ | ---------------------------------------------- |
| language          | string | Language model used for extraction             |
| total_names_found | number | Count of names meeting confidence threshold    |
| processing_time   | number | Processing time in seconds                     |
| names             | array  | Filtered names with confidence ≥ minConfidence |
| persons           | array  | All detected person names                      |
| organizations     | array  | All detected organizations                     |
| locations         | array  | All detected locations                         |
| miscellaneous     | array  | Other detected entities                        |
| raw_entities      | array  | Detailed entity data with positions and scores |
| confidence_scores | object | Confidence scores by entity type               |
| model_used        | string | Name of the NER model used                     |
| text              | string | Original input text                            |

### Error Response (400/500)

```json
{
  "error": {
    "status_code": 400,
    "message": "Field 'text' is required and must be a non-empty string."
  }
}
```

## 💻 Code Examples

### Python

```python
import requests

url = "https://dominic-quaiser--named-entity-extractor.apify.actor?token=YOUR_API_TOKEN"
payload = {
    "text": "Angela Merkel met with Emmanuel Macron in Berlin.",
    "language": "multi",
    "minConfidence": 80,
}

# Send the JSON body; the timeout keeps the call from hanging indefinitely.
response = requests.post(url, json=payload, timeout=30)
print(response.json())
```

### JavaScript/Node.js

```javascript
const response = await fetch(
  'https://dominic-quaiser--named-entity-extractor.apify.actor?token=YOUR_API_TOKEN',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: 'Angela Merkel met with Emmanuel Macron in Berlin.',
      language: 'multi',
      minConfidence: 80
    })
  }
);

const data = await response.json();
console.log(data);
```

### PHP

```php
<?php
$url = 'https://dominic-quaiser--named-entity-extractor.apify.actor?token=YOUR_API_TOKEN';
$data = [
    'text' => 'Angela Merkel met with Emmanuel Macron in Berlin.',
    'language' => 'multi',
    'minConfidence' => 80
];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
curl_close($ch);

echo $response;
?>
```

## 🎯 Use Cases

- Lead Generation: Extract contact names from business emails or documents
- Content Analysis: Identify people mentioned in articles or social media
- Document Processing: Extract names from contracts, resumes, or legal documents
- CRM Enrichment: Parse names from unstructured text data
- Compliance & KYC: Extract and validate person names for regulatory purposes
- News Monitoring: Track mentions of specific individuals in news feeds
- Academic Research: Extract author names from papers or citations
- Customer Support: Identify customer names from support tickets and feedback

## ❓ Error Handling

The Actor returns appropriate HTTP status codes:

- 200: Success - names extracted successfully
- 400: Bad Request - invalid input (check error message for details)
- 401: Unauthorized - invalid or missing API token
- 402: Payment Required - dataset limit reached, upgrade your plan to continue
- 500: Internal Server Error - unexpected error (contact support if persistent)

Example error response:

```json
{
  "error": "Text length must be between 3 and 2000 characters, got 2",
  "status": 400
}
```

Example quota limit response:

```json
{
  "error": {
    "status_code": 402,
    "message": "Dataset limit of 1000 items reached. Please upgrade your plan to continue storing results."
  }
}
```

## ⚖️ Legal Disclaimer

You are solely responsible for determining the legality of your use of this Actor and the data it generates. The handling of data, particularly personal information, is subject to complex legal frameworks like the General Data Protection Regulation (GDPR/DSGVO) and copyright laws. It is your responsibility to ensure your use case is compliant with all applicable laws. This text does not constitute legal advice.

### GDPR Notice

Please be aware that an external NER API hosted on a private server in the European Union is used for data processing.

- What is Processed: The request content you submit, which may contain personal names and other entities.
- Why: To perform Named Entity Recognition (NER) and extract person names, organizations, locations, and other entities from your text.
- Data Controller: You, the user, are the data controller. The Actor's developer acts as the data processor for this specific task.
- Location & Compliance: All NER processing for this feature occurs within the EU (Germany) and is subject to GDPR (DSGVO).
- Data Storage: The request is processed in-memory and is not stored or logged on the external NER server.
- Important: This NER processing is external to the Apify platform and is not covered by Apify's DPA. By using this Actor, you acknowledge this separate data processing activity.

## 🛠️ Maintainer

- Author: Dominic M. Quaiser
- Contact: mail@dominic-quaiser.io
- Website: dominic-quaiser.io
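
The request parameters, response fields, and error shapes documented above can be tied together in a small client. The following is only a sketch under stated assumptions, not the Actor's own code: the helper names (`build_payload`, `error_message`, `names_at_least`, `extract`) are hypothetical, the client-side length check uses the 10-2000 character limit from the parameter table, and error bodies are assumed to be JSON in one of the two shapes shown in the examples. It uses only the Python standard library.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, List, Optional

ACTOR_URL = "https://dominic-quaiser--named-entity-extractor.apify.actor"
ALLOWED_LANGUAGES = {"multi", "en", "de"}


def build_payload(text: str, language: str = "multi",
                  min_confidence: float = 70) -> Dict[str, Any]:
    """Check input against the documented limits and return the JSON body."""
    if not isinstance(text, str) or not 10 <= len(text) <= 2000:
        raise ValueError("text must be a string of 10-2000 characters")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError("language must be one of 'multi', 'en', 'de'")
    if not 0 <= min_confidence <= 100:
        raise ValueError("minConfidence must be between 0 and 100")
    return {"text": text, "language": language, "minConfidence": min_confidence}


def error_message(body: Dict[str, Any]) -> Optional[str]:
    """Return the error text from either documented error shape, else None."""
    err = body.get("error")
    if err is None:
        return None
    if isinstance(err, dict):  # {"error": {"status_code": ..., "message": ...}}
        return err.get("message")
    return str(err)            # {"error": "...", "status": ...}


def names_at_least(body: Dict[str, Any], min_percent: float) -> List[str]:
    """Re-filter names client-side.

    The request threshold is a percentage (0-100), while the confidences in
    the response are fractions (0-1), so the threshold is scaled first.
    """
    threshold = min_percent / 100
    return [n["person"] for n in body.get("names", [])
            if n["confidence"] >= threshold]


def extract(text: str, token: str, language: str = "multi",
            min_confidence: float = 70) -> Dict[str, Any]:
    """POST to the Standby endpoint using the Authorization header."""
    payload = build_payload(text, language, min_confidence)
    request = urllib.request.Request(
        ACTOR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Assumption: non-2xx responses carry a JSON body as in the examples.
        body = json.loads(exc.read().decode("utf-8") or "{}")
        raise RuntimeError(f"HTTP {exc.code}: {error_message(body)}") from exc
```

Calling `extract("Angela Merkel met with Emmanuel Macron in Berlin.", "YOUR_API_TOKEN", "en", 80)` would send the same kind of request as the curl examples. Validating locally before sending avoids spending a request on input the Actor would reject with a 400, and `names_at_least` lets you tighten the threshold on an existing response without calling the API again.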

Actor Information

Developer: dominic-quaiser
Pricing: Paid
Total Runs: 48
Active Users: 8

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

