Amazon AI Product Intelligence

Name: Amazon AI Product Intelligence
Author: visita

About Amazon AI Product Intelligence

Amazon AI Product Intelligence Stream is an AI-driven Actor that extracts structured data from Amazon product pages and uses large language models to turn it into clean JSON and a summary report. It is built for targeted competitive and market analysis of e-commerce products.

What does this actor do?

Amazon AI Product Intelligence is a web scraping tool on the Apify platform. It searches an Amazon marketplace for your keywords, scrapes the matching product pages in the cloud, and returns the extracted product data as a dataset.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
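
Before running the Actor (step 3), it can help to assemble and sanity-check the input in code. The following is a minimal, hypothetical Python helper that builds an input object from the field names documented under Configuration and Input. The helper name, its default values, and its validation rules (for example, requiring `customPrompt` when `promptSelection` is `custom_input`) are assumptions rather than checks the Actor is documented to perform. The accepted objectives are also limited to the examples the documentation names, so the real dropdown may offer more.

```python
import json
from typing import Optional

# Assumption: only the objectives named in the documentation are accepted here;
# the Actor's promptSelection dropdown may offer additional values.
KNOWN_OBJECTIVES = {"core_summary", "technical_specs", "customer_sentiment", "custom_input"}


def build_actor_input(
    queries: list[str],
    domain: str = "com",
    max_total_products: int = 10,
    max_products_per_page: int = 5,
    prompt_selection: str = "core_summary",
    custom_prompt: Optional[str] = None,
    llm_model: str = "gpt-4o-mini",
    enable_ai_synthesis: bool = True,
    verbose_log: bool = False,
) -> dict:
    """Hypothetical helper: assemble an input object and reject obvious mistakes."""
    # Drop blank lines, mirroring the "one query per line" editor.
    cleaned = [q.strip() for q in queries if q.strip()]
    if not cleaned:
        raise ValueError("at least one search query is required")
    if prompt_selection not in KNOWN_OBJECTIVES:
        raise ValueError(f"unknown promptSelection: {prompt_selection!r}")
    # The custom prompt is only meaningful with the custom objective (assumed required there).
    if prompt_selection == "custom_input" and not custom_prompt:
        raise ValueError("customPrompt is required when promptSelection is 'custom_input'")
    if max_total_products < 1 or max_products_per_page < 1:
        raise ValueError("product limits must be positive")

    actor_input = {
        "amazonSearchQueries": cleaned,
        "amazonDomain": domain,
        "maxTotalProducts": max_total_products,
        "maxProductsPerPage": max_products_per_page,
        "enableAISynthesis": enable_ai_synthesis,
        "promptSelection": prompt_selection,
        "llmModel": llm_model,
        "verboseLog": verbose_log,
    }
    if custom_prompt:
        actor_input["customPrompt"] = custom_prompt
    return actor_input


if __name__ == "__main__":
    example = build_actor_input(
        ["wireless earbuds", "usb-c charger"],
        prompt_selection="custom_input",
        custom_prompt="Extract the battery life and charging port type.",
    )
    print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the Actor's JSON input editor on Apify or sent as the run input through the Apify API.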

Documentation

# 🧠 Amazon AI Product Intelligence Stream

This Actor performs structured data extraction and synthesis on Amazon product pages. It uses Playwright for targeted, stealthy scraping and large language models (LLMs), via LangChain's structured output feature, to convert raw HTML product details into clean JSON data and a final business report. It uses a two-tier processing system: Crawl Only Mode and Local Structured AI Mode.

---

## 🚀 Key Features and Improvements

* Local Structured AI Mode (Tier 2): Replaces the unstable external ChatKit API workflow with local structured extraction using LangChain and OpenAI. This eliminates the `HTTP 404` errors that workflow produced and yields predictable JSON output.
* Dynamic Schema Selection: Automatically switches the LLM's output schema based on the selected analysis objective (`promptSelection`). This provides dedicated structured output for technical specifications (`AmazonTechnicalSpecs`) and for general product data (`AmazonProductData`).
* Complete Data Output: The final dataset includes the single aggregate synthesis report plus an individual structured item report for every successfully processed product, giving both a macro and a micro view of the data.
* Price & ASIN Robustness: Uses advanced Playwright selectors and injection logic to maximize the capture rate of dynamic data such as price and ASIN before the content is passed to the LLM for structuring.
* Improved User Experience: The input interface uses emojis and user-friendly editors, including a string-list editor for search queries (`stringList`) and dropdowns for LLM model selection (including GPT-5) and the Amazon domain.

---

## ⚙️ Configuration and Input

The Actor's input is defined in `input_schema.json`, which provides a user-friendly interface divided into three sections:

### 1. 🔍 Search Configuration

| Field | Type | Description |
| :--- | :--- | :--- |
| `amazonSearchQueries` | `array` (`stringList`) | The keywords to search for (one query per line). |
| `amazonDomain` | `string` (`select`) | The Amazon marketplace to target (e.g., `com`, `co.uk`, `jp`). |
| `maxTotalProducts` | `integer` | Maximum total number of unique product pages to process in the run. |
| `maxProductsPerPage` | `integer` | Maximum number of product links to take from each search result page. |

### 2. 🧠 Analysis & AI Control

| Field | Type | Description |
| :--- | :--- | :--- |
| `enableAISynthesis` | `boolean` | If `true` (the default), runs the full LLM-based structured extraction and synthesis (Tier 2). |
| `promptSelection` | `string` (`select`) | The analysis objective (e.g., `core_summary`, `technical_specs`, `customer_sentiment`, or `custom_input`). |
| `customPrompt` | `string` (`textarea`) | Instructions for the LLM when `custom_input` is selected (e.g., "Extract the screen size and processor model."). |
| `llmModel` | `string` (`select`) | The GPT model (e.g., `gpt-4o-mini`, `gpt-4o`, `gpt-5`) used for all extraction and synthesis tasks. |
| `verboseLog` | `boolean` | Enables detailed debug logging for troubleshooting. |

---

## 📊 Output Structure

The Actor pushes several kinds of JSON items to the default dataset:

### Item 1: Final Synthesis Report (`_tier: AI_SYNTHESIS_REPORT`)

The single aggregate summary of all products processed for the original query.

| Field | Description |
| :--- | :--- |
| `report` | The synthesized final business summary generated by the LLM. |
| `sources` | Array of all product URLs used in the report. |
| `extra_specs_json` | A single JSON string summarizing the most common miscellaneous specifications found across all products. |

### Subsequent Items: Individual Product Reports (`_tier: AI_SYNTHESIS`)

These contain the structured data extracted from each successfully processed product page.

| Field | Description |
| :--- | :--- |
| `product_title` | The title of the product. |
| `asin` | The product's ASIN. |
| `report` | A short, human-readable summary of the structured data extracted for this product. |
| `core_data_point` / `price_with_currency` / etc. | The structured data fields defined by the chosen analysis objective. |

### Fallback Items (`_tier: CRAWL_ONLY_FALLBACK`)

These items are pushed when LLM extraction fails (e.g., an API error or a Pydantic validation error) and provide the raw HTML/Markdown content for manual review.

---

## 🛠️ Developer Notes

* Model IDs: The `_initialize_llm` function strips the redundant `"openai/"` prefix from the model name selected in the input UI, preventing invalid-model-ID errors when calling the OpenAI API.
* Schema Handling: `scraper_logic.py` dynamically selects among the Pydantic models (`AmazonProductData`, `AmazonTechnicalSpecs`, `FinalReportSchema`) and converts them to Python dictionaries with `.model_dump()`, keeping the data flow clean and preventing Pydantic validation errors during aggregation.
* Dependencies: `requirements.txt` includes the necessary asynchronous libraries (`playwright`, `httpx`) and the LangChain/OpenAI stack (`langchain-openai`).
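
Because the aggregate report, the per-product items, and the fallbacks all land in the same default dataset, a consumer usually has to separate them by `_tier`. The following is a minimal Python sketch, assuming each item carries a string `_tier` field with the values listed in the Output Structure section and that the items are already loaded as a list of dictionaries (for example, from a JSON export of the dataset). The function names `group_by_tier` and `summarize_run` are hypothetical.

```python
from collections import defaultdict
from typing import Any

# `_tier` values as documented for this Actor's dataset items.
REPORT_TIER = "AI_SYNTHESIS_REPORT"
PRODUCT_TIER = "AI_SYNTHESIS"
FALLBACK_TIER = "CRAWL_ONLY_FALLBACK"

Item = dict[str, Any]


def group_by_tier(items: list[Item]) -> dict[str, list[Item]]:
    """Bucket dataset items by their `_tier` field ("UNKNOWN" if it is missing)."""
    groups: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        groups[item.get("_tier", "UNKNOWN")].append(item)
    return dict(groups)


def summarize_run(items: list[Item]) -> dict[str, Any]:
    """Hypothetical consumer: pull out the report, per-product titles, and fallback count."""
    groups = group_by_tier(items)
    reports = groups.get(REPORT_TIER, [])
    products = groups.get(PRODUCT_TIER, [])
    fallbacks = groups.get(FALLBACK_TIER, [])
    first_report = reports[0] if reports else {}
    return {
        "report": first_report.get("report"),
        "sources": first_report.get("sources", []),
        # Map each product's ASIN to its title for quick lookup.
        "products": {p.get("asin"): p.get("product_title") for p in products},
        # Fallback items carry raw page content and need manual review.
        "needs_manual_review": len(fallbacks),
    }
```

Keeping the fallback count separate makes it easy to spot runs where many pages fell back to crawl-only output.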

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources
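
For price monitoring, one approach is to compare the per-product items from two runs by ASIN. This is a minimal sketch, assuming the chosen analysis objective populates `price_with_currency` as a string such as `$1,299.99` (comma thousands separator, dot decimal). The real format may differ by marketplace and model, and the helpers `parse_price`, `price_map`, and `price_changes` are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Any, Optional

# Assumption: prices look like "$1,299.99" (comma thousands, dot decimal).
# Marketplaces that format prices differently need their own parser.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Extract the first numeric amount from a price string, or None."""
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    try:
        return Decimal(match.group().replace(",", ""))
    except InvalidOperation:
        return None


def price_map(items: list[dict[str, Any]]) -> dict[str, Decimal]:
    """Map ASIN -> parsed price for per-product items (`_tier == "AI_SYNTHESIS"`)."""
    prices: dict[str, Decimal] = {}
    for item in items:
        if item.get("_tier") != "AI_SYNTHESIS":
            continue
        asin = item.get("asin")
        price = parse_price(item.get("price_with_currency"))
        if asin and price is not None:
            prices[asin] = price
    return prices


def price_changes(previous: list[dict[str, Any]], current: list[dict[str, Any]]) -> dict[str, tuple[Decimal, Decimal]]:
    """Return {asin: (old_price, new_price)} for ASINs present in both runs whose price moved."""
    old, new = price_map(previous), price_map(current)
    return {
        asin: (old[asin], new[asin])
        for asin in old.keys() & new.keys()
        if old[asin] != new[asin]
    }
```

`Decimal` is used instead of `float` so that amounts such as `5.00` and `5` compare equal without rounding noise.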

Ready to Get Started?

Try Amazon AI Product Intelligence now on Apify. Free tier available with no credit card required.

Actor Information

Developer: visita
Pricing: Paid
Total Runs: 117
Active Users: 3

Related Actors

Google Search Results Scraper

by apify

Website Content Crawler

by apify

🔥 Leads Generator - $3/1k 50k leads like Apollo

by microworlds

Video Transcript Scraper: Youtube, X, Facebook, Tiktok, etc.

by invideoiq

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
