Amazon AI Product Intelligence
by visita
About Amazon AI Product Intelligence
Amazon AI Product Intelligence Stream is an advanced, AI-driven Actor designed to provide deep, structured intelligence from the global Amazon marketplace. It is built for targeted competitive and market analysis on e-commerce products.
What does this actor do?
Amazon AI Product Intelligence is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
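As a rough illustration of the API access mentioned in the list above, the sketch below prepares (but does not send) a request to Apify's run-Actor endpoint using only the Python standard library. The Actor ID `visita~amazon-ai-product-intelligence` and the token are placeholder assumptions, not values confirmed by this page; substitute the ID and token shown in your Apify Console.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST request that would start an Actor run with the given input."""
    url = f"{APIFY_BASE}/acts/{actor_id}/runs"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical Actor ID and token, used only for illustration.
request = build_run_request(
    "visita~amazon-ai-product-intelligence",
    "<YOUR_APIFY_TOKEN>",
    {"amazonSearchQueries": ["mechanical keyboard"]},
)

# Sending is left to the caller, e.g. urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any platform credits.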
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
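The configure and download steps above can be sketched in Python. The input field names and `_tier` values are taken from the Documentation section of this page; the default values, the helper names `build_run_input` and `group_by_tier`, and the sample items are illustrative assumptions, and the exact option strings accepted by the input UI may differ.

```python
from collections import defaultdict

# Analysis objectives listed in the documentation; the exact strings the
# input UI accepts are an assumption.
PROMPT_SELECTIONS = {"core_summary", "technical_specs", "customer_sentiment", "custom_input"}


def build_run_input(queries, domain="com", prompt_selection="core_summary",
                    custom_prompt=None, llm_model="gpt-4o-mini",
                    max_total_products=10, max_products_per_page=5):
    """Assemble an input object using the documented field names."""
    if not queries:
        raise ValueError("at least one search query is required")
    if prompt_selection not in PROMPT_SELECTIONS:
        raise ValueError(f"unknown promptSelection: {prompt_selection}")
    if prompt_selection == "custom_input" and not custom_prompt:
        raise ValueError("customPrompt is required when promptSelection is custom_input")
    run_input = {
        "amazonSearchQueries": list(queries),
        "amazonDomain": domain,
        "maxTotalProducts": max_total_products,      # illustrative default
        "maxProductsPerPage": max_products_per_page,  # illustrative default
        "enableAISynthesis": True,
        "promptSelection": prompt_selection,
        "llmModel": llm_model,
        "verboseLog": False,
    }
    if custom_prompt:
        run_input["customPrompt"] = custom_prompt
    return run_input


def group_by_tier(items):
    """Group downloaded dataset items by their `_tier` field."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.get("_tier", "UNKNOWN")].append(item)
    return dict(grouped)


run_input = build_run_input(["wireless earbuds"], domain="co.uk",
                            prompt_selection="technical_specs")

# Made-up placeholder items shaped like the documented output tiers.
sample_items = [
    {"_tier": "AI_SYNTHESIS_REPORT", "report": "...", "sources": []},
    {"_tier": "AI_SYNTHESIS", "product_title": "Example product", "asin": "B000000000"},
    {"_tier": "CRAWL_ONLY_FALLBACK"},
]
grouped = group_by_tier(sample_items)
print(sorted(grouped))
```

Grouping by `_tier` first lets downstream code treat the aggregate report, the per-product records, and the fallback items separately.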
Documentation
# 🧠 Amazon AI Product Intelligence Stream

This Actor performs advanced, structured data extraction and synthesis on Amazon product pages. It uses Playwright for targeted, stealthy scraping and leverages large language models (LLMs) via LangChain's structured output feature to convert raw HTML product details into actionable, clean JSON data and a final business report.

The Actor is designed for maximum reliability and flexibility, using a robust, two-tier processing system (Crawl Only Mode and Local Structured AI Mode).

---

## 🚀 Key Features and Improvements

* Local Structured AI Mode (Tier 2): Replaced the unstable external ChatKit API workflow with reliable local structured extraction using LangChain and OpenAI. This eliminates HTTP 404 errors and ensures predictable JSON output.
* Dynamic Schema Selection: Automatically switches the LLM's output schema based on the user's Analysis Objective (Prompt Selection). This provides precise, dedicated structured output for technical specifications (`AmazonTechnicalSpecs`) and general data (`AmazonProductData`).
* Complete Data Output: The final dataset now includes the single Aggregate Synthesis Report plus individual Structured Item Reports for every successfully processed product, offering both macro and micro data views.
* Price & ASIN Robustness: Includes advanced Playwright selectors and injection logic to maximize the capture rate of dynamic data like Price and ASIN before passing content to the LLM for structuring.
* Improved User Experience: The input interface is optimized with emojis and user-friendly editors, including a multi-select for search queries (`stringList`) and a dropdown for LLM model selection (including GPT-5) and Amazon domains.

---

## ⚙️ Configuration and Input

The Actor's input is defined via `input_schema.json`, providing a user-friendly interface divided into three sections:

### 1. 🔍 Search Configuration

| Field | Type | Description |
| :--- | :--- | :--- |
| `amazonSearchQueries` | array (stringList) | The keywords to search for (one query per line). |
| `amazonDomain` | string (select) | The Amazon marketplace to target (e.g., `com`, `co.uk`, `jp`). |
| `maxTotalProducts` | integer | Max total unique product pages to process in the run. |
| `maxProductsPerPage` | integer | Max product links to pull from each search result page. |

### 2. 🧠 Analysis & AI Control

| Field | Type | Description |
| :--- | :--- | :--- |
| `enableAISynthesis` | boolean | If true (default): Runs the full LLM-based structured extraction and synthesis (Tier 2). |
| `promptSelection` | string (select) | Defines the analysis objective (e.g., `core_summary`, `technical_specs`, `customer_sentiment`, or `custom_input`). |
| `customPrompt` | string (textarea) | Used by the LLM when `custom_input` is selected (e.g., "Extract the screen size and processor model."). |
| `llmModel` | string (select) | Selects the GPT model (e.g., `gpt-4o-mini`, `gpt-4o`, `gpt-5`) for all extraction and synthesis tasks. |
| `verboseLog` | boolean | Enables detailed debug logging for troubleshooting. |

---

## 📊 Output Structure

The Actor pushes multiple JSON objects to the default Dataset, ensuring a comprehensive output:

### Item 1: Final Synthesis Report (`_tier: AI_SYNTHESIS_REPORT`)

This is the single aggregate summary of all products processed for the original query.

| Field | Description |
| :--- | :--- |
| `report` | The comprehensive, synthesized final business summary generated by the LLM. |
| `sources` | Array of all product URLs used in the report. |
| `extra_specs_json` | A single JSON string summarizing the most common miscellaneous specifications found across all products. |

### Subsequent Items: Individual Product Reports (`_tier: AI_SYNTHESIS`)

These contain the raw, structured data extracted from each successful product page.

| Field | Description |
| :--- | :--- |
| `product_title` | The title of the product. |
| `asin` | The product's ASIN. |
| `report` | A short, human-readable summary of the structured data extracted for this specific product. |
| `core_data_point` / `price_with_currency` / etc. | The specific structured data fields defined by the chosen analysis objective. |

### Fallback Items (`_tier: CRAWL_ONLY_FALLBACK`)

These items are pushed if the LLM extraction fails (e.g., API error or Pydantic error), providing the raw HTML/Markdown content for manual review.

---

## 🛠️ Developer Notes

* Model IDs: The `_initialize_llm` function automatically strips the redundant "openai/" prefix from the model name selected in the input UI to prevent Invalid Model ID errors when calling the OpenAI API.
* Schema Handling: The `scraper_logic.py` dynamically selects and converts between Pydantic models (`AmazonProductData`, `AmazonTechnicalSpecs`, `FinalReportSchema`) and Python dictionaries using `.model_dump()` to ensure clean data flow and prevent Pydantic validation errors during aggregation.
* Dependencies: The `requirements.txt` includes necessary asynchronous libraries (`playwright`, `httpx`) and the LangChain/OpenAI stack (`langchain-openai`) for robust execution.
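The prefix stripping and schema selection described in the Developer Notes might look roughly like the following. This is a sketch under stated assumptions, not the Actor's actual code: standard-library dataclasses and `asdict` stand in for the Pydantic models and `.model_dump()`, the `specs` field on the technical schema is hypothetical, and the helper names `normalize_model_id` and `select_schema` are invented for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

OPENAI_PREFIX = "openai/"


@dataclass
class AmazonProductData:
    """Stand-in for the general-data schema (the real one is a Pydantic model)."""
    product_title: str
    asin: Optional[str] = None
    price_with_currency: Optional[str] = None


@dataclass
class AmazonTechnicalSpecs:
    """Stand-in for the technical-specs schema; `specs` is a hypothetical field."""
    product_title: str
    asin: Optional[str] = None
    specs: dict = field(default_factory=dict)


def normalize_model_id(model: str) -> str:
    """Drop a leading "openai/" so the bare model name is sent to the API."""
    if model.startswith(OPENAI_PREFIX):
        return model[len(OPENAI_PREFIX):]
    return model


def select_schema(prompt_selection: str):
    """Pick the output schema class for the chosen analysis objective."""
    if prompt_selection == "technical_specs":
        return AmazonTechnicalSpecs
    # Assumption: every other objective uses the general schema.
    return AmazonProductData


schema = select_schema("technical_specs")
# Placeholder values, not real product data.
record = schema(product_title="Example product", asin="B000000000")

# Convert to a plain dict before aggregation (asdict plays the role of
# .model_dump() in this stand-in).
as_dict = asdict(record)
print(normalize_model_id("openai/gpt-4o-mini"), schema.__name__, as_dict)
```

Converting each record to a plain dictionary before aggregation keeps the synthesis step independent of which schema class produced the record.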
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: visita
- Pricing: Paid
- Total Runs: 117
- Active Users: 3