Polars AI Data Transformer
by salesmart-srl
About Polars AI Data Transformer
Transform datasets using natural language. Upload CSV/Excel/JSON, describe your transformation in plain English, get results + reusable Python code. Powered by AI.
What does this actor do?
Polars AI Data Transformer is a data transformation tool available on the Apify platform. It turns a plain-English prompt into Polars code, runs that code on your dataset in the cloud, and returns both the transformed data and the generated Python code.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# AI Data Transformer

Transform any dataset using plain English. No coding required. Describe what you want in natural language, get transformed data + reusable Python code.

---

## Table of Contents

1. Quick Start
2. Getting Your Apify API Token
3. Pricing
4. Choosing the Right Mode
5. 4 Operating Modes
6. Complete Input Options
7. How It Works
8. Writing Effective Prompts
9. API Examples
10. Output Format

---

## Quick Start

Option 1: With file URL

```bash
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/data.csv"],
    "prompt": "Group by country and sum sales"
  }'
```

Option 2: Direct JSON data (no file needed!)

```bash
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputData": [
      {"product": "iPhone", "price": 999, "qty": 10},
      {"product": "iPad", "price": 799, "qty": 5}
    ],
    "prompt": "Calculate total value (price * qty) for each product"
  }'
```

Response includes output_data with all transformed rows directly in JSON.

That's it. No LLM API key needed for Basic mode.

---

## Getting Your Apify API Token

To use this Actor via API, you need an Apify API token.

### Step 1: Create Apify Account

Go to apify.com and sign up (free).

### Step 2: Get Your API Token

1. Log in to Apify Console
2. Click your profile icon (top right)
3. Go to Settings → Integrations
4. Copy your Personal API Token

Your token looks like: apify_api_xxxxxxxxxxxxxxxxxxxxx

### Step 3: Use the Token

Option A: Query parameter

```
https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN
```

Option B: Authorization header

```
Authorization: Bearer YOUR_TOKEN
```

### Free Tier

Apify offers $5 free credits monthly. Basic transformations cost ~$0.0015 each, so you get ~3,000 free transformations per month.
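
The free-tier figure is simple division. The sketch below reproduces it, assuming the per-transformation fee is the only charge against the credit; `runs_covered` is a hypothetical helper name, not part of the Actor or the Apify SDK.

```python
from decimal import Decimal


def runs_covered(credit_usd: str, price_per_run_usd: str) -> int:
    """Return how many whole runs a credit balance pays for.

    Assumption: the per-run fee is the only cost drawn from the credit.
    Decimal avoids binary floating-point rounding on values like 0.0015.
    """
    return int(Decimal(credit_usd) // Decimal(price_per_run_usd))


# $5 of monthly credit at the Basic price of $0.0015 per transformation
print(runs_covered("5", "0.0015"))  # 3333
```

The same helper can be pointed at the other per-run prices in the pricing table to compare modes.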
---

## Pricing

Pay-per-event pricing. You only pay when a transformation runs successfully.

| Mode | Apify Fee | LLM Cost | Total Cost |
|------|-----------|----------|------------|
| Basic | $0.0015 | Included | $0.0015 |
| Premium | $0.20 | Included | $0.20 |
| BYOK | $0.001 | Your API | $0.001 + API |
| BYOK Premium | $0.001 | Your API | $0.001 + API |

### Volume Discounts

| Tier | BYOK | Basic | Premium |
|------|------|-------|---------|
| No Discount | $0.001 | $0.0015 | $0.20 |
| Bronze | $0.0007 | $0.00117 | $0.167 |
| Silver | $0.0004 | $0.00083 | $0.133 |
| Gold | $0.0001 | $0.0005 | $0.10 |

---

## Choosing the Right Mode

### Decision Tree

```
Do you need Google Search grounding or complex reasoning?
│
├─ NO → Do you have your own LLM API key?
│       │
│       ├─ NO → Use BASIC ($0.0015)
│       │       Simple, fast, no setup
│       │
│       └─ YES → Use BYOK ($0.001 + your API)
│                Lowest cost if you have free API credits
│
└─ YES → Do you have a Google API key?
         │
         ├─ NO → Use PREMIUM ($0.20)
         │       All features included, no setup
         │
         └─ YES → Use BYOK PREMIUM ($0.001 + your API)
                  Same features as Premium, use your credits
```

### Mode Comparison

| Feature | Basic | Premium | BYOK | BYOK Premium |
|---------|-------|---------|------|--------------|
| Simple aggregations | Yes | Yes | Yes | Yes |
| Filtering & sorting | Yes | Yes | Yes | Yes |
| Data cleaning | Yes | Yes | Yes | Yes |
| E-commerce migrations | Limited | Best | Limited | Best |
| Google Search grounding | No | Yes | No | Yes |
| Extended thinking/reasoning | No | Yes | No | Yes |
| RAG memory (learns over time) | Yes | Yes | Yes | Yes |
| Requires LLM API key | No | No | Yes | Yes |
| Cost | $0.0015 | $0.20 | $0.001+API | $0.001+API |

### Premium vs BYOK Premium: What's the Difference?

Nothing, except billing.
Both modes use:

- Gemini 2.5 Pro with extended thinking
- Google Search grounding
- RAG memory system

The only difference:

- Premium: We pay Google, you pay us $0.20
- BYOK Premium: You pay Google directly, you pay us $0.001

When to use BYOK Premium:

- You have Google Cloud credits
- You have an enterprise Google agreement
- You want to track API usage in your own Google console

---

## 4 Operating Modes

### Mode 1: Basic (Hosted)

Use when: Simple transformations, high volume, budget-conscious

```json
{
  "datasetUrls": ["https://example.com/data.csv"],
  "prompt": "Group by country and sum sales, show top 10"
}
```

What you get:

- Gemini 2.5 Flash-Lite (fast, efficient)
- No API key required
- $0.0015 per transformation

Good for:

- Aggregations (sum, count, average)
- Filtering and sorting
- Basic calculations
- Data reformatting

Not ideal for:

- "Convert to Shopify format" (doesn't know Shopify schema)
- Complex multi-step reasoning

---

### Mode 2: Premium (Hosted)

Use when: Complex transformations, e-commerce migrations, need accuracy

```json
{
  "datasetUrls": ["https://example.com/magento-products.csv"],
  "prompt": "Transform to Shopify product import format",
  "useAdvancedFeatures": true
}
```

What you get:

- Gemini 2.5 Pro (most capable model)
- Extended thinking (reasons through complex problems)
- Google Search grounding (knows external formats)
- RAG memory (improves over time)
- No API key required
- $0.20 per transformation

Good for:

- E-commerce platform migrations (Magento→Shopify, etc.)
- Format conversions (to Stripe, Mailchimp, etc.)
- Complex multi-step transformations
- Tasks requiring external knowledge

Why it costs more:

- Uses Gemini Pro (~$1.25/1M tokens)
- Google Search queries (~$35/1K queries)
- We bundle these costs into a flat $0.20 fee

---

### Mode 3: BYOK (Bring Your Own Key)

Use when: You have LLM API credits, want lowest cost

```json
{
  "datasetUrls": ["https://example.com/data.csv"],
  "prompt": "Filter active users and calculate totals",
  "llmProvider": "groq",
  "groqApiKey": "gsk_..."
}
```

What you get:

- Your choice of LLM provider
- RAG memory (improves over time)
- $0.001 Apify fee + your API costs

Supported providers:

| Provider | Model | API Cost | Get Key |
|----------|-------|----------|---------|
| Groq | Llama 3.3 70B | FREE tier | console.groq.com |
| Google | Gemini 2.0 Flash | ~$0.10/1M tokens | aistudio.google.com |
| OpenAI | GPT-4o | ~$5/1M tokens | platform.openai.com |
| Anthropic | Claude Sonnet 4 | ~$3/1M tokens | console.anthropic.com |

Recommended: Groq (FREE)

Groq offers a generous free tier. Combined with our $0.001 fee, you can run thousands of transformations for almost nothing.

---

### Mode 4: BYOK Premium

Use when: You have Google API credits AND need Premium features

```json
{
  "datasetUrls": ["https://example.com/products.csv"],
  "prompt": "Convert to Shopify product CSV format",
  "llmProvider": "google",
  "googleApiKey": "AIza...",
  "useAdvancedFeatures": true
}
```

What you get:

- Same as Premium: Gemini Pro + Google Search + RAG
- Uses YOUR Google API key
- $0.001 Apify fee + your Google API costs

Your Google API costs:

- Gemini 2.5 Pro: ~$1.25/1M input, ~$5/1M output tokens
- Google Search grounding: ~$35 per 1,000 queries

Why use this instead of Premium?

- You have Google Cloud credits to use up
- Your company has a Google enterprise agreement
- You want API usage in your own Google console
- You're doing very high volume and want direct billing

---

## Complete Input Options

### Required

| Field | Type | Description |
|-------|------|-------------|
| prompt | string | Natural language description of transformation |

### Data Sources (at least one required)

| Field | Type | Description |
|-------|------|-------------|
| inputData | array | Direct JSON data - no file hosting needed! |
| datasetUrls | string[] | URLs to data files (CSV, Excel, JSON, Parquet) |
| uploadedFiles | file[] | Direct file uploads via Apify Console |
| apifyDatasetId | string | ID of existing Apify dataset |

Recommended: inputData for API integrations - single call with data in, results out.

### Mode Selection

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| useAdvancedFeatures | boolean | false | Enable Premium features (reasoning + grounding) |
| llmProvider | string | - | BYOK provider: groq, google, openai, anthropic |
| groqApiKey | string | - | Your Groq API key |
| googleApiKey | string | - | Your Google API key |
| openaiApiKey | string | - | Your OpenAI API key |
| anthropicApiKey | string | - | Your Anthropic API key |

### Output Options

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| outputFormat | string | csv | Output format: csv, json, parquet, xlsx |
| includeGeneratedCode | boolean | true | Include Python code in output |
| maxRetries | number | 3 | Max code generation retry attempts |

### Mode Selection Logic

```
IF llmProvider is set AND corresponding API key is provided:
    IF useAdvancedFeatures is true AND llmProvider is "google":
        → BYOK PREMIUM (your Google key + Premium features)
    ELSE:
        → BYOK (your key, basic features)
ELSE:
    IF useAdvancedFeatures is true:
        → PREMIUM (hosted, $0.20)
    ELSE:
        → BASIC (hosted, $0.0015)
```

---

## How It Works

### Processing Pipeline

```
1. INPUT VALIDATION
   ├─ Parse prompt and options
   ├─ Detect mode (Basic/Premium/BYOK)
   └─ Validate data URLs

2. DATA LOADING
   ├─ Load from inputData (direct JSON - zero I/O!)
   ├─ Or fetch from URLs (CSV, Excel, JSON, Parquet)
   ├─ Auto-detect format and encoding
   ├─ Extract schema (column names, types, sample values)
   └─ Handle multiple sources (auto-merge)

3. RAG SEARCH
   ├─ Search Pinecone for similar past transformations
   ├─ If found (>85% similarity), include as context
   └─ Helps LLM generate better code

4. CODE GENERATION
   ├─ Send prompt + schema + RAG context to LLM
   ├─ LLM generates Polars transformation code
   └─ Validate code structure

5. EXECUTION
   ├─ Execute code in sandboxed environment
   ├─ Validate output (no empty results, correct types)
   └─ Retry if errors (up to maxRetries)

6. OUTPUT
   ├─ Export transformed data (CSV/JSON/Parquet/Excel)
   ├─ Save generated code
   └─ Return metadata (rows, timing, etc.)

7. LEARNING
   ├─ Save successful transformation to Pinecone
   └─ Future similar requests benefit from this
```

### RAG Memory System

The system learns from every successful transformation:

1. Before generation: Searches for similar prompts in Pinecone
2. If found: Includes similar code as context for better results
3. After success: Saves the new transformation
4. Over time: Accuracy improves as memory grows

Current memory: 22+ successful transformations and growing.
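
Step 1 of the pipeline detects the mode from the input fields. A minimal sketch of that decision, following the documented Mode Selection Logic, might look like the code below. The function name `detect_mode`, the `KEY_FIELDS` mapping, and the returned labels are illustrative assumptions; the Actor's real implementation is not shown in this document.

```python
# Hypothetical mapping from each llmProvider value to its documented key field.
KEY_FIELDS = {
    "groq": "groqApiKey",
    "google": "googleApiKey",
    "openai": "openaiApiKey",
    "anthropic": "anthropicApiKey",
}


def detect_mode(run_input: dict) -> str:
    """Return the mode implied by the input, per the Mode Selection Logic."""
    provider = run_input.get("llmProvider")
    advanced = bool(run_input.get("useAdvancedFeatures", False))

    # BYOK applies only when a provider is set AND its matching key is present.
    key_field = KEY_FIELDS.get(provider)
    has_key = bool(key_field and run_input.get(key_field))

    if has_key:
        # Premium features with your own key require the Google provider.
        if advanced and provider == "google":
            return "BYOK_PREMIUM"
        return "BYOK"
    return "PREMIUM" if advanced else "BASIC"


print(detect_mode({"prompt": "Group by region"}))  # BASIC
print(detect_mode({"llmProvider": "groq", "groqApiKey": "gsk_example"}))  # BYOK
```

Note one consequence of the documented rules: setting useAdvancedFeatures with a non-Google BYOK provider still resolves to plain BYOK, and setting llmProvider without the matching key falls back to a hosted mode.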
---

## Writing Effective Prompts

### Structure

```
[ACTION] + [COLUMNS] + [CONDITIONS] + [OUTPUT]
```

### Examples by Complexity

Simple (use Basic):

```
Group by 'region' column, sum 'revenue', sort descending, top 10
```

Medium (use Basic or Premium):

```
Filter rows where status is 'active' and created_at > 2024-01-01,
calculate total and average order_value per customer
```

Complex (use Premium):

```
Convert Magento 2 product export to Shopify CSV format:
- sku -> Handle (lowercase, replace spaces with dashes)
- name -> Title
- description -> Body (HTML)
- price -> Variant Price
- qty -> Variant Inventory Qty
- product_online -> Published (1=true, 0=false)
Only include simple products (exclude configurable/bundle)
Add Vendor column with value "Imported from Magento"
```

### Tips

| Do | Don't |
|----|-------|
| Name specific columns | Say "transform the data" |
| Specify output format | Assume system knows your schema |
| Use Premium for migrations | Use Basic for Shopify/Stripe formats |
| Break complex tasks into steps | Write 500-word prompts |

---

## API Examples

### Direct JSON Input (Recommended for API)

Single call with data in, results out. No file hosting needed.

```bash
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputData": [
      {"product": "iPhone", "price": 999, "quantity": 10},
      {"product": "iPad", "price": 799, "quantity": 5},
      {"product": "MacBook", "price": 1999, "quantity": 3}
    ],
    "prompt": "Calculate total_value = price * quantity, sort by total_value descending",
    "outputFormat": "json"
  }'
```

Response includes output_data array with all transformed rows.

### Basic Mode (with URL)

```bash
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/sales.csv"],
    "prompt": "Group by region, sum revenue, sort descending"
  }'
```

### Premium Mode

```bash
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/magento-products.csv"],
    "prompt": "Transform to Shopify product import CSV format",
    "useAdvancedFeatures": true
  }'
```

### BYOK Mode (Groq - FREE)

```bash
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/data.csv"],
    "prompt": "Calculate monthly trends",
    "llmProvider": "groq",
    "groqApiKey": "gsk_xxxxx"
  }'
```

### BYOK Premium Mode

```bash
curl -X POST "https://api.apify.com/v2/acts/salesmart-srl~polars-ai-data-transformer/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetUrls": ["https://example.com/products.csv"],
    "prompt": "Convert to Shopify format with all required columns",
    "llmProvider": "google",
    "googleApiKey": "AIza_xxxxx",
    "useAdvancedFeatures": true
  }'
```

### Python SDK

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Direct JSON input (recommended)
run = client.actor("salesmart-srl/polars-ai-data-transformer").call(
    run_input={
        "inputData": [
            {"product": "iPhone", "price": 999, "qty": 10},
            {"product": "iPad", "price": 799, "qty": 5},
        ],
        "prompt": "Calculate total = price * qty, sort descending",
    }
)

# Get results from dataset (includes output_data)
dataset = client.dataset(run["defaultDatasetId"])
items = list(dataset.iterate_items())
result = items[0]

print(f"Status: {result['status']}")
print(f"Output rows: {result['output_rows']}")
print(f"Transformed data: {result['output_data']}")  # Full data!
print(f"Generated code: {result['generated_code']}")

# With file URL
run = client.actor("salesmart-srl/polars-ai-data-transformer").call(
    run_input={
        "datasetUrls": ["https://example.com/data.csv"],
        "prompt": "Group by category and sum sales",
    }
)

# Premium transformation
run = client.actor("salesmart-srl/polars-ai-data-transformer").call(
    run_input={
        "datasetUrls": ["https://example.com/products.csv"],
        "prompt": "Convert to Shopify product CSV",
        "useAdvancedFeatures": True,
    }
)
```

---

## Output Format

### Response Structure

```json
{
  "status": "success",
  "input_sources_count": 1,
  "input_rows_total": 1000,
  "input_columns": ["sku", "name", "price", "qty"],
  "output_rows": 50,
  "output_columns": ["Handle", "Title", "Variant Price"],
  "output_file": "transformed_data.csv",
  "execution_time_ms": 1234,
  "generation_info": {
    "provider": "google_pro",
    "tokens_used": 4500,
    "generation_time_ms": 890,
    "attempts": 1
  },
  "generated_code": "import polars as pl\n\nresult = ...",
  "output_preview": [
    {"Handle": "product-1", "Title": "Product One", "Variant Price": 29.99}
  ],
  "output_data": [
    {"Handle": "product-1", "Title": "Product One", "Variant Price": 29.99},
    {"Handle": "product-2", "Title": "Product Two", "Variant Price": 49.99}
  ],
  "warnings": [],
  "errors": []
}
```

### Output Fields

| Field | Description |
|-------|-------------|
| output_preview | First 10 rows (always present) |
| output_data | Full transformed data (if < 10MB) - use this for API integrations! |
| output_file | Filename in Key-Value Store (for large files) |

### Generated Code

Every transformation returns reusable Python code:

```python
import polars as pl

# Load your data
df = pl.read_csv("your_data.csv")

# Generated transformation (copy this!)
result = (
    df.lazy()
    .filter(pl.col("status") == "active")
    .group_by("region")
    .agg(
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("orders").count().alias("order_count")
    )
    .sort("total_revenue", descending=True)
    .head(10)
    .collect()
)

# Save
result.write_csv("output.csv")
```

---

## Performance

- Handles millions of rows efficiently
- Typical transformation: 1-3 seconds
- Uses Polars (Rust-based, 10-100x faster than Pandas)
- Lazy evaluation for memory efficiency
- Parallel processing for multi-file inputs

---

## Privacy and Security

- Encrypted: API keys encrypted with AES-256
- Isolated: Data processed in isolated containers
- No retention: Data deleted after run completion
- No training: Your data is never used to train models
- BYOK: Full control over your LLM API keys

---

## Support

- Issues: GitHub Issues
- Actor page: Apify Store

---

## Changelog

### v0.4 (December 2024)

- NEW: inputData - Pass data directly as JSON, no file hosting needed
- NEW: output_data - Full transformed data in response (if < 10MB)
- Single API call: data in, results out
- Perfect for API integrations and automation

### v0.3 (December 2024)

- Migrated to google-genai SDK
- ThinkingConfig for extended reasoning
- Improved Google Search grounding
- Code cleanup and optimization

### v0.2 (December 2024)

- 4-tier pricing: Basic, Premium, BYOK, BYOK Premium
- Premium: Gemini Pro + Google Search + RAG
- RAG system with Pinecone
- Multi-file support

### v0.1 (December 2024)

- Initial release
- Multi-provider LLM support
- CSV, Excel, JSON, Parquet I/O
Actor Information
- Developer: salesmart-srl
- Pricing: Paid
- Total Runs: 285
- Active Users: 2