Dataset2GPT
by flamboyant_leaf
About Dataset2GPT
What does this actor do?
Dataset2GPT is an actor on the Apify platform that reads an Apify dataset and uses GPT-based AI to summarize, analyze, classify, or transform its text. It runs in the cloud as a step after the actor that collects your data.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# Dataset2GPT

Dataset2GPT reads any dataset on Apify and uses GPT-based AI to perform a variety of text-processing tasks, such as summarization, analysis, classification, or transformation. Whether you want a comprehensive 2,000-word synopsis, key insights, sentiment analysis, or custom NLP transformations, Dataset2GPT can handle it.

> For a complete tutorial on how to integrate Dataset2GPT to analyse Reddit comments in bulk, please see our tutorial using our GPT-powered summary actor here.

## Features

- GPT-Powered Processing using GPT-4o-mini for best price/performance ratio
- Flexible Output Length: Control how long or detailed the result should be (100–2,000 words or more)
- Task Focus: Provide a specific topic or angle for the AI to focus on (e.g., "marketing analysis," "customer feedback," "sentiment extraction")
- Token Usage & Cost Tracking: Monitor how many tokens the AI uses and the associated cost
- PDF Export with Markdown formatting for easy sharing
- Optional Email Delivery of processed results

## How It Works

1. Input: Your Apify actor collects or generates data in a dataset.
2. Processing: Dataset2GPT reads the dataset content as plain text, then uses GPT to perform your desired task.
3. AI Output: The AI generates the requested output (e.g., summary, analysis, transformation).
4. Delivery: Retrieve the output in multiple formats (dataset record, PDF, email).

## Benefits

- Rapid Insights: Quickly extract meaning or structure from large datasets.
- Plain Text Processing: No complex parsing required, just feed text.
- Seamless Integration: Works with any Apify actor that outputs to a dataset.
- Automation: Set up once and let Dataset2GPT handle the heavy lifting.

## Perfect For

- Market Researchers
- Social Media Managers
- Data Scientists
- Business Intelligence Teams
- Content Analysts
- Anyone working with scraped or collected data

## Input Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `datasetId` | string | No | - | Dataset ID for testing. When used as an integration, the dataset ID is handled automatically |
| `openaiApiKey` | string | Yes | - | Your OpenAI API key for GPT-based processing |
| `summaryLength` | integer | No | 1000 | Desired length of output in words (100–2000). You can also use it for controlling output detail |
| `focusTopic` | string | No | - | Specific topic or angle to guide the AI's processing (e.g., "sentiment analysis" or "key points") |
| `targetEmail` | string | No | - | Email address to send the results to (requires `EMAIL_SUPPORT=true`) |

## Output Formats

1. Dataset Record

   ```json
   {
     "output": "The AI-generated text (e.g., summary, analysis, transformations)...",
     "tokenUsage": {
       "promptTokens": 1234,
       "completionTokens": 567,
       "totalTokens": 1801
     },
     "costs": {
       "promptCost": 0.001234,
       "completionCost": 0.000567,
       "totalCost": 0.001801
     }
   }
   ```

   - The `output` field contains the GPT-processed text.
   - Token usage and cost breakdown are provided for transparency.

2. PDF Output

   - Stored in a key-value store under the `OUTPUT` key.
   - Includes Markdown formatting for better readability.
   - Downloadable via the Apify console or API.

3. Email Delivery (Optional)

   - Sends the GPT output in HTML and plain text formats.
   - Requires `EMAIL_SUPPORT=true` and a valid `targetEmail`.

## Getting Started

1. Add Dataset2GPT to Your Apify Actor: For example, if you have an actor scraping Reddit or Twitter, attach Dataset2GPT as an integration step.
2. Use the Default Dataset: In your input configuration, set `"datasetId": "{{resource.defaultDatasetId}}"` to automatically consume the scraped data.
3. Run Your Apify Actor: Once it finishes, Dataset2GPT will read the dataset and produce the requested AI output.
4. Retrieve Your Results:
   - Download from the dataset.
   - Grab the PDF from the key-value store.
   - Or check your inbox if you enabled email delivery.

## Usage Examples

### As an Integration (Recommended)

```json
{
  "openaiApiKey": "your-openai-api-key",
  "summaryLength": 1500,
  "focusTopic": "product feedback analysis"
}
```

> Here, Dataset2GPT will generate a ~1500-word analysis focusing on product feedback.

### For Testing

```json
{
  "datasetId": "your-dataset-id",
  "openaiApiKey": "your-openai-api-key",
  "summaryLength": 1000
}
```

> Useful for local testing. Specify a known `datasetId`.

### With Email Delivery

```json
{
  "openaiApiKey": "your-openai-api-key",
  "targetEmail": "user@example.com",
  "focusTopic": "sentiment extraction"
}
```

> Generates text focusing on sentiments and sends it to user@example.com.

## Environment Variables

- `EMAIL_SUPPORT`: Enable/disable email functionality (default: true)
- `MAX_PARALLEL_REQUESTS`: Controls parallel API requests (default: 10)
- `PROMPT_TOKEN_COST`: Cost per 1M prompt tokens (default: 0.150)
- `COMPLETION_TOKEN_COST`: Cost per 1M completion tokens (default: 0.075)

---

With Dataset2GPT, turn any dataset into meaningful insights, summaries, transformations, or advanced NLP tasks, all powered by GPT.
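The per-1M-token rates listed under Environment Variables and the `tokenUsage` object in the dataset record suggest how the cost fields could be derived. The sketch below is only an illustration of that arithmetic. It assumes cost = tokens / 1,000,000 × rate, which is a reading of "cost per 1M tokens" and not the actor's confirmed formula. The function names `rates_from_env` and `estimate_costs` are hypothetical and are not part of Dataset2GPT.

```python
import os

# Defaults copied from the Environment Variables list (cost per 1M tokens).
DEFAULT_PROMPT_RATE = 0.150
DEFAULT_COMPLETION_RATE = 0.075


def rates_from_env(env=os.environ):
    """Read the per-1M-token rates, falling back to the documented defaults.

    Hypothetical helper: the actor's own way of reading these variables
    is not shown in the README.
    """
    prompt = float(env.get("PROMPT_TOKEN_COST", DEFAULT_PROMPT_RATE))
    completion = float(env.get("COMPLETION_TOKEN_COST", DEFAULT_COMPLETION_RATE))
    return prompt, completion


def estimate_costs(token_usage, prompt_rate=DEFAULT_PROMPT_RATE,
                   completion_rate=DEFAULT_COMPLETION_RATE):
    """Estimate costs from a tokenUsage dict shaped like the dataset record.

    Assumed formula: cost = tokens / 1_000_000 * rate_per_1M_tokens.
    """
    prompt_cost = token_usage["promptTokens"] / 1_000_000 * prompt_rate
    completion_cost = token_usage["completionTokens"] / 1_000_000 * completion_rate
    return {
        "promptCost": prompt_cost,
        "completionCost": completion_cost,
        "totalCost": prompt_cost + completion_cost,
    }


if __name__ == "__main__":
    # Token counts taken from the sample dataset record.
    usage = {"promptTokens": 1234, "completionTokens": 567, "totalTokens": 1801}
    prompt_rate, completion_rate = rates_from_env()
    print(estimate_costs(usage, prompt_rate, completion_rate))
```

With the default rates, the sample token counts come to roughly 0.00023 in total. The cost figures in the sample record need not match this estimate, since the actor's actual formula is not documented here.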
Common Use Cases
- Market Research: Gather competitive intelligence and market data
- Lead Generation: Extract contact information for sales outreach
- Price Monitoring: Track competitor pricing and product changes
- Content Aggregation: Collect and organize content from multiple sources
Actor Information
- Developer: flamboyant_leaf
- Pricing: Paid
- Total Runs: 293
- Active Users: 3
Related Actors
- Google Search Results Scraper, by apify
- Website Content Crawler, by apify
- 🔥 Leads Generator - $3/1k 50k leads like Apollo, by microworlds
- Video Transcript Scraper: Youtube, X, Facebook, Tiktok, etc., by invideoiq