ChatGPT Conversation Extractor
by klinzinger
This scraper extracts the conversation history from publicly shared ChatGPT conversations.
What does this actor do?
ChatGPT Conversation Extractor is a web scraping Actor available on the Apify platform. It extracts the conversation history from publicly shared ChatGPT conversations and runs in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
# ChatGPT Conversation Extractor

An Apify Actor that extracts conversation data from publicly shared ChatGPT conversations. The Actor navigates to shared conversation URLs and extracts the full conversation history, including all messages, timestamps, and metadata.

## Overview

This Actor extracts conversation data from ChatGPT's publicly shared conversations by accessing the data embedded in the page through React Router's data loader. The data is not fetched via a separate API endpoint but is embedded server-side and accessible through the browser's JavaScript environment.

## How It Works

1. The Actor navigates to the provided ChatGPT share URLs using Puppeteer
2. Waits for the page to fully load and React Router to initialize
3. Extracts conversation data from `window.__reactRouterDataRouter.state.loaderData`
4. Parses the conversation tree structure into a linear array of messages
5. Outputs structured data including:
   - Conversation metadata (title, timestamps, share ID)
   - Parsed messages in chronological order
   - Optionally, the complete raw conversation data

## Input

The Actor accepts the following input parameters:

- `startUrls` (required): Array of ChatGPT share URLs to extract
  - Example: https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51
- `includeRawData` (optional, default: true): Whether to include the complete raw conversation data in the output

### Example Input

```json
{
  "startUrls": [
    { "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51" }
  ],
  "includeRawData": true
}
```

## Output

The Actor outputs structured data to the dataset with the following fields:

- `url`: The ChatGPT share URL
- `shareId`: Extracted share ID from the URL
- `title`: Conversation title
- `createTime`: Unix timestamp when conversation was created
- `updateTime`: Unix timestamp when conversation was last updated
- `messageCount`: Number of messages in the conversation
- `messages`: Array of parsed messages, each containing:
  - `role`: Message role ("user" or "assistant")
  - `content`: Message content text
  - `timestamp`: Unix timestamp when message was created
  - `messageId`: Unique message identifier
  - `status`: Message status
- `rawData` (if `includeRawData` is true): Complete raw conversation data with full tree structure

### Example Output

```json
{
  "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "shareId": "693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "title": "Example Conversation",
  "createTime": 1764757960.044993,
  "updateTime": 1764757965.106983,
  "messageCount": 54,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?",
      "timestamp": 1764256500.3946629,
      "messageId": "message_id_1",
      "status": "finished_successfully"
    },
    {
      "role": "assistant",
      "content": "I'm doing well, thank you!",
      "timestamp": 1764256501.1234567,
      "messageId": "message_id_2",
      "status": "finished_successfully"
    }
  ],
  "rawData": { /* complete raw conversation data */ }
}
```

## Data Structure

ChatGPT conversations are stored in a tree structure where:

- Each message has a parent reference to its parent message
- Each message has a children array with child message IDs
- Messages are organized in threads/branches
- The Actor traverses this tree to extract messages in chronological order

## Limitations

- Only works for publicly shared conversations
- Requires JavaScript execution (uses Puppeteer browser automation)
- Cannot access private conversations without authentication
- Data structure may change as ChatGPT updates their platform
- Rate limiting may apply if extracting many conversations

## Use Cases

- Archiving publicly shared conversations
- Analyzing conversation patterns and structures
- Converting conversations to other formats (Markdown, CSV, etc.)
- Building conversation datasets for training or analysis
- Creating backups of shared conversations
- Research and analysis of AI conversation patterns

## Getting Started

### Local Development

1. Install dependencies:

   ```bash
   npm install
   ```

2. Run the Actor locally:

   ```bash
   apify run
   ```

The Actor will read input from `storage/key_value_stores/default/INPUT.json`. Create this file with your ChatGPT share URLs:

```json
{
  "startUrls": [
    { "url": "https://chatgpt.com/share/YOUR_SHARE_ID" }
  ]
}
```

### Deploy to Apify

1. Log in to Apify:

   ```bash
   apify login
   ```

2. Deploy your Actor:

   ```bash
   apify push
   ```

## Technical Details

### Extraction Method

The Actor uses the following approach to extract conversation data:

1. Page Navigation: Uses Puppeteer to navigate to the ChatGPT share URL
2. Wait for React Router: Waits for `window.__reactRouterDataRouter` to be available
3. Data Extraction: Accesses the conversation data from:

   ```javascript
   window.__reactRouterDataRouter.state.loaderData['routes/share.$shareId.($action)'].serverResponse.data
   ```

4. Tree Traversal: Parses the conversation tree structure by:
   - Finding the root message (message without a parent)
   - Traversing the tree recursively through children
   - Extracting messages in chronological order

### Error Handling

If extraction fails, the Actor will:

- Log detailed error information
- Push error data to the dataset for debugging
- Continue processing other URLs if multiple are provided

## Resources

- Apify SDK Documentation
- Crawlee Documentation
- Puppeteer Documentation
- Apify Platform Documentation

## License

ISC
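
## Example: Tree Traversal Sketch

The tree traversal described under Technical Details (find the node without a parent, walk its children, emit messages in chronological order) can be sketched roughly as follows. This is a minimal illustration and not the Actor's actual source. The function name `linearizeConversation` is hypothetical, and the raw field names used here (`mapping`, `parent`, `children`, `message.author.role`, `message.content.parts`, `create_time`) are assumptions about the raw conversation format, which may differ or change over time.

```javascript
// Hypothetical sketch: the raw field names (parent, children, message.author.role,
// message.content.parts, create_time) are assumptions, not a documented format.
function linearizeConversation(mapping) {
  // The root is the node without a parent.
  const rootId = Object.keys(mapping).find((id) => !mapping[id].parent);
  if (rootId === undefined) return [];

  const messages = [];
  const seen = new Set();
  const stack = [rootId];

  // Iterative depth-first walk through each node's children array.
  while (stack.length > 0) {
    const id = stack.pop();
    const node = mapping[id];
    if (!node || seen.has(id)) continue; // skip dangling IDs and cycles
    seen.add(id);

    const msg = node.message;
    const role = msg && msg.author ? msg.author.role : undefined;
    if (role === "user" || role === "assistant") {
      const parts = (msg.content && msg.content.parts) || [];
      messages.push({
        role,
        content: parts.filter((p) => typeof p === "string").join("\n"),
        timestamp: msg.create_time ?? null,
        messageId: msg.id ?? id,
        status: msg.status ?? null,
      });
    }

    // Push children in reverse so the first child is visited first.
    const children = node.children || [];
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }

  // Sort by timestamp; messages without a timestamp sort first.
  return messages.sort((a, b) => (a.timestamp ?? 0) - (b.timestamp ?? 0));
}

// Usage with a small hand-written tree (sample data, not real output).
const sampleMapping = {
  root: { parent: null, children: ["m1"], message: null },
  m1: {
    parent: "root",
    children: ["m2"],
    message: {
      id: "m1",
      author: { role: "user" },
      content: { parts: ["Hello, how are you?"] },
      create_time: 1764256500.39,
      status: "finished_successfully",
    },
  },
  m2: {
    parent: "m1",
    children: [],
    message: {
      id: "m2",
      author: { role: "assistant" },
      content: { parts: ["I'm doing well, thank you!"] },
      create_time: 1764256501.12,
      status: "finished_successfully",
    },
  },
};

const linear = linearizeConversation(sampleMapping);
console.log(linear.map((m) => `${m.role}: ${m.content}`).join("\n"));
```

Note that this sketch visits every branch of the tree. If a shared conversation contains alternative branches, an implementation might instead follow only one path from the root to the final message; which behavior the Actor uses is not stated here.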
Actor Information
- Developer: klinzinger
- Pricing: Paid
- Total Runs: 18
- Active Users: 4