Chatgpt Conversation Extractor

Name: Chatgpt Conversation Extractor
Author: klinzinger

This scraper extracts the conversation history from publicly shared ChatGPT conversations.

18 runs

4 users

What does this actor do?

Chatgpt Conversation Extractor is a web scraping tool on the Apify platform. It runs in the cloud and extracts the conversation history (messages, timestamps, and metadata) from publicly shared ChatGPT conversation URLs.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# ChatGPT Conversation Extractor

An Apify Actor that extracts conversation data from publicly shared ChatGPT conversations. The Actor navigates to shared conversation URLs and extracts the full conversation history, including all messages, timestamps, and metadata.

## Overview

This Actor extracts conversation data from ChatGPT's publicly shared conversations by accessing the data embedded in the page through React Router's data loader. The data is not fetched via a separate API endpoint; it is embedded server-side and accessible through the browser's JavaScript environment.

## How It Works

1. The Actor navigates to the provided ChatGPT share URLs using Puppeteer
2. Waits for the page to fully load and React Router to initialize
3. Extracts conversation data from `window.__reactRouterDataRouter.state.loaderData`
4. Parses the conversation tree structure into a linear array of messages
5. Outputs structured data including:
   - Conversation metadata (title, timestamps, share ID)
   - Parsed messages in chronological order
   - Optionally, the complete raw conversation data

## Input

The Actor accepts the following input parameters:

- `startUrls` (required): Array of ChatGPT share URLs to extract
  - Example: `https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51`
- `includeRawData` (optional, default: `true`): Whether to include the complete raw conversation data in the output

### Example Input

```json
{
  "startUrls": [
    { "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51" }
  ],
  "includeRawData": true
}
```

## Output

The Actor outputs structured data to the dataset with the following fields:

- `url`: The ChatGPT share URL
- `shareId`: Share ID extracted from the URL
- `title`: Conversation title
- `createTime`: Unix timestamp when the conversation was created
- `updateTime`: Unix timestamp when the conversation was last updated
- `messageCount`: Number of messages in the conversation
- `messages`: Array of parsed messages, each containing:
  - `role`: Message role ("user" or "assistant")
  - `content`: Message content text
  - `timestamp`: Unix timestamp when the message was created
  - `messageId`: Unique message identifier
  - `status`: Message status
- `rawData` (if `includeRawData` is true): Complete raw conversation data with full tree structure

### Example Output

```json
{
  "url": "https://chatgpt.com/share/693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "shareId": "693011c8-0a3c-8006-b6cf-77d844d1bb51",
  "title": "Example Conversation",
  "createTime": 1764757960.044993,
  "updateTime": 1764757965.106983,
  "messageCount": 54,
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?",
      "timestamp": 1764256500.3946629,
      "messageId": "message_id_1",
      "status": "finished_successfully"
    },
    {
      "role": "assistant",
      "content": "I'm doing well, thank you!",
      "timestamp": 1764256501.1234567,
      "messageId": "message_id_2",
      "status": "finished_successfully"
    }
  ],
  "rawData": { /* complete raw conversation data */ }
}
```

## Data Structure

ChatGPT conversations are stored in a tree structure where:

- Each message has a `parent` reference to its parent message
- Each message has a `children` array with child message IDs
- Messages are organized in threads/branches
- The Actor traverses this tree to extract messages in chronological order

## Limitations

- Only works for publicly shared conversations
- Requires JavaScript execution (uses Puppeteer browser automation)
- Cannot access private conversations without authentication
- Data structure may change as ChatGPT updates its platform
- Rate limiting may apply if extracting many conversations

## Use Cases

- Archiving publicly shared conversations
- Analyzing conversation patterns and structures
- Converting conversations to other formats (Markdown, CSV, etc.)
- Building conversation datasets for training or analysis
- Creating backups of shared conversations
- Research and analysis of AI conversation patterns

## Getting Started

### Local Development

1. Install dependencies:

   ```bash
   npm install
   ```

2. Run the Actor locally:

   ```bash
   apify run
   ```

The Actor will read input from `storage/key_value_stores/default/INPUT.json`. Create this file with your ChatGPT share URLs:

```json
{
  "startUrls": [
    { "url": "https://chatgpt.com/share/YOUR_SHARE_ID" }
  ]
}
```

### Deploy to Apify

1. Log in to Apify:

   ```bash
   apify login
   ```

2. Deploy your Actor:

   ```bash
   apify push
   ```

## Technical Details

### Extraction Method

The Actor uses the following approach to extract conversation data:

1. **Page Navigation**: Uses Puppeteer to navigate to the ChatGPT share URL
2. **Wait for React Router**: Waits for `window.__reactRouterDataRouter` to be available
3. **Data Extraction**: Accesses the conversation data from:

   ```javascript
   window.__reactRouterDataRouter.state.loaderData['routes/share.$shareId.($action)'].serverResponse.data
   ```

4. **Tree Traversal**: Parses the conversation tree structure by:
   - Finding the root message (message without a parent)
   - Traversing the tree recursively through children
   - Extracting messages in chronological order

### Error Handling

If extraction fails, the Actor will:

- Log detailed error information
- Push error data to the dataset for debugging
- Continue processing other URLs if multiple are provided

## Resources

- Apify SDK Documentation
- Crawlee Documentation
- Puppeteer Documentation
- Apify Platform Documentation

## License

ISC
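### Example: Flattening the Conversation Tree

The tree traversal described under Technical Details can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code. The function name `linearizeConversation` is hypothetical. The node shape (`mapping`, `message.author.role`, `message.content.parts`, `message.create_time`) is an assumption about the raw data. Only the `parent` and `children` fields and the output field names are taken from the documentation above.

```javascript
// Sketch only: flattens a parent/children message tree into a linear array.
// Assumed (unconfirmed) node shape:
//   mapping = { [nodeId]: { parent, children: [nodeId, ...], message: {...} | null } }
function linearizeConversation(mapping) {
  // The root is the node that has no parent.
  const rootId = Object.keys(mapping).find((id) => !mapping[id].parent);
  if (rootId === undefined) return [];

  const messages = [];
  const visited = new Set(); // guards against cycles in malformed data

  function visit(nodeId) {
    const node = mapping[nodeId];
    if (!node || visited.has(nodeId)) return;
    visited.add(nodeId);

    const msg = node.message;
    if (msg) {
      // Field names on `msg` are assumptions about the raw format.
      const parts = (msg.content?.parts ?? []).filter((p) => typeof p === "string");
      messages.push({
        role: msg.author?.role ?? null,
        content: parts.join("\n"),
        timestamp: msg.create_time ?? null,
        messageId: msg.id ?? nodeId,
        status: msg.status ?? null,
      });
    }

    // Depth-first: follow each child in the order it is listed.
    for (const childId of node.children ?? []) visit(childId);
  }

  visit(rootId);
  return messages;
}

// Usage with a tiny hand-made tree (root node carries no message).
const sample = {
  root: { parent: null, children: ["m1"], message: null },
  m1: {
    parent: "root",
    children: ["m2"],
    message: {
      id: "m1",
      author: { role: "user" },
      content: { parts: ["Hello, how are you?"] },
      create_time: 1,
      status: "finished_successfully",
    },
  },
  m2: {
    parent: "m1",
    children: [],
    message: {
      id: "m2",
      author: { role: "assistant" },
      content: { parts: ["I'm doing well, thank you!"] },
      create_time: 2,
      status: "finished_successfully",
    },
  },
};

console.log(linearizeConversation(sample).map((m) => m.role)); // prints [ 'user', 'assistant' ]
```

A depth-first walk yields chronological order for a single-branch conversation. How the Actor orders messages when a conversation has several branches is not stated in the documentation, so this sketch simply visits every branch in listed order.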

Actor Information

Developer: klinzinger
Pricing: Paid
Total Runs: 18
Active Users: 4
