Sora Scraper

Name: Sora Scraper
Author: lexis-solutions

4,847 runs

97 users

About Sora Scraper

Discover AI-generated video insights from OpenAI’s Sora 2 community—extract posts, media, user profiles, comments, and engagement metrics. Perfect for trend analysis, content curation, influencer tracking, and research. Fast, reliable, and fully customizable.

What does this actor do?

Sora Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results
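
The configure-and-run steps above can also be done from code. The following is a minimal sketch, not the author's implementation: the parameter names come from the documentation's input table, while the actor ID `lexis-solutions/sora-scraper`, the helper names `build_input` and `run_actor`, and the exact character set allowed in a post ID are assumptions. The run step relies on the `apify-client` package, which is imported lazily so the input builder works without it.

```python
import re

# Detail post URL format from the documentation; the ID character set is an assumption.
POST_URL_RE = re.compile(r"^https://sora\.chatgpt\.com/p/s_[0-9A-Za-z]+$")


def build_input(query=None, start_urls=None, num_of_comments=10, max_items=10,
                download_video=False, download_thumbnail=False,
                download_gif=False, use_apify_proxy=True):
    """Build an actor input dict using the documented parameter names."""
    # The documentation says non-detail URLs are ignored, so drop them up front.
    urls = [u for u in (start_urls or []) if POST_URL_RE.match(u)]
    if not query and not urls:
        raise ValueError("Provide at least one of `query` or a valid `startUrls` entry")
    actor_input = {
        "numOfComments": num_of_comments,
        "downloadVideo": download_video,
        "downloadThumbnail": download_thumbnail,
        "downloadGIF": download_gif,
        "maxItems": max_items,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
    if query:
        actor_input["query"] = query
    if urls:
        actor_input["startUrls"] = urls
    return actor_input


def run_actor(token, actor_input, actor_id="lexis-solutions/sora-scraper"):
    """Run the actor and return its dataset items.

    Requires `pip install apify-client`; the default actor ID is an assumption.
    """
    from apify_client import ApifyClient  # lazy import: only needed for real runs

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=actor_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `run_actor("<YOUR_APIFY_TOKEN>", build_input(query="SpongeBob", max_items=10))`. Starting with a small `maxItems` keeps the first run cheap while you inspect the output.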

Documentation

# Sora Scraper

The Sora Scraper is an Apify actor for Sora 2, OpenAI's next-generation AI video creation platform. It extracts posts, videos, engagement metrics, user profiles, and comments from Sora's community.

Watch our video tutorial on how to use Sora Scraper.

---

## ✨ Key Features

- 🚀 First-to-Market: The only scraper built specifically for Sora 2.
- 🎬 Media Downloads: Download videos, thumbnails, and GIFs directly from posts.
- 💬 Comment Extraction: Capture detailed comment threads with engagement data.
- 👤 Rich Profile Data: Extract complete user profiles including followers, verified status, and more.
- 📊 Engagement Metrics: Views, unique views, likes, remixes, replies, and recursive replies.
- 🔍 Flexible Search: Query any topic and discover Sora's creative community.
- ⚙️ Granular Control: Configure comment limits, media downloads, and result counts.
- 📦 Structured Output: Normalized JSON data ready for analysis and integration.

---

## 💡 Why It's Important

Sora is OpenAI's platform for AI-generated video content, and Sora 2 is its latest version. With this scraper, you can:

- Monitor trending content in the AI video generation space.
- Analyze user engagement patterns and viral content characteristics.
- Archive creative works for research, inspiration, or competitive analysis.
- Track community growth and user interactions in real time.
- Build datasets for AI content analysis, sentiment studies, and trend forecasting.

---

## 👤 Who Is It For?

- AI Researchers studying generative video trends and user behavior.
- Content Creators seeking inspiration and understanding what resonates.
- Marketing Agencies monitoring brand mentions and creative trends.
- Data Scientists building datasets for machine learning and analytics.
- Media Companies tracking viral content and emerging creators.
- Developers building applications around AI-generated content.

---

## 🚀 Business Use Cases

- Trend Analysis: Identify viral prompts, themes, and creative patterns.
- Content Curation: Aggregate and showcase top Sora creations.
- Competitive Intelligence: Track how competitors use AI video generation.
- Influencer Discovery: Find trending creators and their engagement rates.
- Brand Monitoring: Track mentions and sentiment around your brand.
- Research & Development: Build datasets for AI content analysis.
- Market Research: Understand user preferences in AI-generated content.

---

## 🛠 Input Schema

The actor accepts the following input.

Example 1 (query-based search):

```json
{
  "query": "SpongeBob",
  "numOfComments": 10,
  "downloadVideo": false,
  "downloadThumbnail": false,
  "downloadGIF": false,
  "maxItems": 10,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

Example 2 (direct startUrls):

```json
{
  "startUrls": [
    "https://sora.chatgpt.com/p/s_68de52e8be348191b268bdad202ea36a"
  ],
  "numOfComments": 5,
  "downloadVideo": true,
  "downloadThumbnail": true,
  "downloadGIF": false,
  "maxItems": 1,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

### Input Parameters

| Parameter | Type | Required | Description |
| -------------------- | ------- | -------- | ------------------------------------------------------------------------------------------------ |
| `query` | string | No | Search query to find Sora posts (e.g., "SpongeBob", "sunset timelapse") |
| `startUrls` | array | No | Detail post URLs in the format `https://sora.chatgpt.com/p/s_{id}`; any other format is ignored |
| `numOfComments` | integer | No | Maximum number of comments to extract per post (default: 10) |
| `downloadVideo` | boolean | No | Download video files to key-value store. Key stored in `videoStoreKey` (default: false) |
| `downloadThumbnail` | boolean | No | Download thumbnail images to key-value store. Key stored in `thumbnailStoreKey` (default: false) |
| `downloadGIF` | boolean | No | Download GIF previews to key-value store. Key stored in `gifStoreKey` (default: false) |
| `maxItems` | integer | No | Maximum number of posts to scrape (default: 10) |
| `proxyConfiguration` | object | No | Apify proxy configuration for requests |

Notes:

- Required input: No single parameter is required, but at least one of `query` or `startUrls` must be provided to run the scraper.
- Start URLs: Only detail URLs in the format `https://sora.chatgpt.com/p/s_{id}` are processed; other URLs are ignored.
- Media Downloads: Enabling video, thumbnail, or GIF downloads increases run time but gives direct access to the media files in the key-value store.
- Comments: Set `numOfComments` to control how many comments are extracted per post. Comments include commenter profile data and like counts.
- Performance: Higher `maxItems` values and media downloads consume more resources and time.

---

## 📦 Output Schema

Each dataset item contains the full data for one post:

```json
{
  "id": "s_68dca5d7d4ac8191987e9c6393d498d4",
  "text": "spongebob as a ww2 leader speaking about the scourge of fish ruining bikini bottom wearing axis power uniform",
  "caption": null,
  "link": "https://sora.chatgpt.com/p/s_68dca5d7d4ac8191987e9c6393d498d4",
  "coverUrl": "https://videos.openai.com/vg-assets/...",
  "gifUrl": "https://videos.openai.com/vg-assets/...",
  "postedAt": 1759290839.830908,
  "updatedAt": 1759936985.530838,
  "likes": 1289,
  "replies": 43,
  "views": 39477,
  "uniqueViews": 24713,
  "remixes": 76,
  "recursiveReplies": 70,
  "dislikeCount": 0,
  "workspaceId": null,
  "postedToPublic": true,
  "emoji": "🧽",
  "attachments": [
    {
      "id": "s_68dca5d7d4ac8191987e9c6393d498d4-attachment-0",
      "title": "New Video",
      "url": "https://sdmntprsouthcentralus.oaiusercontent.com/files/...",
      "downloadableUrl": "https://sdmntprsouthcentralus.oaiusercontent.com/files/...",
      "thumbnail": "https://videos.openai.com/vg-assets/...",
      "gif": "https://videos.openai.com/vg-assets/...",
      "width": 352,
      "height": 640,
      "generationId": "gen_01k6eyadhqezmskzd31pp2n2xm",
      "generationType": "video_gen"
    }
  ],
  "profile": {
    "id": "user-vmw00GfT7mSYdcIST7bLbwCF",
    "username": "jakeleventhal",
    "displayName": "Jake Leventhal",
    "profilePictureUrl": "https://sdmntprnorthcentralus.oaiusercontent.com/files/...",
    "coverPhotoUrl": null,
    "link": "https://sora.chatgpt.com/profile/jakeleventhal",
    "verified": false,
    "followerCount": 2664,
    "followingCount": 7,
    "postCount": 61,
    "replyCount": 0,
    "likesReceivedCount": 22854,
    "remixCount": 1606,
    "cameoCount": 33,
    "isBlocked": false,
    "followedBy": [],
    "planType": null,
    "createdAt": 1753852741.285583,
    "updatedAt": 1759951105.520806,
    "bannedAt": null,
    "calpicoIsEnabled": true,
    "soraWhoCanMessageMe": "followees_only",
    "isPublicFigure": false,
    "location": null,
    "description": null,
    "birthday": null,
    "website": null
  },
  "videoStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_video_0.mp4",
  "thumbnailStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_thumbnail_0.webp",
  "gifStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_gif_0.gif",
  "comments": [
    {
      "id": "68dcb87374948191bc6c9f88b5ea723e",
      "text": "Ts gonna be the reason Viacom gonna shut this down😭😭",
      "caption": null,
      "postedAt": 1759295603.455459,
      "updatedAt": 1759530609.903473,
      "likes": 16,
      "parentPostId": "s_68dca5d7d4ac8191987e9c6393d498d4",
      "rootPostId": "s_68dca5d7d4ac8191987e9c6393d498d4",
      "postUrl": "https://sora.chatgpt.com/p/s_68dca5d7d4ac8191987e9c6393d498d4",
      "profile": {
        "id": "user-PDq6JrFlZ0qjFVKrdeAmiTnh",
        "username": "skipppz",
        "displayName": "C",
        "profilePictureUrl": "https://cdn.openai.com/sora/images/profile_placeholder_v4.png",
        "verified": false,
        "followerCount": 1,
        "followingCount": 2,
        "postCount": 9,
        "replyCount": 9,
        "likesReceivedCount": 98,
        "remixCount": 2,
        "cameoCount": 0
      }
    }
  ]
}
```

### Output Fields Explained

#### Post Data

- `id`: Unique post identifier
- `text`: The prompt/description used to generate the video
- `caption`: Optional caption text
- `link`: Direct link to the post on Sora
- `coverUrl`: URL to the cover image
- `gifUrl`: URL to the animated GIF preview
- `emoji`: Associated emoji for the post

#### Engagement Metrics

- `likes`: Number of likes
- `replies`: Direct reply count
- `views`: Total view count
- `uniqueViews`: Unique viewer count
- `remixes`: Number of times the video was remixed
- `recursiveReplies`: Total replies including nested threads
- `dislikeCount`: Number of dislikes

#### Timestamps

- `postedAt`: Unix timestamp (in seconds, with a fractional part) when the post was created
- `updatedAt`: Unix timestamp (in seconds, with a fractional part) of the last update

#### Attachments

- `id`: Attachment identifier
- `title`: Attachment title
- `url`: Direct video URL
- `downloadableUrl`: URL for downloading
- `thumbnail`: Thumbnail image URL
- `gif`: GIF preview URL
- `width` / `height`: Video dimensions
- `generationId`: Sora generation ID
- `generationType`: Type of generation (e.g., "video_gen")

#### Profile Data

Complete user profile including:

- Username, display name, profile picture
- Verification status
- Follower/following counts
- Post and reply counts
- Likes received, remix count, cameo count
- Account creation and update timestamps
- Privacy settings and location

#### Downloaded Media Keys

- `videoStoreKey`: Key-value store key for downloaded video (provided only when `downloadVideo` is enabled)
- `thumbnailStoreKey`: Key-value store key for downloaded thumbnail (provided only when `downloadThumbnail` is enabled)
- `gifStoreKey`: Key-value store key for downloaded GIF (provided only when `downloadGIF` is enabled)

#### Comments

Array of comment objects with:

- Comment text and timestamps
- Like counts
- Parent and root post IDs
- Profile data for the commenter
- Post URL for context

---

## 🎯 Advanced Features

### Media Download System

When you enable media downloads (`downloadVideo`, `downloadThumbnail`, or `downloadGIF`), files are automatically saved to Apify's key-value store with predictable keys:

- Videos: `{postId}_video_{index}.mp4`
- Thumbnails: `{postId}_thumbnail_{index}.webp`
- GIFs: `{postId}_gif_{index}.gif`

Access downloaded files programmatically or through the key-value store tab in the Apify console.

### Comment Threading

Comments maintain parent-child relationships through the `parentPostId` and `rootPostId` fields, allowing you to reconstruct conversation threads. Each comment includes:

- Commenter profile
- Engagement metrics (likes)
- Timestamps for tracking conversation flow

### Engagement Analytics

Track multiple engagement dimensions:

- Virality: `views` and `uniqueViews` show reach
- Interaction: `likes`, `replies`, and `recursiveReplies` measure engagement depth
- Creativity: `remixes` show how content inspires others
- Trend tracking: Compare metrics across posts to identify patterns

---

## 🔧 Best Practices

1. Start Small: Test with `maxItems: 10` to understand the output structure before scaling.
2. Media Downloads: Only enable media downloads when necessary, because they significantly increase run time.
3. Comment Limits: Adjust `numOfComments` based on your needs. High-engagement posts can have hundreds of comments.
4. Proxy Configuration: Use Apify proxies for reliable access and to respect rate limits.

---

## 🌟 Why Choose Our Sora Scraper?

- ✅ First to Market: The only Sora 2 scraper available
- ✅ Comprehensive Data: Posts, profiles, comments, engagement metrics
- ✅ Media Support: Download videos, thumbnails, and GIFs
- ✅ Production Ready: Structured output, error handling, proxy support
- ✅ Well Maintained: Regular updates as Sora evolves
- ✅ Expert Support: Backed by certified Apify Partners

---

👀 P.S. Got feedback or need an extension? Lexis Solutions is a certified Apify Partner. We can help you with custom solutions or data extraction projects. Contact us by email or on LinkedIn.

## Support Our Work 💝

If you're happy with our work and scrapers, you're welcome to leave us a company review and to review the scrapers you're subscribed to. It will take you less than a minute, but it will mean a lot to us!

Image Credit: https://sora.chatgpt.com/
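
As an illustration of the Advanced Features described in this documentation, the sketch below post-processes a single dataset item using only the Python standard library. It is a minimal example under stated assumptions, not part of the actor: the helper names are hypothetical, `likeRate` and `nestedReplies` are illustrative metrics derived from the documented fields, and `nestedReplies` assumes that `recursiveReplies` counts direct replies plus nested ones.

```python
from collections import defaultdict
from datetime import datetime, timezone


def media_keys(post_id, index=0):
    """Key-value store keys following the documented naming patterns."""
    return {
        "videoStoreKey": f"{post_id}_video_{index}.mp4",
        "thumbnailStoreKey": f"{post_id}_thumbnail_{index}.webp",
        "gifStoreKey": f"{post_id}_gif_{index}.gif",
    }


def to_datetime(unix_seconds):
    """Convert a fractional Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def engagement_summary(post):
    """Derive simple ratios from the documented engagement fields."""
    views = post.get("views") or 0
    likes = post.get("likes") or 0
    return {
        "id": post["id"],
        "postedAt": to_datetime(post["postedAt"]).isoformat(),
        # Illustrative metric: likes per view, 0.0 when there are no views.
        "likeRate": likes / views if views else 0.0,
        # Assumes recursiveReplies = direct replies + nested replies.
        "nestedReplies": (post.get("recursiveReplies") or 0) - (post.get("replies") or 0),
    }


def group_comments_by_parent(post):
    """Group a post's comments by `parentPostId` to rebuild reply threads."""
    threads = defaultdict(list)
    for comment in post.get("comments", []):
        threads[comment["parentPostId"]].append(comment)
    return dict(threads)
```

Comments whose `parentPostId` equals the post's own `id` are top-level; any other key in the grouped result identifies the comment being replied to.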

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Actor Information

Developer: lexis-solutions
Pricing: Paid
Total Runs: 4,847
Active Users: 97
