Sora Scraper
by lexis-solutions
About Sora Scraper
Discover AI-generated video insights from OpenAI’s Sora 2 community—extract posts, media, user profiles, comments, and engagement metrics. Perfect for trend analysis, content curation, influencer tracking, and research. Fast, reliable, and fully customizable.
What does this actor do?
Sora Scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
Sora Scraper
The Sora Scraper is an Apify actor for OpenAI's Sora 2, the latest generation of OpenAI's AI video creation platform. It extracts posts, videos, engagement metrics, user profiles, and comments from Sora's community.

Watch our video tutorial on how to use Sora Scraper.

---

## ✨ Key Features

- 🚀 First-to-Market: The only scraper built specifically for Sora 2.
- 🎬 Media Downloads: Download videos, thumbnails, and GIFs directly from posts.
- 💬 Comment Extraction: Capture detailed comment threads with engagement data.
- 👤 Rich Profile Data: Extract complete user profiles including followers, verified status, and more.
- 📊 Engagement Metrics: Views, unique views, likes, remixes, replies, and recursive replies.
- 🔍 Flexible Search: Query any topic and discover Sora's creative community.
- ⚙️ Granular Control: Configure comment limits, media downloads, and result counts.
- 📦 Structured Output: Normalized JSON data ready for analysis and integration.

---

## 💡 Why It's Important

Sora is OpenAI's platform for AI-generated video content, and Sora 2 is its latest version. With this scraper, you can:

- Monitor trending content in the AI video generation space.
- Analyze user engagement patterns and viral content characteristics.
- Archive creative works for research, inspiration, or competitive analysis.
- Track community growth and user interactions in real time.
- Build datasets for AI content analysis, sentiment studies, and trend forecasting.

---

## 👤 Who Is It For?

- AI Researchers studying generative video trends and user behavior.
- Content Creators seeking inspiration and understanding what resonates.
- Marketing Agencies monitoring brand mentions and creative trends.
- Data Scientists building datasets for machine learning and analytics.
- Media Companies tracking viral content and emerging creators.
- Developers building applications around AI-generated content.

---

## 🚀 Business Use Cases

- Trend Analysis: Identify viral prompts, themes, and creative patterns.
- Content Curation: Aggregate and showcase top Sora creations.
- Competitive Intelligence: Track how competitors use AI video generation.
- Influencer Discovery: Find trending creators and their engagement rates.
- Brand Monitoring: Track mentions and sentiment around your brand.
- Research & Development: Build datasets for AI content analysis.
- Market Research: Understand user preferences in AI-generated content.

---

## 🛠 Input Schema

The actor accepts the following input:

Example 1 (query-based search):

```json
{
  "query": "SpongeBob",
  "numOfComments": 10,
  "downloadVideo": false,
  "downloadThumbnail": false,
  "downloadGIF": false,
  "maxItems": 10,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

Example 2 (direct startUrls):

```json
{
  "startUrls": [
    "https://sora.chatgpt.com/p/s_68de52e8be348191b268bdad202ea36a"
  ],
  "numOfComments": 5,
  "downloadVideo": true,
  "downloadThumbnail": true,
  "downloadGIF": false,
  "maxItems": 1,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

### Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | No | Search query to find Sora posts (e.g., "SpongeBob", "sunset timelapse") |
| `startUrls` | array | No | Detail post URLs in the format `https://sora.chatgpt.com/p/s_{id}`; any other format is ignored |
| `numOfComments` | integer | No | Maximum number of comments to extract per post (default: 10) |
| `downloadVideo` | boolean | No | Download video files to key-value store. Key stored in `videoStoreKey` (default: false) |
| `downloadThumbnail` | boolean | No | Download thumbnail images to key-value store. Key stored in `thumbnailStoreKey` (default: false) |
| `downloadGIF` | boolean | No | Download GIF previews to key-value store. Key stored in `gifStoreKey` (default: false) |
| `maxItems` | integer | No | Maximum number of posts to scrape (default: 10) |
| `proxyConfiguration` | object | No | Apify proxy configuration for requests |

Notes:

- Required input: No single parameter is required on its own, but at least one of `query` or `startUrls` must be provided to run the scraper.
- Start URLs: Only detail URLs in the format `https://sora.chatgpt.com/p/s_{id}` are processed; other URLs are ignored.
- Media Downloads: Enabling video, thumbnail, or GIF downloads increases run time but gives direct access to media files in the key-value store.
- Comments: Set `numOfComments` to control how many comments are extracted per post. Comments include profile data and engagement metrics.
- Performance: Higher `maxItems` values and media downloads consume more resources and time.

---

## 📦 Output Schema

Each dataset item contains the following post data:

```json
{
  "id": "s_68dca5d7d4ac8191987e9c6393d498d4",
  "text": "spongebob as a ww2 leader speaking about the scourge of fish ruining bikini bottom wearing axis power uniform",
  "caption": null,
  "link": "https://sora.chatgpt.com/p/s_68dca5d7d4ac8191987e9c6393d498d4",
  "coverUrl": "https://videos.openai.com/vg-assets/...",
  "gifUrl": "https://videos.openai.com/vg-assets/...",
  "postedAt": 1759290839.830908,
  "updatedAt": 1759936985.530838,
  "likes": 1289,
  "replies": 43,
  "views": 39477,
  "uniqueViews": 24713,
  "remixes": 76,
  "recursiveReplies": 70,
  "dislikeCount": 0,
  "workspaceId": null,
  "postedToPublic": true,
  "emoji": "🧽",
  "attachments": [
    {
      "id": "s_68dca5d7d4ac8191987e9c6393d498d4-attachment-0",
      "title": "New Video",
      "url": "https://sdmntprsouthcentralus.oaiusercontent.com/files/...",
      "downloadableUrl": "https://sdmntprsouthcentralus.oaiusercontent.com/files/...",
      "thumbnail": "https://videos.openai.com/vg-assets/...",
      "gif": "https://videos.openai.com/vg-assets/...",
      "width": 352,
      "height": 640,
      "generationId": "gen_01k6eyadhqezmskzd31pp2n2xm",
      "generationType": "video_gen"
    }
  ],
  "profile": {
    "id": "user-vmw00GfT7mSYdcIST7bLbwCF",
    "username": "jakeleventhal",
    "displayName": "Jake Leventhal",
    "profilePictureUrl": "https://sdmntprnorthcentralus.oaiusercontent.com/files/...",
    "coverPhotoUrl": null,
    "link": "https://sora.chatgpt.com/profile/jakeleventhal",
    "verified": false,
    "followerCount": 2664,
    "followingCount": 7,
    "postCount": 61,
    "replyCount": 0,
    "likesReceivedCount": 22854,
    "remixCount": 1606,
    "cameoCount": 33,
    "isBlocked": false,
    "followedBy": [],
    "planType": null,
    "createdAt": 1753852741.285583,
    "updatedAt": 1759951105.520806,
    "bannedAt": null,
    "calpicoIsEnabled": true,
    "soraWhoCanMessageMe": "followees_only",
    "isPublicFigure": false,
    "location": null,
    "description": null,
    "birthday": null,
    "website": null
  },
  "videoStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_video_0.mp4",
  "thumbnailStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_thumbnail_0.webp",
  "gifStoreKey": "s_68dca5d7d4ac8191987e9c6393d498d4_gif_0.gif",
  "comments": [
    {
      "id": "68dcb87374948191bc6c9f88b5ea723e",
      "text": "Ts gonna be the reason Viacom gonna shut this down😭😭",
      "caption": null,
      "postedAt": 1759295603.455459,
      "updatedAt": 1759530609.903473,
      "likes": 16,
      "parentPostId": "s_68dca5d7d4ac8191987e9c6393d498d4",
      "rootPostId": "s_68dca5d7d4ac8191987e9c6393d498d4",
      "postUrl": "https://sora.chatgpt.com/p/s_68dca5d7d4ac8191987e9c6393d498d4",
      "profile": {
        "id": "user-PDq6JrFlZ0qjFVKrdeAmiTnh",
        "username": "skipppz",
        "displayName": "C",
        "profilePictureUrl": "https://cdn.openai.com/sora/images/profile_placeholder_v4.png",
        "verified": false,
        "followerCount": 1,
        "followingCount": 2,
        "postCount": 9,
        "replyCount": 9,
        "likesReceivedCount": 98,
        "remixCount": 2,
        "cameoCount": 0
      }
    }
  ]
}
```

### Output Fields Explained

#### Post Data

- `id`: Unique post identifier
- `text`: The prompt/description used to generate the video
- `caption`: Optional caption text
- `link`: Direct link to the post on Sora
- `coverUrl`: URL to the cover image
- `gifUrl`: URL to the animated GIF preview
- `emoji`: Associated emoji for the post

#### Engagement Metrics

- `likes`: Number of likes
- `replies`: Direct reply count
- `views`: Total view count
- `uniqueViews`: Unique viewer count
- `remixes`: Number of times the video was remixed
- `recursiveReplies`: Total replies including nested threads
- `dislikeCount`: Number of dislikes

#### Timestamps

- `postedAt`: Unix timestamp (seconds, with a fractional part) when the post was created
- `updatedAt`: Unix timestamp (seconds, with a fractional part) of the last update

#### Attachments

- `id`: Attachment identifier
- `title`: Attachment title
- `url`: Direct video URL
- `downloadableUrl`: URL for downloading
- `thumbnail`: Thumbnail image URL
- `gif`: GIF preview URL
- `width` / `height`: Video dimensions
- `generationId`: Sora generation ID
- `generationType`: Type of generation (e.g., "video_gen")

#### Profile Data

Complete user profile including:

- Username, display name, profile picture
- Verification status
- Follower/following counts
- Post and reply counts
- Likes received, remix count, cameo count
- Account creation and update timestamps
- Privacy settings and location

#### Downloaded Media Keys

- `videoStoreKey`: Key-value store key for downloaded video (provided only when `downloadVideo` is enabled)
- `thumbnailStoreKey`: Key-value store key for downloaded thumbnail (provided only when `downloadThumbnail` is enabled)
- `gifStoreKey`: Key-value store key for downloaded GIF (provided only when `downloadGIF` is enabled)

#### Comments

Array of comment objects with:

- Comment text and timestamps
- Like counts
- Parent and root post IDs
- Full profile data for commenter
- Post URL for context

---

## 🎯 Advanced Features

### Media Download System

When you enable media downloads (`downloadVideo`, `downloadThumbnail`, or `downloadGIF`), files are automatically saved to Apify's key-value store with predictable keys:

- Videos: `{postId}_video_{index}.mp4`
- Thumbnails: `{postId}_thumbnail_{index}.webp`
- GIFs: `{postId}_gif_{index}.gif`

Access downloaded files programmatically or through the Apify console's key-value store tab.
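
The key patterns above can be reproduced in code when you need to look up a downloaded file for a known post. The following Python sketch builds the three keys from a post ID and an attachment index. The function name `media_store_keys` is hypothetical and not part of the actor; it only mirrors the documented patterns.

```python
def media_store_keys(post_id: str, index: int = 0) -> dict[str, str]:
    """Build the key-value store keys documented for a post's media.

    Hypothetical helper: it mirrors the documented key patterns only and
    does not check that the files were actually downloaded.
    """
    return {
        "videoStoreKey": f"{post_id}_video_{index}.mp4",
        "thumbnailStoreKey": f"{post_id}_thumbnail_{index}.webp",
        "gifStoreKey": f"{post_id}_gif_{index}.gif",
    }


keys = media_store_keys("s_68dca5d7d4ac8191987e9c6393d498d4")
print(keys["videoStoreKey"])
# prints: s_68dca5d7d4ac8191987e9c6393d498d4_video_0.mp4
```

Where possible, prefer the `videoStoreKey`, `thumbnailStoreKey`, and `gifStoreKey` values returned in each dataset item, since those are present only when the corresponding download was enabled.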

### Comment Threading

Comments maintain parent-child relationships through the `parentPostId` and `rootPostId` fields, allowing you to reconstruct conversation threads. Each comment includes:

- Full commenter profile
- Engagement metrics (likes)
- Timestamps for tracking conversation flow

### Engagement Analytics

Track multiple engagement dimensions:

- Virality: `views` and `uniqueViews` show reach
- Interaction: `likes`, `replies`, and `recursiveReplies` measure engagement depth
- Creativity: `remixes` show how content inspires others
- Trend tracking: Compare metrics across posts to identify patterns

---

## 🔧 Best Practices

1. Start Small: Test with `maxItems: 10` to understand the output structure before scaling.
2. Media Downloads: Only enable media downloads when necessary, because they significantly increase run time.
3. Comment Limits: Adjust `numOfComments` based on your needs. High-engagement posts can have hundreds of comments.
4. Proxy Configuration: Use Apify proxies for reliable access and to respect rate limits.

---

## 🌟 Why Choose Our Sora Scraper?

- ✅ First to Market: The only Sora 2 scraper available
- ✅ Comprehensive Data: Posts, profiles, comments, engagement metrics
- ✅ Media Support: Download videos, thumbnails, and GIFs
- ✅ Production Ready: Structured output, error handling, proxy support
- ✅ Well Maintained: Regular updates as Sora evolves
- ✅ Expert Support: Backed by certified Apify Partners

---

👀 P.S. Got feedback or need an extension? Lexis Solutions is a certified Apify Partner. We can help you with custom solutions or data extraction projects. Contact us by email or on LinkedIn.

## Support Our Work 💝

If you're happy with our work and scrapers, you're welcome to leave us a company review and to review the scrapers you're subscribed to. It will take you less than a minute, but it will mean a lot to us!

Image Credit: https://sora.chatgpt.com/
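
The parent-child relationships described under Comment Threading can be turned into a simple lookup table. This Python sketch groups comment objects by `parentPostId` so that the direct replies to any post or comment can be listed in chronological order. The function name `group_by_parent` is hypothetical, the sample data is made up for illustration, and the sketch assumes each comment carries the `id`, `parentPostId`, and `postedAt` fields shown in the output example.

```python
from collections import defaultdict


def group_by_parent(comments: list[dict]) -> dict[str, list[dict]]:
    """Map each parent ID to the comments that reply to it directly."""
    children: dict[str, list[dict]] = defaultdict(list)
    for comment in comments:
        children[comment["parentPostId"]].append(comment)
    # Sort each group by postedAt so replies read in chronological order.
    for replies in children.values():
        replies.sort(key=lambda c: c["postedAt"])
    return dict(children)


# Illustrative data only; these IDs are invented for the example.
comments = [
    {"id": "c2", "parentPostId": "c1", "postedAt": 20.0, "text": "reply"},
    {"id": "c1", "parentPostId": "s_post", "postedAt": 10.0, "text": "top-level"},
    {"id": "c3", "parentPostId": "s_post", "postedAt": 5.0, "text": "earlier"},
]
threads = group_by_parent(comments)
print([c["id"] for c in threads["s_post"]])
# prints: ['c3', 'c1']
```

Whether nested replies appear in a post's `comments` array depends on what the actor returns for that post; if only top-level comments are present, every group will be keyed by the post ID.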
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try Sora Scraper now on Apify. Free tier available with no credit card required.
Actor Information
- Developer: lexis-solutions
- Pricing: Paid
- Total Runs: 4,847
- Active Users: 97
Related Actors
🏯 Tweet Scraper V2 - X / Twitter Scraper
by apidojo
Instagram Scraper
by apify
TikTok Scraper
by clockworks
Instagram Profile Scraper
by apify
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.