API / JSON scraper
by pocesar
About API / JSON scraper
Scrape any API / JSON URLs directly to the dataset, and return them in CSV, XML, HTML, or Excel formats. Transform and filter the output. Enables you to follow pagination recursively from the payload without the need to visit the HTML page.
What does this actor do?
API / JSON scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
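Since the actor exposes API access, a run can be triggered directly over Apify's HTTP API. A minimal sketch of constructing such a call is shown below; the token is a placeholder, and the actor ID path is an assumption based on the `pocesar/json-downloader` name used later on this page:

```javascript
// Sketch: building the Apify API URL that runs the actor and returns the
// dataset items in a single request. Replace the token with your own.
const actorId = 'pocesar~json-downloader'; // "~" separates user and actor name in API paths
const token = 'YOUR_APIFY_TOKEN';          // placeholder, not a real token
const format = 'csv';                      // dataset item formats include json, csv, xml, html, xlsx

const runUrl =
  `https://api.apify.com/v2/acts/${actorId}` +
  `/run-sync-get-dataset-items?token=${token}&format=${format}`;

console.log(runUrl);
```

Sending a POST request to that URL starts the actor synchronously and streams back the dataset in the requested format.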
How to Use
1. Click "Try This Actor" to open it on Apify
2. Create a free Apify account if you don't have one
3. Configure the input parameters as needed
4. Run the actor and download your results
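For the configuration step, a minimal input might look like the following. The URL is a placeholder; `startUrls` is the same field used in the examples further down this page:

```jsonc
{
  "startUrls": [
    {
      "url": "https://api.example.com/items?page=1"
    }
  ]
}
```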
Documentation
Download and format JSON endpoint data

Download any JSON URLs directly to the dataset, and return them in CSV, XML, HTML, or Excel formats. Transform and filter the output.

## Features

* Optimized, fast, and lightweight
* Small memory requirement
* Works only with JSON payloads
* Easy recursion
* Filter and map complex JSON structures
* Comes with helper libraries enabled: lodash, moment
* Full access to your account resources through the Apify variable
* The run fails only if all requests failed

## Handling errors

This scraper differs from cheerio-scraper in that you can handle errors before the handlePageFunction fails. Using the handleError input, you can enqueue extra requests before failing, allowing you to recover or try a different URL.

```js
{
  handleError: async ({ addRequest, request, response, error }) => {
    request.noRetry = error.message.includes('Unexpected') || response.statusCode === 404;

    addRequest({
      url: `${request.url}?retry=true`,
    });
  }
}
```

## Filter Map function

This function can filter, map, and enqueue requests at the same time. The difference is that the userData from the current request is passed on to the next request.
```js
const startUrls = [{
  url: "https://example.com",
  userData: {
    firstValue: 0,
  }
}];

// assuming the INPUT url above
await Apify.call('pocesar/json-downloader', {
  filterMap: async ({ request, addRequest, data }) => {
    if (request.userData.isPost) {
      // userData is inherited from the previous request,
      // so request.userData.firstValue is still 0 here

      // return the data only after the POST request
      return data;
    } else {
      // add the same request, but as a POST
      addRequest({
        url: `${request.url}/?method=post`,
        method: 'POST',
        payload: {
          username: 'username',
          password: 'password',
        },
        headers: {
          'Content-Type': 'application/json',
        },
        userData: {
          isPost: true
        }
      });
      // omitting the return (or returning a falsy value) skips the output
    }
  },
})
```

## Examples

### Flatten an object

```js
{
  filterMap: async ({ flattenObjectKeys, data }) => {
    return flattenObjectKeys(data);
  }
}

/**
 * an object like
 * {
 *   "deep": {
 *     "nested": ["state", "state1"]
 *   }
 * }
 *
 * becomes
 * {
 *   "deep.nested.0": "state",
 *   "deep.nested.1": "state1"
 * }
 */
```

### Submit a JSON API with POST

```jsonc
{
  "startUrls": [
    {
      "url": "https://ow0o5i3qo7-dsn.algolia.net/1/indexes/prod_PUBLIC_STORE/query?x-algolia-agent=Algolia%20for%20JavaScript%20(4.13.0)%3B%20Browser%20(lite)&x-algolia-api-key=0ecccd09f50396a4dbbe5dbfb17f4525&x-algolia-application-id=OW0O5I3QO7",
      "method": "POST",
      "payload": "{\"query\":\"instagram\",\"page\":0,\"hitsPerPage\":24,\"restrictSearchableAttributes\":[],\"attributesToHighlight\":[],\"attributesToRetrieve\":[\"title\",\"name\",\"username\",\"userFullName\",\"stats\",\"description\",\"pictureUrl\",\"userPictureUrl\",\"notice\",\"currentPricingInfo\"]}",
      "headers": {
        "content-type": "application/x-www-form-urlencoded"
      }
    }
  ]
}
```

### Follow pagination from payload

```js
{
  filterMap: async ({ addRequest, request, data }) => {
    if (data.nbPages > 1 && data.page < data.nbPages) {
      // get the current payload from the request
      const payload = JSON.parse(request.payload);

      // change the page number (the payload must stay a JSON string)
      request.payload = JSON.stringify({
        ...payload,
        page: data.page + 1
      });

      // add the request for parsing the next page
      addRequest(request);
    }

    return data;
  }
}
```

### Omit output if condition is met

```js
{
  filterMap: async ({ addRequest, request, data }) => {
    if (data.hits.length < 10) {
      return;
    }

    return data;
  }
}
```

### Unwind an array of results, emitting each array item as a separate dataset item

```js
{
  filterMap: async ({ addRequest, request, data }) => {
    return data.hits; // just return an array from here
  }
}
```
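For reference, the `flattenObjectKeys` helper behaves roughly like the sketch below. This is an illustrative reimplementation, not the actor's actual code:

```javascript
// Sketch: flatten nested objects and arrays into dot-separated keys,
// mirroring the behavior shown in the "Flatten an object" example.
const flattenObjectKeys = (obj, prefix = '') =>
  Object.entries(obj).reduce((acc, [key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object') {
      // recurse into nested objects and arrays (array indices become keys)
      Object.assign(acc, flattenObjectKeys(value, path));
    } else {
      acc[path] = value;
    }
    return acc;
  }, {});

console.log(flattenObjectKeys({ deep: { nested: ['state', 'state1'] } }));
// → { 'deep.nested.0': 'state', 'deep.nested.1': 'state1' }
```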
Common Use Cases
- Market Research: gather competitive intelligence and market data
- Lead Generation: extract contact information for sales outreach
- Price Monitoring: track competitor pricing and product changes
- Content Aggregation: collect and organize content from multiple sources
Actor Information
- Developer: pocesar
- Pricing: Paid
- Total Runs: 352,451
- Active Users: 546
Related Actors
Web Scraper
by apify
Cheerio Scraper
by apify
Website Content Crawler
by apify
Legacy PhantomJS Crawler
by apify