API / JSON scraper
by pocesar
About API / JSON scraper
Scrape any API / JSON URLs directly to the dataset, and return them in CSV, XML, HTML, or Excel formats. Transform and filter the output. Enables you to follow pagination recursively from the payload without the need to visit the HTML page.
What does this actor do?
API / JSON scraper is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
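Since the actor exposes API access, a run can be triggered directly over Apify's HTTP API. A minimal sketch of constructing such a call is shown below; the token is a placeholder, and the actor ID path is an assumption based on the `pocesar/json-downloader` name used later on this page:

```javascript
// Sketch: building the Apify API URL that runs the actor and returns the
// dataset items in a single request. Replace the token with your own.
const actorId = 'pocesar~json-downloader'; // "~" separates user and actor name in API paths
const token = 'YOUR_APIFY_TOKEN';          // placeholder, not a real token
const format = 'csv';                      // dataset item formats include json, csv, xml, html, xlsx

const runUrl =
  `https://api.apify.com/v2/acts/${actorId}` +
  `/run-sync-get-dataset-items?token=${token}&format=${format}`;

console.log(runUrl);
```

Sending a POST request to that URL starts the actor synchronously and streams back the dataset in the requested format.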
How to Use
1. Click "Try This Actor" to open it on Apify
2. Create a free Apify account if you don't have one
3. Configure the input parameters as needed
4. Run the actor and download your results
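For the configuration step, a minimal input might look like the following. The URL is a placeholder; `startUrls` is the same field used in the examples further down this page:

```jsonc
{
  "startUrls": [
    {
      "url": "https://api.example.com/items?page=1"
    }
  ]
}
```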
Documentation
Download and format JSON endpoint data

Download any JSON URLs directly to the dataset, and return them in CSV, XML, HTML, or Excel formats. Transform and filter the output.

## Features

* Optimized, fast, and lightweight
* Small memory requirement
* Works only with JSON payloads
* Easy recursion
* Filter and map complex JSON structures
* Comes with helper libraries enabled: lodash, moment
* Full access to your account resources through the Apify variable
* The run fails only if all requests failed

## Handling errors

This scraper differs from cheerio-scraper in that you can handle errors before the handlePageFunction fails. Using the handleError input, you can enqueue extra requests before failing, allowing you to recover or try a different URL.

```js
{
  handleError: async ({ addRequest, request, response, error }) => {
    request.noRetry = error.message.includes('Unexpected') || response.statusCode === 404;

    addRequest({
      url: `${request.url}?retry=true`,
    });
  }
}
```

## Filter Map function

This function can filter, map, and enqueue requests at the same time. The difference is that the userData from the current request is passed on to the next request.
```js
const startUrls = [{
  url: "https://example.com",
  userData: {
    firstValue: 0,
  }
}];

// assuming the INPUT url above
await Apify.call('pocesar/json-downloader', {
  filterMap: async ({ request, addRequest, data }) => {
    if (request.userData.isPost) {
      // userData is inherited from the previous request,
      // so request.userData.firstValue is still 0 here

      // return the data only after the POST request
      return data;
    } else {
      // add the same request, but as a POST
      addRequest({
        url: `${request.url}/?method=post`,
        method: 'POST',
        payload: {
          username: 'username',
          password: 'password',
        },
        headers: {
          'Content-Type': 'application/json',
        },
        userData: {
          isPost: true
        }
      });
      // omitting the return (or returning a falsy value) skips the output
    }
  },
})
```

## Examples

### Flatten an object

```js
{
  filterMap: async ({ flattenObjectKeys, data }) => {
    return flattenObjectKeys(data);
  }
}

/**
 * an object like
 * {
 *   "deep": {
 *     "nested": ["state", "state1"]
 *   }
 * }
 *
 * becomes
 * {
 *   "deep.nested.0": "state",
 *   "deep.nested.1": "state1"
 * }
 */
```

### Submit a JSON API with POST

```jsonc
{
  "startUrls": [
    {
      "url": "https://ow0o5i3qo7-dsn.algolia.net/1/indexes/prod_PUBLIC_STORE/query?x-algolia-agent=Algolia%20for%20JavaScript%20(4.13.0)%3B%20Browser%20(lite)&x-algolia-api-key=0ecccd09f50396a4dbbe5dbfb17f4525&x-algolia-application-id=OW0O5I3QO7",
      "method": "POST",
      "payload": "{\"query\":\"instagram\",\"page\":0,\"hitsPerPage\":24,\"restrictSearchableAttributes\":[],\"attributesToHighlight\":[],\"attributesToRetrieve\":[\"title\",\"name\",\"username\",\"userFullName\",\"stats\",\"description\",\"pictureUrl\",\"userPictureUrl\",\"notice\",\"currentPricingInfo\"]}",
      "headers": {
        "content-type": "application/x-www-form-urlencoded"
      }
    }
  ]
}
```

### Follow pagination from payload

```js
{
  filterMap: async ({ addRequest, request, data }) => {
    if (data.nbPages > 1 && data.page < data.nbPages) {
      // get the current payload from the request
      const payload = JSON.parse(request.payload);

      // change the page number (the payload must stay a JSON string)
      request.payload = JSON.stringify({
        ...payload,
        page: data.page + 1
      });

      // add the request for parsing the next page
      addRequest(request);
    }

    return data;
  }
}
```

### Omit output if condition is met

```js
{
  filterMap: async ({ addRequest, request, data }) => {
    if (data.hits.length < 10) {
      return;
    }

    return data;
  }
}
```

### Unwind an array of results, emitting each array item as a separate dataset item

```js
{
  filterMap: async ({ addRequest, request, data }) => {
    return data.hits; // just return an array from here
  }
}
```
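For reference, the `flattenObjectKeys` helper behaves roughly like the sketch below. This is an illustrative reimplementation, not the actor's actual code:

```javascript
// Sketch: flatten nested objects and arrays into dot-separated keys,
// mirroring the behavior shown in the "Flatten an object" example.
const flattenObjectKeys = (obj, prefix = '') =>
  Object.entries(obj).reduce((acc, [key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object') {
      // recurse into nested objects and arrays (array indices become keys)
      Object.assign(acc, flattenObjectKeys(value, path));
    } else {
      acc[path] = value;
    }
    return acc;
  }, {});

console.log(flattenObjectKeys({ deep: { nested: ['state', 'state1'] } }));
// → { 'deep.nested.0': 'state', 'deep.nested.1': 'state1' }
```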
Common Use Cases
- Market Research: gather competitive intelligence and market data
- Lead Generation: extract contact information for sales outreach
- Price Monitoring: track competitor pricing and product changes
- Content Aggregation: collect and organize content from multiple sources
Actor Information
- Developer: pocesar
- Pricing: Paid
- Total Runs: 352,451
- Active Users: 546
Related Actors
Web Scraper
by apify
Cheerio Scraper
by apify
Website Content Crawler
by apify
Legacy PhantomJS Crawler
by apify