Article Text Extractor
by mtrunkat
Simply extracts article texts and other meta info from the given URL. Uses https://github.com/ageitgey/node-unfluff which is a NodeJS implementation o...
Opens on Apify.com
About Article Text Extractor
Simply extracts article texts and other meta info from the given URL. Uses https://github.com/ageitgey/node-unfluff which is a NodeJS implementation of https://github.com/grangier/python-goose.
What does this actor do?
Article Text Extractor is a web scraping and automation tool available on the Apify platform. It's designed to help you extract data and automate tasks efficiently in the cloud.
Key Features
- Cloud-based execution - no local setup required
- Scalable infrastructure for large-scale operations
- API access for integration with your applications
- Built-in proxy rotation and anti-blocking measures
- Scheduled runs and webhooks for automation
How to Use
- Click "Try This Actor" to open it on Apify
- Create a free Apify account if you don't have one
- Configure the input parameters as needed
- Run the actor and download your results
Documentation
Simply extracts article text and other meta info from given url. Uses https://github.com/ageitgey/node-unfluff which is a NodeJS implementation of https://github.com/grangier/python-goose. Check out also lukaskrivka/article-extractor-smart. Output get's saved into a default key-value store under the OUTPUT key. HTML of the given page is stored under the page.html key. Example output: json { "title": "Sánchez no logra extender su poder territorial pese al triunfo del 26-M", "softTitle": "Sánchez no logra extender su poder territorial pese al triunfo del 26-M", "date": "16/06/2019 22:03", "author": [ "Madrid" ], "publisher": "La Vanguardia", "copyright": "La Vanguardia Ediciones Todos los derechos reservados", "favicon": "https://www.lavanguardia.com/rsc/images/ico/favicon.ico", "description": "El PSOE ganó el pasado 26 de mayo las elecciones municipales y autonómicas de manera 'clara y rotunda', según celebró el propio Pedro Sánchez aquella misma noche. Aunque la victoria socialista se tiñó...", "lang": "es", "canonicalLink": "https://www.lavanguardia.com/politica/20190617/462906149711/psoe-pedro-sanchez-elecciones-26m-alcaldias-gobiernos-espana.html", "tags": [], "image": "https://www.lavanguardia.com/r/GODO/LV/p6/WebSite/2019/06/17/Recortada/20190614-636961455890161857_20190614215051428-kvhE-U462903686315FDE-992x558@LaVanguardia-Web.jpg", "videos": [], "links": [], "text": "..." }
Categories
Common Use Cases
Market Research
Gather competitive intelligence and market data
Lead Generation
Extract contact information for sales outreach
Price Monitoring
Track competitor pricing and product changes
Content Aggregation
Collect and organize content from multiple sources
Ready to Get Started?
Try Article Text Extractor now on Apify. Free tier available with no credit card required.
Start Free TrialActor Information
- Developer
- mtrunkat
- Pricing
- Paid
- Total Runs
- 1,393,988
- Active Users
- 926
Related Actors
Smart Article Extractor
by lukaskrivka
Google Search
by devisty
Twitter Tweets Scraper
by gentle_cloud
Twitter Profile
by danek
Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
Learn more about ApifyNeed Professional Help?
Couldn't solve your problem? Hire a verified specialist on Fiverr to get it done quickly and professionally.
Trusted by millions | Money-back guarantee | 24/7 Support