Yet Another Dataset Translator

by mvolfik

Actor to translate datasets with field selection and source language detection. Requires Google Translate API Key.

80 runs
4 users

About Yet Another Dataset Translator

What does this actor do?

Yet Another Dataset Translator is an Actor on the Apify platform that translates selected fields of dataset items through the Google Translate API. It detects the language of each item and skips items that are already in the target language.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large-scale operations
  • API access for integration with your applications
  • Built-in proxy rotation and anti-blocking measures
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

# Yet another dataset translator

Actor to translate datasets with field selection and source language detection. Requires Google Translate API Key.

## Features

### Language detection

For each dataset item, this actor performs language detection. If it detects that the item is already in the target language, it skips translation of that item, thus saving your Google Translate budget.

### Mock run

You can run this actor with an empty API key to mock translate items. That way you can test your setup without spending money. Additionally, the actor prints statistics including a price estimate, to allow you to predict your Google Cloud spending.

Note: the estimate is provided solely for your convenience, without any guarantees of accuracy or correctness. Always check Google Translate API pricing and perform your own estimation.

## Input

### dataset_ids

List of IDs of datasets to translate. This allows you to combine items from multiple Actor runs if needed.

### api_key

Google Translate API key. This field is stored securely encrypted on Apify servers. If you don't provide a key, the actor will run in "mock translation" mode, only prefixing each string with "TRANSLATED " instead of calling Google servers.

### field_patterns_to_translate

Provide a list of globs to identify fields that should be translated. Supported wildcards:

- `*`: any number of any characters: `*Field` matches `Field`, `someField`, `1Field` but not `field`
- `?`: a single character: `?ield` matches `yield`, `Field`, `" ield"` (with a leading space) but not `ield` or `aField`
- `[chars]`: a single occurrence of any character in `chars`: `[fF]ield` matches `field` and `Field`, but not any of `ffield`, `yield`, `ield`
- `[!chars]`: a single occurrence of any character not in `chars`: `[!y ]ield` matches `field`, `Yield`, but not `ield`, `yield`, `" ield"` (with a leading space)

(The wildcards can of course appear at any position in the pattern, and you can combine them in any way. Use a single glob `*` to translate all fields.)

### detect_language_threshold

Language detection threshold. The default value of 0.7 is suitable for most use cases, but if you need to be 100% sure that all output text is in the given language, you can increase it to a value like 0.95. If you provide 0, language detection won't be performed at all and all fields matched by patterns will be sent for translation.

The detection is performed at the level of items, on the first 500 characters of the concatenation of the fields that are to be translated. That means that for a given item, either all matched fields are translated or none are.

### output_dataset_id

ID of output dataset, if you need to aggregate items from multiple runs into one dataset. If not provided, the Actor will use its own default dataset.

### translation_marker_field

Default value = `wasTranslated`. Each output item will contain this field, specifying if the item was translated (→ `true`) or not (→ `false`). If you set this field to an empty string (or `null`), the field will not exist.

### original_value_field_prefix

Default value = `original_`. Each translated item will also contain a copy of each translated field, prefixed with this value, that will contain the original, untranslated string. For example, for input item

```json
{ "text": "Auf Wiedersehen." }
```

The output would be

```json
{ "text": "Goodbye.", "original_text": "Auf Wiedersehen." }
```

If you set `original_value_field_prefix` to an empty string (or `null`), the original values will not be provided in the output.

---

Disclaimer: This Actor serves as a tool that interfaces with the Google Translate API and does not hold any responsibility for the quality of translations provided by this third-party service. By supplying an API key, the user consents to this Actor accessing the Google Translate API on their behalf. Users are responsible for ensuring that the amount of text submitted for translation is within their allocated quota and adheres to the Google Translate API's terms of service.
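
The per-item behaviour described in this documentation (glob field selection, the detection threshold, mock translation, and the marker and original-value fields) can be sketched in Python roughly as follows. This is a minimal illustration, not the Actor's actual source code. The function names `matched_fields`, `mock_translate`, and `process_item` are hypothetical, as is the `detect` callable, which is assumed to return a confidence between 0 and 1 that the text is already in the target language. Matching only top-level string fields and concatenating them without a separator are also assumptions. The glob syntax happens to coincide with Python's standard `fnmatch` module, which is used here for matching.

```python
from fnmatch import fnmatchcase
from typing import Callable, Optional


def matched_fields(item: dict, patterns: list[str]) -> list[str]:
    """Names of top-level string fields matching at least one glob (case-sensitive)."""
    return [
        name
        for name, value in item.items()
        if isinstance(value, str) and any(fnmatchcase(name, p) for p in patterns)
    ]


def mock_translate(text: str) -> str:
    """Mock mode: prefix the string instead of calling Google servers."""
    return "TRANSLATED " + text


def process_item(
    item: dict,
    patterns: list[str],
    detect: Callable[[str], float],  # hypothetical detector, see lead-in
    threshold: float = 0.7,
    translate: Callable[[str], str] = mock_translate,
    marker_field: Optional[str] = "wasTranslated",
    original_prefix: Optional[str] = "original_",
) -> dict:
    fields = matched_fields(item, patterns)
    out = dict(item)

    # Detection runs once per item, on the first 500 characters of the
    # concatenated matched fields. A threshold of 0 disables detection.
    sample = "".join(item[f] for f in fields)[:500]
    already_target = threshold > 0 and bool(sample) and detect(sample) >= threshold
    should_translate = bool(fields) and not already_target

    if should_translate:
        # Either all matched fields are translated or none are.
        for f in fields:
            if original_prefix:
                out[original_prefix + f] = item[f]
            out[f] = translate(item[f])

    # An empty string or None omits the marker field entirely.
    if marker_field:
        out[marker_field] = should_translate
    return out
```

With a stub detector such as `detect=lambda s: 0.1`, calling `process_item({"text": "Auf Wiedersehen."}, ["text"], detect=...)` in mock mode yields an item whose `text` is `"TRANSLATED Auf Wiedersehen."`, with `original_text` holding the untranslated string and `wasTranslated` set to `True`. Passing `threshold=0` sends every matched field to translation regardless of what the detector returns.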

Actor Information

Developer: mvolfik
Pricing: Paid
Total Runs: 80
Active Users: 4

Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.
