How Google Translate broke my translate-json userscript

Hello good people! Today I’m going to tell you about a tool I made 2 years ago called translate-json. By injecting some JavaScript into the DeepL or Google Translate page, it lets you translate JSON files.

I’ve made an unfortunate discovery, so I’ll have to break this article into two main sections. I’ll talk first about this simple tool. Next, I’ll tell you how Google Translate broke it.

How translate-json works ⚙️

Let’s start by introducing the stars of this idea: Greasemonkey (for Firefox) and Tampermonkey (for Google Chrome). These powerful tools let you run “user scripts”. These scripts allow you to customize the websites you visit, alter their behavior or even automate some actions. If you want to read more about Greasemonkey, check its dedicated Wikipedia page.

⚠️ Please note that user scripts can be harmful. You should not trust just any source, or install every user script you find.

Once you install my script and visit the Google Translate or DeepL page, you are supposed to see two text areas: the “Input JSON” one, where you copy-paste your JSON file, and the output one, which receives the translated JSON. There’s also a button to start the translation, as well as a progress bar that tracks your JSON translation progress. Below is a screenshot of the tool in action with DeepL.

translate-json working on DeepL

You will also notice a Delay interval. This pre-defined value sets how long the script waits before reading the current translation and pushing the next entry in the list. I admit it’s an arbitrary value, so you can for example reduce it to 500ms. Depending on your configuration, and especially your network speed, tuning it can come in handy.

Now that I’ve covered the broad strokes, let’s dive into the code.

How the code works 🤔

The code is fairly simple, let’s break it down into steps:

  • Detect the current website and set up the constants.
  • Parse and flatten the input JSON to easily process it.
  • Loop on the JSON to translate it.
  • Unflatten and show the output translated JSON.

Setting up the constants and detecting the website

I’ve defined two constants GOOGL and DPL as follows:

const GOOGL = {
    site: `googl`,
    content: `.homepage-content-wrap`,
    input: `.tlid-source-text-input`,
    output: `.result-shield-container`
};

const DPL = {
    site: `dpl`,
    content: `#dl_translator`,
    input: `[dl-test="translator-source-input"]`,
    output: `[dl-test="translator-target-input"]`
};

Both constants contain the selectors the script uses to find:

  • The container where the HTML of the textareas, the button, the progress bar, and others is injected.
  • The input field for the translations.
  • The output field for the translated word/phrase.

Next, you have a small function that just detects which site the script is running on. One neat thing is that Greasemonkey and Tampermonkey restrict your script to only the domains you define in the header part, so it’s just a matter of knowing if I’m working on google.com or deepl:

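// Selector set for the site the script is currently running on.
let currentSite;

function detectSite() {
    // The script only runs on the two supported domains,
    // so anything that is not Google is treated as DeepL.
    if (window.location.href.includes(`google.com`)) {
        currentSite = GOOGL;
    } else {
        currentSite = DPL;
    }
}
detectSite();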
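The domain restriction mentioned above lives in the metadata block at the top of a userscript. Here is a minimal sketch of what such a header can look like. The name, version and exact URL patterns are assumptions for illustration, not the values from the real translate-json script.

```javascript
// Assumed values: the name, version and URL patterns below are placeholders.
// ==UserScript==
// @name         translate-json (illustrative header)
// @version      0.0.0
// @match        https://translate.google.com/*
// @match        https://www.deepl.com/*
// @grant        none
// ==/UserScript==
```

With only these two @match lines, the script manager never injects the script anywhere else, which is why a simple two-way check is enough in detectSite().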

Next I inject the HTML code using createJSONInputOutputContainer(); simple stuff. Let’s now look at the meat of the script, startTranslation(). First, it gets our HTML elements: the progress bar, the translate button, and the interval input. It then safely reads the interval input value and, if it’s a valid integer, uses it instead of the pre-defined 2000ms delay.
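To make the "valid integer or fall back to 2000ms" step concrete, here is a small sketch. The helper name resolveDelay and the constant DEFAULT_DELAY_MS are hypothetical; the real script may do this inline.

```javascript
// Hypothetical helper: names are illustrative, not taken from the real script.
const DEFAULT_DELAY_MS = 2000;

function resolveDelay(rawValue, fallback = DEFAULT_DELAY_MS) {
    // parseInt returns NaN for empty or non-numeric input.
    const parsed = Number.parseInt(rawValue, 10);
    // Only accept positive integers; anything else falls back to the default.
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

Note that parseInt is lenient: a value such as `500ms` is read as 500, which is probably acceptable for a free-text delay field.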

The inputInField(value) function sets a value on the translator’s input text area, then fires an input event that triggers the translator’s translation routine.
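A minimal sketch of that idea, assuming the input element exposes a value property. Unlike the original inputInField(value), this variant takes the target field as an explicit parameter, which is my own choice to keep the example self-contained.

```javascript
// Hypothetical variant of inputInField: the field is passed in explicitly.
function inputInField(field, value) {
    // Setting the value from a script does not notify the page on its own...
    field.value = value;
    // ...so dispatch an `input` event, as if the user had typed the text.
    // `bubbles: true` lets listeners attached higher up in the DOM see it too.
    field.dispatchEvent(new Event(`input`, { bubbles: true }));
}
```

In the userscript this would be called with something like document.querySelector(currentSite.input) as the field.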

The getOutputField() function simply reads the translated value from the correct HTML element. One subtle difference: DeepL uses a simple textarea while Google uses a div.
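That difference matters because a textarea exposes its text through value while a div only has textContent. A rough sketch of a reader that handles both follows. The function name readOutput is hypothetical, and the real getOutputField() may branch on the current site instead.

```javascript
// Hypothetical helper: reads text from either a form control or a plain element.
function readOutput(element) {
    // Form controls such as <textarea> expose their text as a string `value`.
    if (typeof element.value === `string`) {
        return element.value;
    }
    // Plain elements such as <div> have no `value`; use their text content.
    return element.textContent;
}
```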

Flatten the JSON, translate and output it

Now the most interesting function, extractJSON(obj, indent), is a useful piece of code I found that helps with "flattening" JSON files. Basically, a JSON file is like a tree, which makes it awkward to loop over.

This function flattens the JSON file so it goes from a nested object to a flat list:

// Input JSON
{
  "APP": {
    "general": {
      "ui": {
        "close": "Close app"
      }
    }
  }
}
// Flattened output
[
  {
    "key": "APP.general.ui.close",
    "value": "Close app"
  }
]

Having a simple array of key/value pairs makes it easy to translate everything into another array, then put it all back together as JSON with the same structure!
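I won’t reproduce extractJSON(obj, indent) here, but the flattening step can be sketched as a short recursive function. The name flatten and its prefix parameter are my own for this example, and the real function’s signature is different.

```javascript
// Illustrative sketch, not the original extractJSON: walks the tree
// and returns one { key, value } entry per leaf, with dot-joined keys.
function flatten(obj, prefix = ``) {
    const entries = [];
    for (const [name, value] of Object.entries(obj)) {
        const key = prefix ? `${prefix}.${name}` : name;
        if (value !== null && typeof value === `object`) {
            // Nested object (or array, whose indexes become key parts): recurse.
            entries.push(...flatten(value, key));
        } else {
            // Leaf value: this is what gets sent to the translator.
            entries.push({ key, value });
        }
    }
    return entries;
}
```

Calling flatten on the input JSON above yields the single-entry array shown in the flattened output.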

The translation is then done inside a simple try {} catch() {} to avoid blocking errors. For each translated word, the script waits for the configured delay using a simple setTimeout and updates the progress bar. If any error is detected, the catch block fires and we display an error so that the user knows something went wrong.
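Here is a rough sketch of such a loop. It wraps setTimeout in a promise and uses async/await, which is a swapped-in style and not necessarily how the original script chains its timeouts. The callbacks typeIn, readOut, onProgress and onError are hypothetical stand-ins for the input, output, progress bar and error display.

```javascript
// Promise wrapper around setTimeout so the loop can simply `await` the delay.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch: all four callbacks are assumptions for illustration.
async function translateAll(flatJSON, typeIn, readOut, delayMs,
                            onProgress = () => {}, onError = console.error) {
    const translated = [];
    try {
        for (const [index, { key, value }] of flatJSON.entries()) {
            typeIn(value);          // push the next phrase into the translator
            await sleep(delayMs);   // give the page time to translate it
            translated.push({ key, value: readOut() });
            onProgress((index + 1) / flatJSON.length); // fraction done, 0..1
        }
    } catch (error) {
        // Surface the problem instead of failing silently.
        onError(error);
    }
    // On error, this holds whatever was translated before the failure.
    return translated;
}
```

The fixed delay is the weak point of this approach: if the page needs longer than delayMs, readOut() returns a stale or empty translation, which is why the interval is configurable.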

Finally, the result is passed to outputToJson(flatJSON), which puts the JSON file back into its input structure.
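The reverse step can be sketched like this. The name unflatten is mine and this is not the original outputToJson; it assumes every key is a dot-separated path, which is exactly what the flattening step produces.

```javascript
// Illustrative sketch, not the original outputToJson: rebuilds a nested
// object from { key, value } entries by splitting each key on dots.
function unflatten(entries) {
    // Prototype-less objects so keys like `__proto__` are stored as plain data.
    const root = Object.create(null);
    for (const { key, value } of entries) {
        const path = key.split(`.`);
        const leaf = path.pop();
        let node = root;
        for (const part of path) {
            // Create intermediate levels on demand.
            if (node[part] === null || typeof node[part] !== `object`) {
                node[part] = Object.create(null);
            }
            node = node[part];
        }
        node[leaf] = value;
    }
    return root;
}
```

Because every dot in a key is treated as a nesting level, a file whose keys already contain dots comes back nested.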

The last neat function is addCSS, a helper function that I wrote to append any CSS to the head of the document.
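A helper like this usually boils down to creating a style element. Here is a minimal sketch, where the optional doc parameter is my own addition so the function can be exercised outside a browser.

```javascript
// Illustrative sketch of an addCSS-style helper.
// `doc` defaults to the page's document; the parameter itself is an assumption.
function addCSS(css, doc = document) {
    const style = doc.createElement(`style`);
    // textContent keeps the CSS as plain text; nothing is parsed as HTML.
    style.textContent = css;
    doc.head.appendChild(style);
    return style;
}
```

Usage would look like addCSS(`.progress { height: 4px; }`), where the selector is of course just an example.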

Drawbacks 🚨

This script is fairly simple. The main issue I have is that if you have an i18n.json file with the following structure:

{
  "APP.general.ui.close": "Close the application"
}

You’re going to end up with something like this:

{
 "APP": {
  "general": {
   "ui": {
    "close": "Close the application"
   }
  }
 }
}

i18n plugins have no issue with either structure, but the change can make the file harder for the user to maintain.

One useful improvement I was thinking about was to ask the user what kind of structure they would like to have:

  • Keep the current structure.
  • Flatten and optimize the structure.
  • Expand and make the structure more tree-like.

But it would make the script more complex. I didn’t mention this, but it took me just a day to put all this together. My main motivation was just to help me translate i18n JSON files quickly. It did the job, so I didn’t invest more time in it.
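For what it’s worth, the first option (keep the current structure) could be sketched in a few lines: detect whether the input was already flat and, if so, skip the unflatten step. Both helper names are hypothetical and nothing like this exists in the script today.

```javascript
// Hypothetical helpers, not part of the current script.

// True when no top-level value is itself an object, i.e. the file is flat.
function isFlat(obj) {
    return Object.values(obj).every((v) => v === null || typeof v !== `object`);
}

// Turns translated { key, value } entries back into a flat object,
// leaving dotted keys such as `APP.general.ui.close` untouched.
function toFlatObject(entries) {
    return Object.fromEntries(entries.map(({ key, value }) => [key, value]));
}
```

The script would then call isFlat on the parsed input once, and pick either toFlatObject or the existing outputToJson(flatJSON) for the output.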

How Google broke the script 💔

Let’s get back to our main subject: how Google Translate broke my translate-json script. If you try to check out Google Translate while the script is being injected with Greasemonkey or Tampermonkey, you’ll notice a mysterious error. Inspecting the code and the page’s DOM made me realize that everything had random CSS classes, so the selectors hard-coded in GOOGL no longer match anything. I couldn’t find much info about this, but here are my findings:

  • The classes are the same after a refresh, which makes me believe that they are randomized on each build rather than on each page load.
  • Google uses the Closure Compiler and Closure Tools; some issues discuss this concept:

Google Translate setting random classes for its DOM elements

Now, I understand that Google needs to protect its products from misuse and the like. Though the tool I made might not be very popular, my guess is that other people may be abusing this system, which I don’t encourage.

That said, they let people translate documents, so why not JSON files? Besides, this will surely slow some people down (and break my script), but it won’t stop people from adapting their code to work around the issue.

I wouldn’t be surprised if DeepL did the same thing to protect their product. But I really hope they will add a “translate JSON file” option instead.

Wrap-up 📦

My translate-json script surely won’t solve all your programming problems, but I really hope you’ve learned something new from this long article. I will try to work around the challenge Google Translate has put in front of me (provided I find some free time).

As usual, if I made some mistake or forgot to mention something, you know the drill.

Cheers!

Image credits by Thought Catalog on Unsplash.
