Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: ChatGPT-i18n – Translate websites' locale json files with AI assistance (github.com/observedobserver)
92 points by basic_banana on March 9, 2023 | hide | past | favorite | 40 comments
I build this app because I was tired of using Google Translate to translate my locale files (i18n). I wanted to use a more efficient and accurate translation tool. ChatGPT, however, always break my json and cannot translate large contents. So I build this app to solve these problems. Hope it can save your time.

github: https://github.com/ObservedObserver/chatgpt-i18n

online app: https://chatgpt-i18n.vercel.app/



I’m always wary of automated translation, because in the general case you need a native speaker who understands the specific application context, in order to end up with a truly fitting and idiomatic translation. However, with ChatGPT there is at least some room for improvement over traditional automated translation, in that you could explain the application context to ChatGPT, which presumably would increase the likelihood of it producing a fitting translation. Of course, a translation tool using that approach would have to be based on an input format that, in addition to the text to be translated, provides a description of the application context in which it occurs. This is something a human translator typically needs as well when they don’t have access to the application itself.


Even human written outsourced translations are pretty unreliable as the translators don't understand the software and end up translating key terms differently across the app while the English version is careful to always refer to the same thing with the same terminology.


You get what you pay for. I’ve worked with great translators — the kind who came up with new terms for subjects not previously discussed in the target languages – but they’re not cheap since you’re talking a human with multiple non-trivial skills.

I suspect that the AI tools are going to continue to split that into low-end mass market where businesses don’t care about quality as much as price and a rarified high-end market where a few people hang on to the skilled work.


To be fair even the greatest translator can't make miracles, if he is not provided with reference/documentation, I am often translating completely in blind, just guessing because client doesn't bother to provide any reference.


Translation is a high-skill job and isn’t payed appropriately anymore nowadays. So getting subpar results when outsourcing to a random translator probably isn’t surprising.


To address the problem of inconsistent terminology, Localization Management Systems (LMS) leverage a term base (glossary), which clearly explains each term and provides a preferred translation. Along with Translation Memory (TM) that allows for partial matches against past translations, a LMS offers functions that help ensure consistency in translations. Sadly, the know-how across engineering teams around localization is rather low, meaning that the existence of term bases, translation memories is not well known. Also, when working with external vendors, companies are not aware of these two artifacts being something that can be requested from the vendor at completion of a translation job and leveraged further.


The problem is that one (ambiguous) term in English can cover more meanings or scenarios while in other language it may be required different words based on situation, for instance "lap" in swimming pool or stadium, heck even very basic "subscribe" can differ whether you are subscribing paid service or some free content in some languages, while in English you can easily use same word for both scenarios.

One would think TM and glossaries are the very basic requirements when requesting translation to update already existing translation, but I agree some companies don't provide these, which can cause inconsistencies when changing translator/company.


I wrote a JS plugin over a decade ago that would scan your page for strings, use google's free translate API to translate it to the target language, and save the translations to a local xx.yml file in the locales directory. It was designed for rails, since I needed to pick something for the backend local file caching, but I could rewrite it for other backends in an hour.

I would browse our sites in french, spanish, and english, and then we just had a fluent expert flip through the generated yml files and make any tweaks they felt were necessary. The translation wasn't perfect, but it was pretty good, and they were table to do a whole site translation in less than an hour and feel confident they got everything.

It died from config rot, but it worked great until google changed some APIs. Ah, I just looked it up - 13 years ago!!


I did something similar. New strings would get Google Translate translations when we merged changes in. Then it would kick off a translation job through a human translation service. We'd have to periodically poll that service to pull translations in. Then once we pushed to QA, our in-house teams could look at the app and tweak the translations.

It worked most of the time, but the lack of context around what the strings were for did result in bad translations.


wow, that sounds great. I love the idea that it can generate locale files direct from a webpage. It's a pity that it is not available now.


When I tried to translate this

  {
    "title":"hello"
  }
I got

  {
    "título": "hola"
  }
Why did it change the key?


Depends on how they engineered the prompt.

Doing it directly with ChatGPT was pretty straight forward.

```

The following is a piece of JSON where the values are strings in the English language.

  {
    
    "title":"hello"
  
  }
Please translate the values into Spanish without altering the keys of the JSON object.

```

Responded with:

```

  {

    "title":"hola"

  }
```

EDIT: Looks like this is the prompt being used:

`Translate a i18n locale json content to ${targetLang}.`

Just my opinion, that prompt needs improvement.


Even one example in the prompt would go a long way.


Just ask for what you want :]


I’m not sure what you mean. I’m saying that adding one example to the prompt given to ChatGPT would improve accuracy. In my experience, examples work better than instructions.

The request and response that I replied to would be a fine starting point. It already gets across keys aren’t translated but values are. I’d add a few more entries to show other things like “match the punctuation/capitalization level” and “use the source text for untranslatable names”.


Have we given up on wrangling data and parsing because of "AI". Why not pull out the bit you want translated, and send just that to GPT?


Or better yet, send it to Google Translate?


I add some limit to the prompt and it is fixed temporally. I was thinking to transform the json struct to a flat struct with no keys before, but worrying about the structure it self can improve the accuracy of translation as context.


> I wanted to use a more efficient and accurate translation tool

ChatGPT more accurate than Google Translate? I find that hard to believe. The obvious solution to make the process more efficient would be to use the Translate API. Seems like something you could script in a couple of hours, tops.


Google Translate is surprisingly bad when you try to do real work with it, especially its community verified corrections which are guaranteed to be worse. In Spanish, community verified translations never have accents.

Chatgpt and DeepL are both better. They understand context much better and more likely use real expressions over literal translations.


We are processing about 50 million words per day for about 12 languages. I can tell from experience that google translate is giving the best results. It similarly prices to deepl. We explored the idea of using chatgpt, but its extremely expensive compared to google/deepl 500.000 words for +-20 dollars versus a lot more with chatgpt.


A translation tool like PoEdit can mass translate a bunch of text strings using a few different services. It's the simplest demonstration of the lower quality of Google Translate vs. DeepL at least for shorter strings. GTranslate rarely knows how a native speaker would say short phrases and concepts, preferring to translate them literally.

I'm surprised you haven't run into problems with community verified translations. They're so predictably poor that I'd stop using GTranslate just to avoid that.


That's first time I hear someone would be praising Gtranslate, while they improved recently and they are catching up with DeepL, DeepL is still by far superior to GT, everything else is in completely different league (i don't have experience with ChstGPT). Btw. Reverso should be using same neural model as DeepL and provides also interesting results, though both DeepL and GT are superior to Reverso.


+1, especially when it comes to localizing UI text. The trick with UI text is that in many languages it's not grammatically correct to begin with (GUI labels are rarely expressed in full sentences), and the terseness introduces vast amounts of ambiguity in meaning.

Like take a "Submit" button - it can mean anything from "send this request in" to "grovel in front of me" depending on context. Human localizers struggle with this, especially when the source files they're provided with do not have sufficient context (and they cannot see the GUI in-context) to infer meaning. Existing translation engines are worse.


> Human localizers struggle with this, especially when the source files they're provided with do not have sufficient context (and they cannot see the GUI in-context) to infer meaning.

And sometimes the lack of context is self-inflicted. My company just started evaluating translation services and as part of that process, we sent some of our strings to the candidate companies. Many of our strings are in Java properties files, which we've annotated with comments to give context.

Yesterday I got a request back from one of them: please convert the strings to a JSON file. Which is on some level the equivalent of, please strip out all the comments you added for our benefit.

Yes, I get that our strings will be fed into translation software. But still, we made the effort to give them context and they told us to get rid of it.


For Vietnamese, ChatGPT has been much more effective for me. You can tell it very specific things to modify as well as give it additional context which you can't really do with Google Translate. Especially with pronouns, google will just translate everyone as 'uncle' or 'aunt' or 'grandma' or 'grandpa' and it'll get genders wrong all the time, which you can correct for with ChatGPT.


Consider:

    Translate the following json localization into French.  Leave json keys untouched.
    ###
      {
        "button name": "Click here",
        "instructions":"press the \"${button name}\" button"
      }
    ###
Which returned:

    {
        "button name": "Cliquez ici",
        "instructions":"appuyez sur le bouton \"${button name}\""
    }
Note that the button name remains unchanged. This is something that Google translate would do poorly.


Native Google Translate may have problems here. However, when integrated into a Localization Management System (LMS) that is aware of common placeholder formats, Google Translate/DeepL are not expected to destroy the original translation text. Even if they do, consistency checks of the LMS is expected to verify that all placeholders from the source copy are present in its translation.


For Japanese, at least, ChatGPT is better than Google Translate because it considers the context of each sentence, while Google Translate translates each sentence independently. DeepL is still the best, though.


Bilingual Large Language Models are much better than the ilk of google translate. https://github.com/ogkalu2/Human-parity-on-machine-translati...


I found that GPT-3 is much better than DeepL when given high-quality examples as a prompt. GPT-3 knows more about the world, so it can translate proper nouns and slang used in specific communities better than DeepL.


While it's a pretty cool use of ChatGPT, there are lots of options for translating json locale files. I recently wrote a post about doing it in React Native projects, but the same concept could be applied to any framework, really. The npm package "json-autotranslate" can be installed as a CLI tool, and can hook into a fair few different translation services. I generally find using the DeepL-free provider works extremely well. https://dev.to/mikehamilton00/react-native-expo-automatic-ma...


Missed the opportunity on having your webpage translated.


Sometimes back I looked into and compared Google translate and chat GPT for this. Seems like Google translate is superior for the translating English into languages that I knew (Spanish and Hindi).

I ended up writing a small GitHub Action to automate this for hobby apps. https://github.com/ashishb/android-auto-translate


Hey, we just working in ChatGPT or other openAI models integration to Tolgee localization platform. Cool thing about this is that we have lot of context about the strings, since our integrations (including in-context localization). That way we can provide super accurate results. Stay tuned. https://tolgee.io


Cool idea. I've used Github Copilot to do a lot of translations but this looks much easier.

It's a pity Vercel has a cli login wall to even run the app. I tried the Github option and it didn't work. My first impression of Vercel is not good. Unfortunately the demo app threw some errors in the console on file I tested with.


Sorry about the bad experience, can you share what errors you ran into? Feel free to email: rauchg@vercel.com

We’ll try to debug by running the demo ourselves in the mean time!


Interesting.

I've used json-translator in the past.

It supports Google Translate, Bing Microsoft Translate, Libre Translate, and Argos Translate.

https://github.com/mololab/json-translator


The official sponsor of i18next actually offer json translation for free here:

https://translate.i18next.com


This is not a self promotion post, I just want my experience working on i18n better. It is welcome to share your favorite tools that you like!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: