My dad is a native English speaker living in Italy. When he uses Google Maps navigation while driving, the robo-voice absolutely butchers the pronunciation of street names, towns, everything. You'd think the software would be smart enough to use the local language instead of the phone's locale for such things. More than half of the world's population speaks more than one language - we need to get better at building software that accounts for that.
That's a tough one, because you really need both options available.
"Mispronouncing" names in your phone's language is actually helpful if you don't speak the local language, because you'll legitimately understand it better. Local pronunciation can often be completely and maddeningly unrecognizable to a foreigner.
Whereas if you speak the language, it's obviously nonsensical to have it mispronounced, and of course you want the real thing.
Example: "Rio Tinto" in Portuguese is pronounced "HEE-oo CHEEN-too". If you don't speak the language, you're NEVER gonna map those sounds to a street sign, because the "r" and "t" sounds you're expecting aren't there at all! Whereas hearing a completely incorrect "REE-oh TIN-toh", you can probably recognize it.
This is definitely true. I live in Vietnam and the same problem exists here. If you used true "local pronunciation", most foreigners wouldn't be able to understand many of the street names.
The disconnect between the two pronunciations is so stark that many times taxi drivers have literally no idea what street a foreigner is telling them to go to.
I was going to say that, since Rio Tinto is a mining company, it probably originates in Brazil. But actually it is an Anglo-Australian company named after a mine in Spain. (Which, according to Wikipedia, started operations 5000 years ago.)
- Belgium has three official languages (French: fr-be, Flemish: nl-be and German: de-be).
- Belgium is divided into provinces.
- Provinces in Belgium are either French (thus mainly speak French) or Flemish (thus mainly speak Flemish).
- … with the exception of some areas near the German border which mainly speak German. But not the whole province.
- Brussels is not part of a province. It's part of a region called "Brussels Capital". It is French speaking, although it's in the middle of a Flemish province. This is what that looks like: https://twitter.com/Adys/status/1175063489653727233
As you can imagine, most software doesn't account for any of that shit, yet tries to be clever and, e.g., have websites ignore Accept-Language.
I recently updated my Nintendo account to be in Belgium. My Nintendo account is now half-translated into Dutch (NOT Flemish). Half because their Dutch translation is vastly incomplete. I have no apparent way to change this back without lying about where I live.
> Half because their Dutch translation is vastly incomplete.
By this you mean it shows a mixture of en and nl strings?
I have often found it super jarring that people are willing to ship such a thing. Is it better or worse to not attempt nl at all for such a circumstance? And then, OK, maybe Europeans are comfortable with English, but what about when this situation arises for markets where nobody understands English? You can go down a rabbit hole of having different fallbacks for different markets, and sometimes maybe that will work, but it's likely to be just as jarring.
Your Belgium case even suggests maybe the fallbacks should be a per-user setting. User comes in with a list of what they are OK being presented with...
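A per-user fallback list is straightforward to sketch server-side. Here's a minimal, hypothetical version (the function and catalog names are made up for illustration): instead of picking one language and showing raw fallback strings when a translation is missing, walk the user's preference list per string.

```python
def pick_string(key, user_langs, catalogs):
    """Walk the user's fallback list and return the first catalog
    that actually has this string, instead of mixing languages
    the user never asked for.

    `catalogs` maps a language tag to a dict of translated strings.
    """
    for lang in user_langs:
        catalog = catalogs.get(lang, {})
        if key in catalog:
            return catalog[key]
    return key  # last resort: show the untranslated key

catalogs = {
    "nl": {"greeting": "Hallo"},  # incomplete translation
    "fr": {"greeting": "Bonjour", "bye": "Au revoir"},
    "en": {"greeting": "Hello", "bye": "Goodbye"},
}

# A Belgian user who prefers Dutch but accepts French, then English:
print(pick_string("bye", ["nl", "fr", "en"], catalogs))  # → "Au revoir"
```

The point is that the fallback happens per string, against a list the user chose, so a half-finished Dutch catalog degrades to French rather than to whatever the developer's default happens to be.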
As a European who is comfortable with English, I find myself setting apps' language to English, exactly for this. My language (any language the author doesn't speak, I suspect) tends to be half-translated at best, or worse, "translated" into a hodgepodge of Slavic languages (Czech/Polish/Slovak/Croatian/Russian, whoever was at hand I assume), or worst, copypasted, word by word, through GTranslate, leading to bizarre results: e.g. "exit" in English has an "app" context, a "public transit" context, a "highway" context, and several others; most apps just pick one at random: "Press Back twice to exit the app" becomes "Push backbone twice to walk out of the app", and "Take Exit 24" turns into "Pick up the end 24"
Yes, it's showing me half-English half-Dutch, for a translation supposed to be in Flemish, on an IP that's supposed to default to French, with a web browser asking for English, in a country that speaks French or Flemish.
But the worst part of it is it's doing so with NO WAY TO CHANGE THE LANGUAGE.
> User comes in with a list of what they are OK being presented with...
That is exactly what the Accept-Language header is: A list of preferred languages, detailing which ones work better and which ones to use as fallback, etc.
Youtube surely knows I have watched thousands of English language videos without subtitles. Yet they insist on sometimes giving me horribly translated video titles, seemingly at random. I don't know why and I don't know how to make it stop.
Good god, if someone can tell me a way to turn this off, I'd be ecstatic. Even having well-translated titles is infuriating: the content is English, I will watch it in English, the whole website is in English, but yet somehow I'm required to see the title in a different language?
>> User comes in with a list of what they are OK being presented with...
> That is exactly what the Accept-Language header is: A list of preferred languages, detailing which ones work better and which ones to use as fallback, etc.
I knew about Accept-Language, but I just looked into it and hadn't realized it also allows weighting each language. I was thinking of an instance where you are highly proficient in languages X and Y, and a website has "native" content in X and translated content in Y -- you don't want to be served Y in that case. Seems like a tricky case to get right, and some server code parsing Accept-Language could choke on it if not careful.
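For reference, the weighting is just a `;q=` parameter on each entry (defaulting to 1.0), so parsing it defensively is only a few lines. A minimal sketch, ignoring extended parameters and wildcard matching:

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, weight) pairs,
    best first. Each entry may carry a quality weight like
    "fr;q=0.8"; a missing q defaults to 1.0.
    """
    langs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            tag, q = part.split(";q=", 1)
            try:
                weight = float(q)
            except ValueError:
                weight = 0.0  # malformed weight: rank last, don't crash
        else:
            tag, weight = part, 1.0
        langs.append((tag.strip(), weight))
    return sorted(langs, key=lambda tq: tq[1], reverse=True)

# A Belgian user preferring Flemish Dutch, then Belgian French, then English:
print(parse_accept_language("nl-BE,fr-BE;q=0.8,en;q=0.5"))
# [('nl-BE', 1.0), ('fr-BE', 0.8), ('en', 0.5)]
```

The "choking" risk is real: servers that assume a bare comma-separated list will break on the q-values, which is presumably one reason so many sites fall back to GeoIP instead.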
> [Brussels] is French speaking, although it's in the middle of a Flemish province.
This is what I keep reading, but when I visited there I observed both languages in apparently equal measure. Of course my observations were superficial -- I was a tourist who spoke neither language.
But even if I am right, that just underscores your point. The fact that a request comes from Brussels gives you very little clue what language the user wants. Especially if that request comes from some foreigner working for the EU.
Google could probably provide this kind of map at much higher resolution thanks to Android. I'd love to see a zoomable world map with stats on what language people set on their devices. Even info about what percentage is set to English would be interesting.
It's even worse in Greece: it can't pronounce the alphabet, so it spells out each letter individually. Turn left at "epsilon alpha delta kappa kappa epsilon x 5"
What are you going for here? e, a, and k are all English rather than Greek letters. It's just a transcription for the benefit of people who aren't comfortable with Greek letter names.
Hell, you don’t even have to leave the US for this to happen. I live in New Orleans, where the street names are a mish-mash of English and French, with a sprinkling of Greek deities for flavor. While a lot of these names are mispronounced in common usage, Maps’ mispronunciations of these names are, as a rule, different mispronunciations than the ones that would come out of the mouth of a local.
It’s both hilarious and saddening, as I suspect the robo-mispronunciations might end up edging out the local ones over time.
This could be fixed by adding a field to every street name for “a series of phonemes that provide a close approximation of local pronunciation” and using that. But I doubt it ever will.
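For what it's worth, SSML (the W3C Speech Synthesis Markup Language, which several TTS engines accept) already has exactly this mechanism: a `<phoneme>` element carrying an IPA string that overrides the engine's guess. A minimal sketch of generating it; the IPA transcription below is my own rough approximation, not an authoritative one:

```python
from xml.sax.saxutils import escape

def ssml_with_phoneme(text, ipa):
    """Wrap a street name in an SSML <phoneme> hint so an SSML-capable
    TTS engine reads the supplied IPA string instead of guessing
    from the spelling."""
    return (
        "<speak>Turn left onto "
        f'<phoneme alphabet="ipa" ph="{escape(ipa)}">{escape(text)}</phoneme>'
        "</speak>"
    )

# Rough IPA guess for the European Portuguese pronunciation:
print(ssml_with_phoneme("Rio Tinto", "ˈʁi.u ˈtĩ.tu"))
```

So the data-model problem is arguably the hard part, not the synthesis: the map provider would need a per-name pronunciation field to feed into the `ph` attribute.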
> This could be fixed by adding a field to every street name for "a series of phonemes that provide a close approximation of local pronunciation"
Interestingly, this is essentially how a lot of Japanese forms (at least that I've seen) work. You have a name field, containing, say, 小島秀夫, then a field for the phonetic reading containing こじまひでお. Both are Kojima Hideo (typically Hideo Kojima in English), but the second is the phonetic (kana) form, whereas the first form of some names might be misread or have an unusual reading. You might also see it as furigana, where the phonetic/kana reading is written above the kanji form in small letters.
And in the case of Japanese, the extra field seems to be utterly essential, because the same characters in Kanji might be pronounced in radically different ways.
There are four Japanese women whose names you have to sort: Junko, Atsuko, Kiyoko, and Akiko.
This does not seem difficult, until they each show you how they write their names in kanji:
淳子 (Junko)
淳子 (Atsuko)
淳子 (Kiyoko)
淳子 (Akiko)
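This is why Japanese-aware software sorts on the separate kana field rather than the written form. A minimal Python sketch (the record layout is made up for illustration):

```python
# Each record carries both the written form and its phonetic (kana)
# reading, matching the four names above -- all written identically.
people = [
    {"kanji": "淳子", "kana": "じゅんこ"},  # Junko
    {"kanji": "淳子", "kana": "あつこ"},    # Atsuko
    {"kanji": "淳子", "kana": "きよこ"},    # Kiyoko
    {"kanji": "淳子", "kana": "あきこ"},    # Akiko
]

# Sorting on the written form is useless here (all four are identical),
# so sort on the kana reading instead. For basic hiragana, code-point
# order roughly matches the gojūon order a Japanese reader expects.
people.sort(key=lambda p: p["kana"])
print([p["kana"] for p in people])
# ['あきこ', 'あつこ', 'きよこ', 'じゅんこ']
```

Without the reading field, there is simply no information in the record to sort on, which is why the forms ask for it.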
It's pretty comical in Maine as well, where there are a plethora of Native American place names, as well as a whole host of towns with well-known European city names, but wildly different pronunciations.
Saskatchewan too. We have a mixture of First Nations-derived names and major streets named after people. For some of them it's hard to grok where they went wrong; a prominent example is "Lewvan Drive" (a major thoroughfare here). It's pronounced "Loo-van", but Google somehow spits out "Lew-chin". Regina Ave (Reg-eye-na) comes out as Reg-ee-na, which, ok... but that's the name of the city!
> Regina Ave (Reg-eye-na) comes out as Reg-ee-na, which, ok... but that's the name of the city!
Is it, though? I mean, if a city was named, centuries ago, by people who pronounced that name a particular way—and then language shifted in some way and every modern person pronounces the name differently... are the living people right? Or would it be "more correct" to pronounce the name the way that the people who came up with the name pronounced it?
(The Duchess of Argyll would certainly have pronounced the city's name as "Reg-ee-na", given that she named the city after the Latin word for "queen" [to refer to her mother, Victoria], and the Latin word is pronounced that way.)
It's a bit like asking whether the correct name for Saskatchewan itself is "Saskatchewan", or "Kisiskāciwani."
It's also a bit like asking whether the people of St. Louis are wrong to be pronouncing their city's name with a vocalized S.
Waze (by default) is even worse - as you cross a border it switches to the local language, _including the directions_... so as you enter Italy it expects you to suddenly switch your brain to understand the directions in Italian. I've come to think nobody at Waze ever thought about this use case...
In some Muslim country Google forces you to use the Hijri lunar calendar. There is no option to switch to Gregorian. Google knows best and everyone in that country wants Hijri.
I could understand making that an option though, as presumably the distances on signs will be in km. But as usual with this sort of thing, it's developers assuming they know best rather than suggesting and asking.
As a Pakistani living in Pakistan, I second this. Navigation butchers road names. What's worse is that I have to butcher the names myself, just so that Google Assistant would understand my voice, and start navigation when I'm driving.
Blackberry Maps (Blackberry OS 10's navigation / maps app) did exactly this. It pronounced street names and points of interest in the local language of wherever you were navigating. Perfectly. It was released in 2013, but I used the 2015ish version.
It was absolutely mind blowing to me that Google Maps and others don't do this when I switched to Android. Even more mind blowing that they still chose to not implement this to this day.
In non-English-speaking places like Europe, this means you literally have to change the language settings every time you cross a border. Borders are all over the place in Europe. It feels like 1998 tech to have to do this.
And for those claiming that the "Englishified version" of a street name in a foreign language is easier to understand: no. It isn't for anybody who speaks more than one language (AKA: most people outside of English-speaking and large and/or totalitarian countries).
We live in the US and our Toyota Highlander's navigation basically loses its mind when we travel into Quebec. When we're in a city, we like to have voice navigation on so we're not looking at the screen the whole time, but it will just go "In 300 feet... turn on... (large volume increase) GAGAGAGAGAGAGAGAGAGA". It was disturbing the first time, but hilarious the next few until we just turned it off.
Google Maps butchers a lot of place names in the UK, even when it's set to British English. In fairness, that's because many are simply not spelt phonetically.
Driving around Wales or Cornwall is particularly fun; it really can't deal with the names. Not sure if it manages better when it's in Welsh language mode (if that's available).
Likewise, places in the US that take their names from English get butchered (Gloucester and Worcester in Massachusetts are examples). "Concord" and "Berlin" are fun because they are pronounced differently depending on which one you're talking about. Native American names like Scituate are also tough for most GPS speech synthesizers.
I was once in Cyprus using Google Maps and it decided it couldn't cope with the alphabet, so it pronounced each individual letter on each road and placename. It would still be reading the road name and you'd be on the next one.
Does it do the same for people visiting the US? I can imagine tons of butchering, especially with placenames of Native American derivation. Arkansas, Puyallup, Potowomut, or even idiosyncratic places like Peabody \’pee’bdee\
Living in SoCal, as far as I can tell, google maps generates some random numbers and then uses the result to decide whether it should use the Spanish or English pronunciation of roads with Spanish names.
In some cases it uses different pronunciations for two words in the same road name (I've heard "Calle Real" pronounced with "Calle" correct but "Real" as the English word "real"; other times it gets calle wrong and other times it gets them both right).
It does, I moved to New Jersey from France, and my phone was in French. Navigation was kind of funny because it was attempting to pronounce English words as if they were French. For instance, street was pronounced "stré" ("streh", if you are not familiar with é).
Google Maps used to struggle with Roman numerals in France: many streets and avenues are named after kings, say "Louis XVI", "Henry VI", etc. Back in the day it wasn't able to pronounce those; VI would have sounded like "vee". It's probably been fixed since; that was almost 8 years ago.
There is a city in Texas named Amarillo. What is the "correct" pronunciation of that city name - the Spanish way of pronouncing the word, or the way the locals say it?
I believe that one had [ʃ] (English 'sh') when the Spanish got there but then there was a change in Spain (in Castilian only even, not in Catalan or Galician) from /ʃ/ and /ʒ/ -> /x/ (like the 'ch' in loch) and they brought it to the Americas.
Somehow those Spaniards brought that change to /x/ to all those Nahuatl place names but they never brought /θ/ to everyday speech in Mexico. I scratch my head at that.
A friend and former flatmate of mine split his childhood between the US and Mexico; he has native accents in both languages.
It's wild to hear him phoning home, rattling off Spanish words faster than I can discern them, and then use a local pronunciation (and American accent) for Spanish-origin names like Divisadero or Arguello.
> More than half of the world's population speaks more than one language - we need to get better at building software that accounts for that.
Yes please, can't I just set up a cookie or something that lets sites know which languages I speak? I'm so sick and tired of American sites forcing Spanish on me because I log in from a Spanish IP...
I have a Renault in France with navigation, but set it to use the English voice (British only, unfortunately, but I'm getting used to it). It can still pronounce the streets correctly, which after living here a while is exactly what I want. So it doesn't seem too hard, just a lack of prioritization.
A few days ago, I encountered Siri pronouncing A1 (/ei/ one[1]), a major road in the UK, as "ah one" (how you might pronounce a lowercase a). Threw me for a second as I've not heard it make that mistake before.
On my android phone I've got two languages set up and it seems to be pretty good about autocorrect in the right language.
I'm rather curious what the difference is between us that it doesn't work automatically for you. Even google assistant understands me when I speak either language.
It's quite a common scenario for Indic-language users. I'm mainly typing in English and, depending on context and the audience, sometimes in two other Indic languages using the Latin alphabet.
There are a lot of reasons for this, and I don't think I'm alone here:
* Like many Indic language users I speak and understand multiple languages and I'd like to contextually reflect this in my IMs
* Relative lack of fluency/comfort reading Indic scripts (almost each Indic language has a different abugida)
* A pronounced lack of experience typing using the multiple abugidas, all of which are more complex to type than the Latin alphabet
I'd really like to be able to type Indic words using Latin script without fighting with autocorrect -- or turning it off altogether.
I’ve mentioned the idea of using the Phonetic alphabet for proper names to people around me. I think having an alphabet where human beings can communicate sounds in an exact form irrespective of any underlying associated meaning is valuable.
I’m not familiar with the Phonetic alphabet or IPA beyond what they are conceptually. Ideally, what I want is a bijective function between sounds and letters in some alphabet.
That still might run into issues due to people not being familiar with sounds that aren't used in their native language. While this could theoretically be fixed by teaching all sounds to children early in school, my pessimistic guess is that at least in the US, large portions of the population would resist in the same way that they resist proposed changes like using metric units.
I’m not sure this would work. If you’re an American named Peter, should your British friends call you “Pete-uhh” or “Pete-ur”? I guess you could just decide that it should be the latter, but that’s a fairly big change from what’s done now.
You could pick the letters that correspond to the way you want it to be sound since it is your name. I suppose there are people who may prefer the difference between the spelling and naïve pronunciation. Maybe an alphabet altogether separate from that used by the IPA would help?
Try pasting some English into Japanese Google Translate and clicking the listen button on the Japanese side. Or indeed for Swedish, or some other language that comes into frequent contact with English…
I'm conditioned to do this trick in reverse, say to look for a recipe or song in French. Speaking the title in a horrible American English accent gives at least a 20% chance that Alexa would understand the query, as opposed to 0%. I'm not familiar with the problem domain at all, but it seems speech recognition is just as bad as TTS or worse at recognizing what language is being processed. Or maybe the problem for both directions is that there is simply no effort to even classify the language first.
I think the problem is that you have only used mass-grade TTS. Google can't deploy the latest and greatest for the billions because it would be 10x more expensive for them, while a small local TTS doesn't have the power to cover every situation. In demo situations it would be a whole different thing. I have followed the papers on TTS and listened to the demos.
For example my macOS system voice Alex often confuses 'live' (to live somewhere) with 'live' (as in live concert). That's because they have a simple model that can run on a laptop and not use too much space. The same phrase tested on Google Cloud TTS is perfect.
I have the same problem with Alexa. I was trying to get it to play Keith Jarrett's "Köln Concert" album. It doesn't matter how I pronounce "Köln", I cannot get it to understand what I want. I ended up having to create a playlist for it with a different name.
Finally my obscure experience may be relevant! I'm an American who lived in Geneva (French-speaking side of Switzerland) for a while. One night I was watching a talk show on France 2 (TV channel): https://en.wikipedia.org/wiki/Tout_le_monde_en_parle_(French...
After the Iraq invasion they had an American diplomat on, not sure which one. He spoke French fluently, but with a no-fucks-given American accent. But the thing is, I was able to understand him so well! After that, I tried speaking French (in private) without regard to accent, and found I also could do it much more fluently than before!
Set the language to translate from French to English then type English in the French box and it'll say the English words with a french accent if you click the speak button.
It's neat to hear all of the different accents and they sound reasonably accurate.
I wish I had learned about Raymond Chen's blog a long time ago. Finding out about his work and experiences via his blog has been a joy ever since I read the floppy disk story that was posted here last week.
Which was the first IoT assistant thing, released in 2006. It's a French system (which had very good localisations, though). What I loved best was getting it to talk in a French accent.
So I set the TTS to french, but sent english. It would say the time like:
"Eet Iz Deece Hure" (from it is 10 hours)
Which is way more loveable than in American English (back then, British TTS was rare).
A few years back I discovered the Siri interface would not only talk to you in a different accent of your choosing, but it would also understand you differently based the chosen language. I did a quick video based on American English vs Australian English: https://vimeo.com/207295417 (you can probably skip the first 10 seconds or so - it's just me demoing the choice of language).
It's a very benign/harmless post, but kind of weird that it's posted on an official Microsoft blog (even though it is a personal blog, it's also an official blog from Microsoft by a Microsoft employee).
Big companies are so paranoid about everything that I guarantee someone in their legal team is frowning upon this.
I didn’t say that’s weird. What is weird is giving an opinion or thought on another company’s product (a competitor in several arenas) in a blog that lives under the Microsoft domain and has all the Microsoft marketing and branding.
Again, this post is harmless, but since the legal department’s job is basically to freak out about everything that puts the company at risk (even a tiny bit), I’m assuming that mentioning other companies’ products is at the very least frowned upon.
It's not Microsoft's devblog. It's Raymond Chen's blog, and because of (many) reasons he's pretty much allowed to write about lots of non-Microsoft things.
As somebody who has followed his blog for over a decade now, it's kind of weird seeing it with the Microsoft corporate styling. It used to be pretty idiosyncratic.
Last time I went diving through the archives, a lot of the old links and images either had rotted or gotten mangled in one of the many shifts around. It's a bit sad.
I agree with you on all counts. Perhaps I was just used to it, but the old blog seemed more human somehow. This new styling is cold and looks more like a corporate announcement. I was very sad when I heard about the MSDN blog migration, especially because all the old links stopped working. Even the announcement stopped working https://blogs.msdn.microsoft.com/iotdev/2018/10/22/blog-migr...
If it was clearly about an awesome feature of Alexa, maybe that would be weird.
If it was picking on a significant bug in Alexa, then it wouldn't be very nice.
As it is, it seems to be simply finding something amusing for readers to think about, especially if they are developers and see it as an interesting corner case.
While living in France I mucked around with the voice of Siri on the iPhone a bit, to find a setting that would allow the phone to understand both me (speaking English) and the names of streets and places in Paris.
After switching between French and English a couple times I ended up with a voice that spoke English but used the French voice synthesizer.
To me it sounded eerily like a French person with a very thick accent speaking English, but it was still quite understandable. Sadly the setting didn't survive moving to a new phone, it was a fun party trick.
This reminds me of one of my favorite books of all time, Mots D'Heures: Gousses, Rames, which plays with language and bad French pronunciation. Highly, highly recommend.
I would love to have a speech synthesizer to play with where you could programmatically replace all of one sound with another. Like replacing all 'r' sounds with 'b' sounds, or swapping a few IPA vowels, but only when they follow a certain other sound. You could invent and test out novel accents.
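You could prototype the rule-application side of this today on IPA transcriptions, before any synthesizer is involved. A toy sketch (the rules here are invented, not a real accent):

```python
import re

def apply_accent(ipa, rules):
    """Apply ordered phoneme-substitution rules to an IPA transcription.

    Rules are (pattern, replacement) regex pairs applied in order,
    so context-sensitive swaps like "only before a certain vowel"
    are easy to express with lookarounds.
    """
    for pattern, replacement in rules:
        ipa = re.sub(pattern, replacement, ipa)
    return ipa

# A toy "accent": replace every r with b, and turn t into tʃ,
# but only when the t is immediately followed by i.
rules = [
    (r"r", "b"),
    (r"t(?=i)", "tʃ"),
]

print(apply_accent("riəl tinto", rules))  # → "biəl tʃinto"
```

Feeding the rewritten IPA into an SSML-capable synthesizer (via a phoneme hint) would then let you actually hear the invented accent; the hard part left over is a grapheme-to-IPA step for arbitrary text.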
I don't have specific insight into the machine learning aspects of Alexa, but based on my experience developing Alexa applications, I believe that the individual voices are trained on specific ML models. Most voices are unilingual, but there are bilingual voices as well, for example Hindi / English for India.
This is the same for Google's & Bing's text translation service. Go to translate.bing.com, type English into one of the boxes and select French (or anything else) as the language -- then hit the 'speaker' icon to have the service pronounce what you typed. Hilarity may ensue.
Seeing all these examples of mispronunciations in these comments, I realize just how uncultured I am. Or alternatively, these speech programs really are that advanced.