Really, Chrome? September 24, 2011

Posted by globalizer in Language, QA, Translation, Unicode.

With the terrible bloat in Firefox, I have recently been trying to get used to Chrome. I am having a really hard time, though. I appreciate the attempt to create an uncluttered interface, but please – within limits!

Once I finally managed to find the settings, I had to hunt around forever to find the setting for default encoding. I first looked under Basics, but no luck. Went on to Advanced Options, and found no specific setting there. The Languages and Spell checker button seemed the most likely, but no, I didn’t find it there either.

Where does it hide? Under “Customize Fonts…”, of course.

If I were using the English language UI, I might actually have thought that location far-fetched, but not totally outlandish. Since I am using the Danish language version, however, the connection is just completely impossible to make:

The Danish button says “Customize font sizes…”.  And while it is true that good translations cannot be word-for-word translations, in this case my advice to the Danish translator would be to stay a little closer to the source text.

Another case where the Danish translation would benefit from a few changes:

The first sentence (“Det ser ud som om, at du er flyttet”) is not wrong, but it would sound a lot more natural without the “at”. And the second one is just plain wrong – again, a superfluous “at”. Otherwise the translation looks pretty good. And my main question is actually not about the translation at all; all of this has just been throat clearing leading up to this:

Why on earth is the out-of-the-box default encoding still set to ISO-8859-1 for all the Western European languages, and to various other legacy encodings for languages using other scripts? With Unicode (UTF-8) having surpassed 50% of the web by now?
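
To make the cost of a legacy default concrete, here is a minimal Python sketch of what happens when a page served as UTF-8 is read as ISO-8859-1. The sample word is only an illustration, not something taken from Chrome:

```python
# A Danish word containing non-ASCII letters (illustrative sample only).
text = "smørrebrød"

# The page author saves and serves the text as UTF-8 ...
utf8_bytes = text.encode("utf-8")

# ... but a browser defaulting to ISO-8859-1 reads each byte as one character.
misread = utf8_bytes.decode("iso-8859-1")

# Each "ø" is two bytes in UTF-8 (0xC3 0xB8), so it shows up as two characters.
print(misread)  # smÃ¸rrebrÃ¸d
```

Nothing is lost at the byte level (re-encoding the garbled string as ISO-8859-1 and decoding it as UTF-8 restores the word), but the reader has to find the encoding setting first.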

Time for a style guide update, NYT January 16, 2010

Posted by globalizer in Language, terminology.

Something that has been bugging me for a while; this is just the latest example:

She said she was “unaware” whether China staff have been denied access to codes, as some bloggers have said, but added that Google is still scanning its systems following the attack.

Who, outside of the New York Times editorial offices, thinks that “code” as in “software code” should be treated as a countable noun?

We need an official language, now! February 10, 2009

Posted by globalizer in Language, Silly stuff.

I just came back from one of the county recycling stations, where I had an interesting conversation with the guy on duty. I think it was about the economic crisis, the fact that it’s even worse than the Great Depression because young people today have no idea how to at least get food on the table even though they have no money (by fishing and hunting), that contractors in the area have absolutely no jobs lined up whatsoever, and that they have been foolish to not save any money when the going was good.

I say I think that’s what it was about. I think I heard snatches such as “Hoover’s days”, “young people”, “contractors”, but the North Carolina dialect was so strong, with an overlay of mumbling, that it might have been about quantum physics, for all I know.

Even though it was a bit uncomfortable (who knows what I was actually agreeing with, all those times when I nodded and said “uh huh” or “yeah”), the experience did provide me with an epiphany.

All those proposals about English as an official language, they have not gone nearly far enough. We need not just an official language, we need a language spoken in such a way that people can actually understand it. Combine that thought with the economic crisis, and think of the possibilities:

  • we will need countless language teachers (who won’t need a little refresher course in either grammar or pronunciation?), so English majors will suddenly be in short supply
  • we will need an army of officials who can administer official language tests and certify people, so this will create a huge number of new jobs
  • this economic stimulus will eliminate political opposition from right wing Republicans – even those who seem to think that government jobs are not real jobs would have to support it

In short, I have the solution to the deadlock over the economic stimulus package: a spending program which in one fell swoop will garner support both from the “government-expanding, latte-drinking, sushi-eating, Volvo-driving, New York Times-reading” liberals because of the spending aspects and from the “gun-toting, bible thumping bitter wingnuts” because of the support for one of their pet projects.

You can thank me later, President Obama 🙂

Update: OK, just to make sure: y’all do realize this is tongue-in-cheek, right?

Install anywhere yes – but prepare for a slightly bumpy translation ride December 6, 2008

Posted by globalizer in Language, Locales, Localization, Programming practices.

First, the good stuff: InstallAnywhere is actually a nifty product which allows you to create a good install program in almost no time. Kudos to the team behind it for that.

Now for the not-so-good stuff: translation

Here I am not referring to the translations provided by InstallAnywhere out of the box. They provide very good language coverage (I count 31 languages/language variants in the 2008 VP1 Enterprise Edition, including English), and the quality of the translations themselves also seems fine in this version (some of the early InstallShield translations into certain languages were rather unfortunate) [1].

So, no complaints there. The trouble starts when you need to modify or customize anything related to the translations.

First problem

All new/updated text strings that will need to be translated are inserted in the custom_en file which already contains all the out of the box translation strings. There is no option to choose a separate translation file for custom strings. This means that anybody using a modern translation tool with translation memory features will have to re-translate the entire InstallAnywhere GUI even if they only modify a single string (because such tools use the English source file as the starting point for all translation). Cut-and-paste from the existing translation files can make that job faster, or you may be able to use the feature of creating a translation memory based on a set of source and target files described below, but no matter what method you choose, there will be a significant workload involved.

InstallAnywhere does update all the language versions of the custom_xx files along with the English version, with the difference that only the comment line for each string is updated in the translated versions (the custom_en file contains a comment line with a copy of each translation string). After an update, the English and Danish versions look like this respectively:

custom_en:
# ChooseInstallSetAction.368876699cf1.bundlesTitle=Choose Install Setشقشلاهؤ
ChooseInstallSetAction.368876699cf1.bundlesTitle=Choose Install Setشقشلاهؤ

custom_da:
# ChooseInstallSetAction.368876699cf1.bundlesTitle=Choose Install Setشقشلاهؤ
ChooseInstallSetAction.368876699cf1.bundlesTitle=Vælg installationssæt

This seems to indicate that the designers of the product have not understood how modern translation tools work. Indeed, the detailed help indicates that the creators of the application assume the translated versions will all be created/edited via the IA designer. This feature would have been extremely useful 20 years ago, since it retains existing translations and allows translators to just go through the translated file and modify the strings where they see an update. Today, however, it is a terrible hindrance (except for translators still working without modern tools).

With today’s tools, translators who need to bring a translated file up to the same level as an updated English file simply take the new English source file and run it through their translation memory tool. That tool automatically translates any unchanged strings and presents the translator with just those strings that are either new or changed. For this to work the translation memory has to contain the unchanged strings, of course, and that is where the InstallAnywhere model breaks down. With some tools it is possible to create “fake” translation memories on the basis of existing source and target files, but it is a rather time-consuming process, and by no means error-free.
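
A rough Python sketch of the exact-match step such a tool performs may help show where things go wrong. This is a simplification under stated assumptions: real tools also handle fuzzy matches, and the function name, keys and strings below are invented for illustration rather than taken from InstallAnywhere.

```python
# Hypothetical sketch of exact-match translation memory leverage.
def leverage(source_strings, memory):
    """Split an updated English file into pre-translated strings and remaining work.

    source_strings: key -> English text (the updated source file)
    memory: English text -> translated text (from earlier translation rounds)
    """
    pretranslated = {}
    needs_translation = []
    for key, english in source_strings.items():
        if english in memory:
            # Unchanged string: reuse the stored translation automatically.
            pretranslated[key] = memory[english]
        else:
            # New or changed string: hand it to the translator.
            needs_translation.append(key)
    return pretranslated, needs_translation


# Illustrative data only; these keys are not real InstallAnywhere keys.
updated_english = {
    "Installer.title": "Choose Install Set",
    "Installer.next": "Next",
    "Installer.customNote": "Please close all other programs first",
}
danish_memory = {
    "Choose Install Set": "Vælg installationssæt",
    "Next": "Næste",
}

# With a populated memory, only the one new string needs a translator.
done, todo = leverage(updated_english, danish_memory)

# With no memory for the out of the box strings, every string comes back as work.
empty_done, empty_todo = leverage(updated_english, {})
```

The second call is the InstallAnywhere situation: the translator never translated the out of the box strings, so the memory has nothing to match against and the whole GUI lands on the to-do list.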

The easy fix would be to at least make it an option to store any customized strings in a separate translation file. Since InstallAnywhere allows users to change existing strings, this of course introduces the question of what to do with the existing strings in the custom_en and translated versions of that file.

I believe the best solution would be to delete such strings from the custom_en file and the translated versions (in other words, those files would only contain the strings that were unchanged from the out of the box version). The changed strings would instead be inserted in the “new” translation file.
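
One way to picture that split, as a minimal Python sketch: compare the current English strings with the out of the box ones, and route anything new or changed to a separate file. The function name, keys and product name are hypothetical, and this is not how InstallAnywhere actually works today.

```python
# Hypothetical sketch of the proposed split; not an InstallAnywhere feature.
def split_custom_strings(out_of_box, current):
    """Separate untouched out of the box strings from new or changed ones.

    out_of_box: key -> English text as shipped
    current: key -> English text after the installer author's edits
    Returns (unchanged, customized); only `customized` would go to translators.
    """
    unchanged = {}
    customized = {}
    for key, text in current.items():
        if out_of_box.get(key) == text:
            unchanged[key] = text   # stays in custom_en and its translations
        else:
            customized[key] = text  # moves to the separate "new" file
    return unchanged, customized


# Illustrative data only; keys and product name are invented.
shipped = {
    "ChooseInstallSet.title": "Choose Install Set",
    "Install.progress": "Installing",
}
edited = {
    "ChooseInstallSet.title": "Choose Install Set for MyProduct",  # changed
    "Install.progress": "Installing",                              # untouched
    "MyPanel.greeting": "Welcome to MyProduct",                    # new
}

unchanged, customized = split_custom_strings(shipped, edited)
```

With this arrangement the translator would only ever see the two strings in `customized`, while the shipped translations of everything in `unchanged` remain valid as they are.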

Second problem – probably a minor one

There does not seem to really be an option for adding a language that is not in the list of languages provided out of the box. At least I don’t see it in the designer, and I haven’t found any information in the knowledgebase, fora, etc. With the number of languages supported, this may not be a major issue, but it would be nice to have the option.

[1]
Some of those early versions had eerily bad Danish and Norwegian translations which looked like an amalgam of Danish, Norwegian and Swedish. This old thread from an InstallShield forum may shed some light on how that happened (note also how the InstallShield support guy keeps suggesting that a Dutch version be used, when the user is looking for Danish…). But, as noted above, the current translations seem fine. I looked at the Danish version (the only one I am really competent to judge), and I have no complaints whatsoever, so I believe that the early problems have been overcome completely.

The myth exploded – again November 25, 2008

Posted by globalizer in global access, Language, Translation.
1 comment so far

Thank you, thank you, thank you!

Renato Beninatto from Common Sense Advisory posts about the consensus expressed by Localization World attendees from ten Latin American countries:

Spanish is one language. While there are a few terminology areas with words that vary a lot from country to country — like culinary and apparel terms — native speakers agree that by now everybody is used to the Microsoft Spanish standard for software user interfaces. In the vast majority of cases, two or more versions of Spanish are not required, except for marketing purposes.

This is one of the most persistent myths that lives on in the halls of software companies – that you absolutely have to produce at least two Spanish language versions of your software, one for Spain, and one for Latin America (and in the extreme version, this morphs into a need for separate versions for each Spanish-speaking country in Latin America).

I don’t know how many times I have had to swat away at this.

Update: I should add that there are real pitfalls you have to watch out for, if you produce only one Spanish language version worldwide. Certain Spanish terms have such connotations in various Latin American countries that you do not want them to appear in your software, so you have to create “stop lists” with those terms. This does add a small wrinkle to the work of Spanish translators.

When the search engine tries to read your mind – and fails miserably July 1, 2008

Posted by globalizer in Language, Silly stuff.

Everybody has a favorite example of Google coming up with suggestions for alternative searches – “Did you mean *****?” it politely asks when your search resembles something other people have searched for, and sometimes those suggestions are wildly off base, of course.

The search engine we use on the IBM intranet has a similar feature, and the alternative suggestions are usually reasonably close to the original search. This one has me stumped, though:

I searched for

issi update flash player fails

(because an automated software update keeps failing on my machine, and I wanted to see if it was a known problem).

The alternative suggestion was:

Did you mean: Standard Software Installer, blink?

I get the “Standard Software Installer” suggestion for “issi”, since that’s what the acronym stands for, but ‘blink’ for the rest?? Maybe the algorithm has ‘blink’ listed as a synonym for ‘flash’, but still – a little optimistic to think that I simply stuck the remaining 3 words in there as filler.

One laptop per child November 14, 2007

Posted by globalizer in global access, Language, Localization.

Just a quick link to the olpc wiki to highlight that the core languages for the project are quite a bit different from the sets of languages that mainstream software localization usually targets, simply because they aim at providing coverage in developing countries.

I am sure that a few of those languages will pose some interesting challenges.

Unfortunately the wiki does not provide a very clear picture of the status of the various languages – exactly how much of the user interface is currently translated, for instance.

I can has Danish LOLCODE? July 28, 2007

Posted by globalizer in Danish, Language, Programming languages.

I for one am a cat person and subscribe to the notion that there can never be too many cute kittens on the web. Since the lolcat or “cat macro” phenomenon combines cats and language, what’s not to love about it? Add a programming angle, and you get LOLCODE, an entirely new programming language (which has already spawned an Eclipse plug-in, for instance).

However, on behalf of the software localization community I have to ask:

Where is the internationalization support??

We need the ability to externalize and translate the text in code like this:

HAI

CAN HAS STDIO?

VISIBLE "HAI WORLD!"

KTHXBYE

A modern programming language really can’t afford to not include i18n and l10n as part of the basic architecture, you know 🙂

Ouch! July 23, 2007

Posted by globalizer in Language.

Nothing about software here – just plain old proof reading and common sense:

Why is it that headline writers love to create headlines like these:

Mindy McCready charged with battery

When I first saw that link on CNN I just couldn’t figure out why anybody would want to attach themselves to a battery and then get an electric charge from it (or was it maybe somebody else who attached the battery?)…

The wording above is from the link on the main page; the headline for the actual story makes it a little bit clearer that we are talking police blotter, not some kind of weird new bionic development.

PluralFormat to the rescue? July 2, 2007

Posted by globalizer in Java, Language, Localization, Translation.
4 comments

OK, I have been missing in action for a while, I know. A new job (still in IBM, mind you) and a fairly long vacation are my only excuses. Back to business:

Here and here I complained about the localization issues involved in using ChoiceFormat. One of those issues would seem to be addressed by the new PluralRules and PluralFormat API proposal described on the ICU design mailing list recently. PluralRules would allow you to define plural cases for each language, and the numbers those plural cases apply to, while PluralFormat would then allow you to provide message text for each such case. This format would thus be able to handle languages like Russian and Polish, which use more complex plural rules than the ones that can be provided via the simple intervals of ChoiceFormat.
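
To see why simple intervals are not enough, here is a small Python sketch of the Russian case (Python rather than Java purely for brevity). The keyword names "one", "few" and "many" are an assumption borrowed from common CLDR-style naming and may not match the keywords in the proposal; the function and table names are invented for illustration.

```python
# Sketch of rule-based plural selection for Russian integers.
# Keyword names are assumed, not taken from the ICU proposal.
def russian_plural_category(n):
    """Return the plural keyword for a non-negative integer n in Russian."""
    last_digit = n % 10
    last_two = n % 100
    if last_digit == 1 and last_two != 11:
        return "one"    # 1, 21, 31, 101 ... but not 11
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"    # 2-4, 22-24 ... but not 12-14
    return "many"       # 0, 5-20, 25-30 ...


# One message per plural case, as a translator would have to supply them.
RUSSIAN_FILES = {
    "one": "{n} файл",
    "few": "{n} файла",
    "many": "{n} файлов",
}


def format_files(n):
    """Pick the message for n's plural case and fill in the number."""
    return RUSSIAN_FILES[russian_plural_category(n)].format(n=n)


for count in (1, 2, 5, 11, 21, 22):
    print(format_files(count))
```

The forms repeat with the last digits (21 behaves like 1, 22 like 2, while 11 does not), so a translator working with intervals alone would need an endless list of ranges to get this right.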

It is of course a step forward that the API will now allow you to actually define something that will work for (all?) languages. As far as I can see we will actually take a step backward with respect to the other problem, however: the format will be even more difficult to handle for translators.

According to the API proposal,

It provides predefined plural rules for many locales. Thus, the programmer need not worry about the plural cases of a language. On the flip side, the localizer does not have to specify the plural cases; he can simply use the predefined keywords. The whole plural formatting of messages can be done using localized patterns from resource bundles.

If this is really true, then the programmer will write a resource bundle that implements the US English keywords (in most cases, anyway), and it will be up to the localizer to know the PluralRules keywords that are defined for her language, and to implement them correctly in the localized resource bundle.

This comment on the mailing list to the proposal would seem to be an understatement:

Separating the rules from Plural Format helps some here, but translators will still have to be able to write the PluralFormat syntax, which is about as complicated as the ChoiceFormat syntax.

I think my ChoiceFormat advice will extend to the new API for the time being: don’t use it.