
Really, Chrome? September 24, 2011

Posted by globalizer in Language, QA, Translation, Unicode.

With the terrible bloat in Firefox, I have recently been trying to get used to Chrome. I am having a really hard time, though. I appreciate the attempt to create an uncluttered interface, but please – within limits!

Once I finally managed to find the settings, I had to hunt around forever to find the setting for default encoding. I first looked under Basics, but no luck. Went on to Advanced Options, and found no specific setting there. The Languages and Spell checker button seemed the most likely, but no, I didn’t find it there either.

Where does it hide? Under “Customize Fonts…”, of course.

If I were using the English language UI, I might actually have thought that location far-fetched, but not totally outlandish. Since I am using the Danish language version, however, the connection is just completely impossible to make:

The Danish button says “Customize font sizes…”. And while it is true that good translations cannot be word-for-word translations, in this case my advice to the Danish translator would be to stay a little closer to the source text.

Another case where the Danish translation would benefit from a few changes:

The first sentence (“Det ser ud som om, at du er flyttet” – “It looks as if you have moved”) is not wrong, but it would sound a lot more natural without the “at”. And the second one is just plain wrong – again, a superfluous “at”. Otherwise the translation looks pretty good. And my main question is actually not about the translation at all; all of this has just been throat clearing leading up to this:

Why on earth is the out-of-the-box default encoding still set to ISO-8859-1 for all the Western European languages, and to various other legacy encodings for languages using other scripts? With Unicode (UTF-8) having surpassed 50% of the web by now?
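To see what that legacy default does in practice, here is a small Python sketch of the mojibake a reader gets when a UTF-8 page with no declared encoding is decoded as ISO-8859-1 (the Danish letters are just an illustrative sample):

```python
# What a browser defaulting to ISO-8859-1 does to an undeclared UTF-8 page:
text = "æøå"                          # Danish letters, common on Danish pages
raw = text.encode("utf-8")            # the bytes the web server actually sends
as_latin1 = raw.decode("iso-8859-1")  # the browser's legacy default kicks in
print(as_latin1)                      # mojibake: Ã¦Ã¸Ã¥
as_utf8 = raw.decode("utf-8")         # what a UTF-8 default would show
print(as_utf8)                        # æøå
```

Every multi-byte UTF-8 sequence turns into two or three Latin-1 characters, which is exactly the garbage users see until they manually switch the encoding.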


Let’s wipe TMX out July 20, 2010

Posted by globalizer in Tools, Translation.

The boilerplate TMX blurb sounds great: Vendor-neutral, open, tool-independent, flexibility, future-proof, control over TM assets.

What’s not to like? Well, in theory nothing, but in practice almost everything.

The basic problem is that unless you use vendor-specific and tool-specific values for a number of elements, the content in the TMX will be almost useless. And if you use vendor-specific and tool-specific values, then the whole point of the standard is of course lost.

A couple of examples:


Translation unit identifier – Specifies an identifier for the <tu> element. Its value is not defined by the standard (it could be unique or not, numeric or alphanumeric, etc.).


Property – The &lt;prop&gt; element is used to define the various properties of the parent element (or of the document when &lt;prop&gt; is used in the &lt;header&gt; element). These properties are not defined by the standard.

As your tool is fully responsible for handling the content of a &lt;prop&gt; element, you can use it in any way you wish. For example, the content can be a list of instructions your tool can parse, not only simple text.

For example, a property with name “domain” and value “Computer science” – in TMX markup, something like &lt;prop type="x-domain"&gt;Computer science&lt;/prop&gt;.

It is the responsibility of each tool provider to publish the types and values of the properties it uses. If the tool exports unpublished property types, their values should begin with the prefix “x-”.

That’s peachy if both the sender and the receiver use the same tool, but in that case there is of course also no use for an open standard.

So in effect you can exchange TMs via TMX, but you lose extremely important parts of that TM in the process – such as IDs or keys tied to each translation unit (enabling proper handling of homonyms by enabling “exact exact matches”).
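A small Python sketch of that loss, using a hypothetical TMX fragment (the tuid values are invented for illustration): two homonymous source strings survive only if the memory is keyed on the identifier – and the semantics of that identifier are exactly what the standard leaves undefined.

```python
import xml.etree.ElementTree as ET

# Hypothetical TMX fragment: two translation units share the English source
# "Update" (noun vs. verb) but carry different, tool-specific identifiers.
TMX = """<tmx version="1.4"><body>
  <tu tuid="noun.update">
    <tuv xml:lang="en"><seg>Update</seg></tuv>
    <tuv xml:lang="da"><seg>Opdatering</seg></tuv>
  </tu>
  <tu tuid="verb.update">
    <tuv xml:lang="en"><seg>Update</seg></tuv>
    <tuv xml:lang="da"><seg>Opdater</seg></tuv>
  </tu>
</body></tmx>"""

root = ET.fromstring(TMX)

# Keyed on source text alone, the second homonym silently overwrites the first:
by_text = {}
for tu in root.iter("tu"):
    segs = [seg.text for seg in tu.findall("tuv/seg")]
    by_text[segs[0]] = segs[1]
print(by_text)  # {'Update': 'Opdater'} - the noun translation is lost

# Keyed on (tuid, source), both survive - but tuid semantics are tool-specific:
by_id = {}
for tu in root.iter("tu"):
    segs = [seg.text for seg in tu.findall("tuv/seg")]
    by_id[(tu.get("tuid"), segs[0])] = segs[1]
print(by_id[("noun.update", "Update")])  # Opdatering
```

The second lookup is what “exact exact matches” require, and it only works if sender and receiver agree on what tuid means.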

I assume this sorry state is part of the motivation for OpenTM2, given the statement about it being a reference implementation of TMX. It would also explain this exchange from one of the OpenTM2 steering committee meetings:

Helena: I think we need to wipe TMX out and use OpenTM2 to create a reference around which to write the standard. This will serve as proof of concept to use as development.
Michael A: Arle, where are we with TMX 2.0?
Arle: I see OpenTM2 as driving the next generation of TMX.
Let’s hope that next generation is not too far off…

More excellent OpenTM2 news July 19, 2010

Posted by globalizer in Tools, Translation.

I apologize if I seem a little monomaniacal in my postings right now, but I really hope this thing takes off. It is truly pitiful to have to work with the most widely used TM tool when you have been used to Translation Manager.

The addition of the IBM markup tables for Java ResourceBundles, HTML, XHTML, XML, XLIFF and OpenDocument goes a long way towards filling the most obvious gaps.

So far I have only tried the (UTF-8) JDK2 markup table, but that one works as expected for Java properties files.

A brighter side of OpenTM2 July 5, 2010

Posted by globalizer in IBM, Tools, Translation.

Even the limited version currently available offers glimpses of the tool’s potential.

Just the other day I found that all new translations of an email template we had updated were broken – in at least 2 ways, actually, although here I’ll only deal with one of those.

The problem was with this sentence from the mail template:
Subscription to ${subscription.localizedObjectType}: ${subscription.objectTitle} (${subscription.notificationCount}#if ($subscription.notificationCount == 1) Update#else Updates#end) ${subscription.objectUrl}

And yes, I know full well that this Velocity jumble is not best practices for translatable text, so hold your horses. This is legacy code which we have to live with for a little while longer.

Anyway, this has been translated for a long time, and the update that caused us to have to re-translate was elsewhere in the file. So the translations of this particular sentence should not have changed.

Every single target language came back with a completely incorrect translation for this string, however; in fact, the singular form of the noun “Update” had in every single case been translated as the imperative form of the verb.

There seem to be 2 tool-related issues at play, segmentation and homonym handling. I’ll save homonym handling for later – even though this is probably my main beef with the widely used tool in question (which of course is Trados).

When queried about the weird translations, our vendor pointed to the tool as the culprit and provided this screen shot as proof of the segmentation causing the problem (combined with the homonym issue):

Bad segmentation


It does indeed look as if the tool treats each of the translatable words as a separate segment. That makes me wonder how it was possible to translate the sentence correctly the first time around, and why no report about incorrect segmentation for this file was created, so that it could either be corrected or put on a list of files that can’t be handled via that tool. But these are all separate issues.

It did cause me to test whether OpenTM2 would be able to handle that file, and since HTML is one of the few file types with a markup table, I was in luck.

The plain vanilla HTML markup segmentation result didn’t look too good, as expected, since all of the Velocity placeholder variables were left unprotected.

It proved very simple to add the required tags to the markup table, however, and after about 30 minutes I had “perfectly” segmented text. Perfect in the sense that all variables and code were protected (red text in the sample below), and sentences were treated as one segment (yellow background shows the active segment):

OpenTM2 segmentation

All changes that experienced software translators would be able to implement on their own.
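For comparison, the kind of protection those markup-table additions provide can be sketched in a few lines of Python; the regex and placeholder format below are illustrative assumptions, not OpenTM2’s actual mechanism:

```python
import re

# Illustrative pattern for Velocity ${references} and #if/#elseif/#else/#end:
VELOCITY = re.compile(
    r"\$\{[^}]+\}|#(?:if\s*\([^)]*\)|elseif\s*\([^)]*\)|else|end)")

def protect(text):
    """Replace Velocity markup with opaque placeholder tags a segmenter
    will treat as inline markup rather than sentence-breaking text."""
    tokens = []
    def stash(match):
        tokens.append(match.group(0))
        return "<ph id='%d'/>" % (len(tokens) - 1)
    return VELOCITY.sub(stash, text), tokens

line = ("Subscription to ${subscription.localizedObjectType}: "
        "${subscription.objectTitle} (${subscription.notificationCount}"
        "#if ($subscription.notificationCount == 1) Update#else Updates#end)")
protected, tokens = protect(line)
print(protected)
# Subscription to <ph id='0'/>: <ph id='1'/> (<ph id='2'/><ph id='3'/> Update<ph id='4'/> Updates<ph id='5'/>)
```

With all six pieces of Velocity markup stashed away, only “Update” and “Updates” remain translatable, and the whole line can be handled as one segment.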

OpenTM2 – yes, I was overly optimistic June 30, 2010

Posted by globalizer in IBM, Translation.

As I hinted here, my initial enthusiasm about the OpenTM2 project was misplaced. The version currently available is more or less useless, since it supports very few file formats.

One of the major strengths of the internal IBM Translation Manager is its support for an incredible variety of file formats – with no pre- or post-processing needed.

That last point is the key, since that is, IMHO, one of the weaknesses of other TM tools (the need to transform files, and the resulting errors).

File formats are supported via markup tables, and unfortunately only 4 file formats are supported as of now:

  • HTML
  • Plain text
  • Double quote files (translatable text contained in double quotes)
  • Single quote files (translatable text contained in single quotes)

And some of the included markup tables (Unicode encoding) do not actually work, since the required user exits are missing.

This problem would not be so bad if adding new markup tables were as easy as suggested by the Translators Reference Part 4 document (which, by the way, is an interesting title for a document containing mainly API references and code samples…):

You can create your own markup table by exporting an existing markup table in external SGML format, modifying it with any text editor, and importing it back into OpenTM2 under a different name… To become familiar with the content of markup tables you might want to export a markup table and study it before you create a new markup table.

Piece of cake, right? Just edit an SGML file and import it! Yes, if you only need to add some additional tags, etc., but basically useless if you want to add support for an entirely new file format, since the basic syntax needs to be defined in a user exit, coded in C. Confirmed in one of the discussion threads on the support forum:

The latter one is coded
in “C” and it requires special skills to develop it. So I suggest that
you wait a little until the OpenTM2 documentation is fully available
and I’m sure there is a section describing the development of markup
tables in more details. The reason why I say this is that it is
essential that OpenTM2 “understand” all important file types
(OpenOffice, MS Office, XML, HTML, RTF, etc.). The actual set of
markup tables is only a very basic one and does not support the most
important files types on the market. So stay tune for more to
come …. 😉

So the “free solution” for freelancers hoped for by many is not there yet – and may never materialize, since localization companies with the resources available for developing markup tables (if they are out there) may choose to then make them part of commercial products rather than just contribute them to the open source project (Eclipse model).

I may have spoken too soon… June 29, 2010

Posted by globalizer in IBM, Translation.

Alas, the version that is made available seems to be severely limited in a number of respects.

And the documentation has… how shall I put it… issues.

More later when I figure out how usable the open source version is.

Finally, a dynamite translation memory tool available for all! June 28, 2010

Posted by globalizer in IBM, Translation.

I know I’m biased, since I used Translation Manager (or TM/2, as it used to be called) exclusively for more years than I care to remember. I also know that people whose opinion I hold in high regard agree with me in thinking that Translation Manager was a great translation memory tool compared to other products that shall remain nameless. Especially with respect to the handling of homonyms (correctly handling more than one translation of the same source term).

So it’s great news that IBM has decided to make OpenTM2 available as an open source project. I know that I will add a question about this to my vendor questionnaire right away.

Those compound English nouns December 17, 2008

Posted by globalizer in Translation.

One of the more amusing examples of this problem: somebody is looking for a Handheld Software Engineer

That’s gotta be one giant hand – or a really tiny engineer…

The myth exploded – again November 25, 2008

Posted by globalizer in global access, Language, Translation.

Thank you, thank you, thank you!

Renato Beninatto from Common Sense Advisory posts about the consensus expressed by Localization World attendees from ten Latin American countries:

Spanish is one language. While there are a few terminology areas with words that vary a lot from country to country — like culinary and apparel terms — native speakers agree that by now everybody is used to the Microsoft Spanish standard for software user interfaces. In the vast majority of cases, two or more versions of Spanish are not required, except for marketing purposes.

This is one of the most persistent myths that lives on in the halls of software companies – that you absolutely have to produce at least two Spanish language versions of your software, one for Spain, and one for Latin America (and in the extreme version, this morphs into a need for separate versions for each Spanish-speaking country in Latin America).

I don’t know how many times I have had to swat this away.

Update: I should add that there are real pitfalls you have to watch out for, if you produce only one Spanish language version worldwide. Certain Spanish terms have such connotations in various Latin American countries that you do not want them to appear in your software, so you have to create “stop lists” with those terms. This does add a small wrinkle to the work of Spanish translators.
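A stop-list check is simple enough to sketch; the helper names below are hypothetical, not from any real project, though “coger” is a well-known instance of the problem:

```python
# Hypothetical per-locale stop lists; "coger" is innocuous in Spain but
# vulgar in much of Latin America, so it should not ship in a worldwide UI.
STOP_LISTS = {
    "es": {"coger"},
}

def flag_stop_terms(locale, translations):
    """Return (string, term) pairs where a stop-listed term appears.

    Naive word-splitting; a real check would strip punctuation too.
    """
    stop = STOP_LISTS.get(locale, set())
    hits = []
    for text in translations:
        for term in stop:
            if term in text.lower().split():
                hits.append((text, term))
    return hits

print(flag_stop_terms("es", ["Coger el autobús", "Tomar el autobús"]))
# [('Coger el autobús', 'coger')]
```

Run against each translation delivery, a check like this catches the flagged terms before they reach any user.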

Touching faith in machine translation July 2, 2008

Posted by globalizer in Crazy stories, Translation.

Words fail me.
Both at the fact that the machine translation error message itself seems to be machine translated, and at the fact that there is apparently zero involvement from a sentient being before a giant message like that is posted.