
Google’s new Munich pact August 10, 2010

Posted by globalizer in global access.

Nothing to add, this says it all.

Update: this says some more.

Let’s wipe TMX out July 20, 2010

Posted by globalizer in Tools, Translation.

The boilerplate TMX blurb sounds great: Vendor-neutral, open, tool-independent, flexibility, future-proof, control over TM assets.

What’s not to like? Well, in theory nothing, but in practice almost everything.

The basic problem is that unless you use vendor-specific and tool-specific values for a number of elements, the content in the TMX will be almost useless. And if you use vendor-specific and tool-specific values, then the whole point of the standard is of course lost.

A couple of examples:

tuid

Translation unit identifier – Specifies an identifier for the <tu> element. Its value is not defined by the standard (it could be unique or not, numeric or alphanumeric, etc.).

and

Property – The <prop> element is used to define the various properties of the parent element (or of the document when <prop> is used in the <header> element). These properties are not defined by the standard.

As your tool is fully responsible for handling the content of a <prop> element you can use it in any way you wish. For example the content can be a list of instructions your tool can parse, not only a simple text.

<prop type="user-defined">name:domain value:Computer science</prop>
<prop type="x-domain">Computer science</prop>

It is the responsibility of each tool provider to publish the types and values of the properties it uses. If the tool exports unpublished properties types, their values should begin with the prefix “x-“.

That’s peachy if both the sender and the receiver use the same tool, but in that case there is of course also no use for an open standard.

So in effect you can exchange TMs via TMX, but you lose extremely important parts of that TM in the process – such as IDs or keys tied to each translation unit (enabling proper handling of homonyms by enabling “exact exact matches”).
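To make the homonym point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample TMX, the `tuid` values, the `x-context` property and the German translations are invented, not taken from any real tool's export. It only shows why a lookup keyed on source text alone cannot pick the right translation, while a lookup that also uses a (tool-specific) identifier can.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: two units share the source text "Update"
# but need different translations (noun vs. verb).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu tuid="mail.update.noun">
      <prop type="x-context">noun</prop>
      <tuv xml:lang="en"><seg>Update</seg></tuv>
      <tuv xml:lang="de"><seg>Aktualisierung</seg></tuv>
    </tu>
    <tu tuid="button.update.verb">
      <prop type="x-context">verb</prop>
      <tuv xml:lang="en"><seg>Update</seg></tuv>
      <tuv xml:lang="de"><seg>Aktualisieren</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_units(tmx_text, src="en", tgt="de"):
    """Return one dict per <tu> with its tuid, props, source and target text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        props = {p.get("type"): p.text for p in tu.findall("prop")}
        units.append({
            "tuid": tu.get("tuid"),
            "props": props,
            "source": segs.get(src),
            "target": segs.get(tgt),
        })
    return units


units = load_units(SAMPLE_TMX)

# Keyed on source text only: "Update" maps to two competing translations.
by_source = {}
for u in units:
    by_source.setdefault(u["source"], set()).add(u["target"])

# Keyed on (tuid, source): each unit resolves to exactly one translation,
# but only if the receiving tool understands the sender's tuid values.
by_key = {(u["tuid"], u["source"]): u["target"] for u in units}

print(sorted(by_source["Update"]))
print(by_key[("mail.update.noun", "Update")])
```

The catch is the one described above: since the standard defines neither the `tuid` values nor the property types, the second lookup only works when both ends agree on them outside the standard.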

I assume this sorry state is part of the motivation for OpenTM2, given the statement about it being a reference implementation of TMX. It would also explain this exchange from one of the OpenTM2 steering committee meetings:

Helena: I think we need to wipe TMX out and use OpenTM2 to create a reference around which to write the standard. This will serve as proof of concept to use as development.
Michael A: Arle, where are we with TMX 2.0?
Arle: I see OpenTM2 as driving the next generation of TMX.

Let’s hope that next generation is not too far off…

More excellent opentm2 news July 19, 2010

Posted by globalizer in Tools, Translation.

I apologize if I seem a little monomaniacal in my postings right now, but I really hope this thing takes off. It is truly pitiful to have to work with the most widely used tm tool when you have been used to Translation Manager.

The addition of the IBM markup tables for Java ResourceBundles, HTML, XHTML, XML, XLIFF and OpenDocument goes a long way towards filling the most obvious gaps.

I only tried the (UTF-8) JDK2 markup table, but it works as expected for Java properties files, which is to say well.

A brighter side of opentm2 July 5, 2010

Posted by globalizer in IBM, Tools, Translation.

Even the limited version currently available offers glimpses of the tool’s potential.

Just the other day I found that all new translations of an email template we had updated were broken – in at least 2 ways, actually, although here I’ll only deal with one of those.

The problem was with this sentence from the mail template:
Subscription to ${subscription.localizedObjectType}: ${subscription.objectTitle} (${subscription.notificationCount}#if ($subscription.notificationCount == 1) Update#else Updates#end) ${subscription.objectUrl}

And yes, I know full well that this Velocity jumble is not best practice for translatable text, so hold your horses. This is legacy code which we have to live with for a little while longer.

Anyway, this has been translated for a long time, and the update that caused us to have to re-translate was elsewhere in the file. So the translations of this particular sentence should not have changed.

Every single target language came back with a completely incorrect translation for this string, however; in fact the singular version of the noun “Update” had in every single case been translated as the imperative form of the verb.

There seem to be 2 tool-related issues at play, segmentation and homonym handling. I’ll save homonym handling for later – even though this is probably my main beef with the widely used tool in question (which of course is Trados).

When queried about the weird translations, our vendor pointed to the tool as the culprit and provided this screen shot as proof of the segmentation causing the problem (combined with the homonym issue):

Bad segmentation

It does indeed look as if the tool treats each of the translatable words as a separate segment – which makes me wonder how it was possible to translate the sentence correctly the first time around and why a report about incorrect segmentation for this file wasn’t created, so that it could either be corrected or put on a list of files that can’t be handled via that tool. But these are all separate issues.

It did cause me to test whether OpenTM2 would be able to handle that file, and since HTML is one of the few file types with a markup table, I was in luck.

The plain vanilla HTML markup segmentation result didn’t look too good, as expected, since all of the Velocity placeholder variables were left unprotected.

It proved very simple to add the required tags to the markup table, however, and after about 30 minutes I had “perfectly” segmented text. Perfect in the sense that all variables and code were protected (red text in the sample below), and sentences were treated as one segment (yellow background shows the active segment):

OpenTM2 segmentation

These are all changes that experienced software translators would be able to implement on their own.
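For readers who have not seen what "protecting" variables and code amounts to, the idea can be sketched in a few lines of Python. This is not how OpenTM2 markup tables work internally, and the regular expression is my own assumption that covers only the Velocity constructs appearing in the sentence above (`${...}` references, bare `$name` references, and the `#if`, `#elseif`, `#else` and `#end` directives). It keeps the sentence as one segment and marks each piece as either protected code or translatable text.

```python
import re

# Assumed pattern: handles only the Velocity constructs used in the sample.
PROTECTED = re.compile(
    r"""
      \$\{[^}]*\}                    # formal reference in braces
    | \#(?:if|elseif)\s*\([^)]*\)    # conditional directive with its test
    | \#(?:else|end)\b               # directives without arguments
    | \$[A-Za-z_][\w.]*              # bare reference such as $foo.bar
    """,
    re.VERBOSE,
)

SENTENCE = (
    "Subscription to ${subscription.localizedObjectType}: "
    "${subscription.objectTitle} (${subscription.notificationCount}"
    "#if ($subscription.notificationCount == 1) Update#else Updates#end) "
    "${subscription.objectUrl}"
)


def tokenize(segment):
    """Split one segment into ("code", ...) and ("text", ...) tokens, in order."""
    tokens = []
    pos = 0
    for match in PROTECTED.finditer(segment):
        if match.start() > pos:
            tokens.append(("text", segment[pos:match.start()]))
        tokens.append(("code", match.group()))
        pos = match.end()
    if pos < len(segment):
        tokens.append(("text", segment[pos:]))
    return tokens


tokens = tokenize(SENTENCE)
for kind, value in tokens:
    print(f"{kind:4} | {value!r}")
```

The translator still sees "Update" and "Updates" inside the full sentence, so the noun reading is obvious, and the code tokens cannot be edited by accident.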

OpenTM2 – yes, I was overly optimistic June 30, 2010

Posted by globalizer in IBM, Translation.
18 comments

As I hinted here, my initial enthusiasm about the OpenTM2 project was misplaced. The version currently available is more or less useless, since it supports very few file formats.

One of the major strengths of the internal IBM Translation Manager is its support for an incredible variety of file formats – with no pre- or post-processing needed.

That last point is the key, since that is, IMHO, one of the weaknesses of other TM tools (the need to transform files, and the resulting errors).

File formats are supported via markup tables, and unfortunately only 4 file formats are supported as of now:

  • HTML
  • Plain text
  • Double quote files (translatable text contained in double quotes)
  • Single quote files (translatable text contained in single quotes)

And some of the included markup tables (Unicode encoding) do not actually work, since the required user exits are missing.
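To give an idea of how basic the "double quote" format in the list above is, here is a rough Python sketch of the concept: everything between double quotes is translatable, everything else is left alone. The sample content and the handling of backslash-escaped quotes are my assumptions; the actual OpenTM2 markup table may well behave differently.

```python
import re

# A double-quoted string, allowing backslash-escaped characters inside it.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_translatable(text):
    """Return the contents of every double-quoted string, in file order."""
    return [match.group(1) for match in QUOTED.finditer(text)]


# Hypothetical file content, invented for this example.
SAMPLE = (
    'TITLE = "Subscription updates"\n'
    'COUNT = 3\n'
    'HINT = "Click \\"Update\\" to refresh"\n'
)

strings = extract_translatable(SAMPLE)
for s in strings:
    print(s)
```

Anything beyond this kind of flat syntax is where the user exits mentioned above come in.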

This problem would not be so bad if adding new markup tables were as easy as suggested by the Translators Reference Part 4 document (which, by the way, is an interesting title for a document containing mainly API references and code samples…):

You can create your own markup table by exporting an existing markup table in
external SGML format, modifying it with any text editor, and importing it back
into OpenTM2 under a different name… To become familiar with the content of markup tables you might want to export a
markup table and study it before you create a new markup table.

Piece of cake, right? Just edit an SGML file and import it! Yes, if you only need to add some additional tags, etc., but basically useless if you want to add support for an entirely new file format, since the basic syntax needs to be defined in a user exit, coded in C. Confirmed in one of the discussion threads on the support forum:

The latter one is coded
in “C” and it requires special skills to develop it. So I suggest that
you wait a little until the OpenTM2 documentation is fully available
and I’m sure there is a section describing the development of markup
tables in more details. The reason why I say this is that it is
essential that OpenTM2 “understand” all important file types
(OpenOffice, MS Office, XML, HTML, RTF, etc.). The actual set of
markup tables is only a very basic one and does not support the most
important files types on the market. So stay tune for more to
come …. 😉

So the “free solution” for freelancers hoped for by many is not there yet – and may never materialize, since localization companies with the resources available for developing markup tables (if they are out there) may choose to then make them part of commercial products rather than just contribute them to the open source project (Eclipse model).

Shooting myself in the foot? June 30, 2010

Posted by globalizer in Unicode.

On those web sites that won’t let you register, and therefore read their content, unless you provide them with everything from your mother’s maiden name to your social security number, I usually sign up as a male born in 1901. Just to give their marketing department a little to chew on when they try to tailor their targeted ads.

I am starting to think that may not be such a hot idea though, since my spam folder now seems to consist of approximately 90% Viagra promotions, with a few feeble Nigerian-style scams thrown in.

So I wonder if the spammers really are that specific in their targeting – would I be getting spam about dieting if I had signed up as female? – or is this proportion really the norm these days?

I may have spoken too soon… June 29, 2010

Posted by globalizer in IBM, Translation.
1 comment so far

Alas, the version that is made available seems to be severely limited in a number of respects.

And the documentation has… how shall I put it… issues.

More later when I figure out how usable the open source version is.

Finally, a dynamite translation memory tool available for all! June 28, 2010

Posted by globalizer in IBM, Translation.

I know I’m biased, since I used Translation Manager (or TM/2, as it used to be called) exclusively for more years than I care to remember. I also know that people whose opinion I hold in high regard agree with me in thinking that Translation Manager was a great translation memory tool compared to other products that shall remain nameless. Especially with respect to the handling of homonyms (correctly handling more than 1 translation of the same source term).

So it’s great news that IBM has decided to make OpenTM2 available as an open source project. I know that I will add a question about this to my vendor questionnaire right away.

Nothing to do with software globalization June 22, 2010

Posted by globalizer in Globalization.

I really have no excuse to link to this, there’s just no way I can make this in any way related to software globalization. Wait, Krugman often writes about globalization, right?

Anyway, I learned long ago to live by Brad DeLong’s rules, so all I can say is: read Paul Krugman.

Wheee! June 8, 2010

Posted by globalizer in Unicode.

It didn’t take long for the speed down the slide to pick up.

Remember the emoji encoding discussion back in 2008? Well, Unicode 6.0 containing the new “characters” only just came out in Beta, but as predicted, they are now being used as the justification for encoding – well, just about anything…

For instance, a proposal to encode “a portable interpretable object code into Unicode”:

> Creating new writing systems, directly embedding language,
> directly embedding mathematics or machine language–all of
> these are entirely outside of Unicode’s purview and WG2’s
> remit.  They simply will not be adopted.

Well, the emoji is a new writing system and that is being encoded. The encoding of the emoji has made me realize that the encoding of the portable interpretable object code is not an impossibility.

> Your enthusiasm may be commendable, but you’re spending
> your energy developing something which is not appropriate
> for inclusion within Unicode.

Thank you for your first remark, yet whether the portable interpretable object code is or is not appropriate for inclusion within Unicode is a matter that is not decided at this time.

There was a time when emoticons were not regarded as appropriate for inclusion in Unicode, yet they are now being encoded. That is an important precedent that what is appropriate depends upon the circumstances at the time, not on what was the policy previously.

Admittedly, the current proposal seems to be a solution in search of a problem. The author indicates that it

is intended to be a system to use to program software packages to solve problems of software globalization, particularly in relation to systems that use software to process text

but even though I work with software globalization on a daily basis, for the life of me I cannot think of something related to software globalization that:

  1. I want to do
  2. I cannot do with existing technology and standards
  3. This proposal will allow me to do

This specific proposal of course has a snowball’s chance in hell of being encoded, but the emoji argument will be a lot more difficult to counter once we get to something that is at least conceptually related to text. So hold on to your hats as the slide gets steeper and more slippery!

Oh, and by the way: for sheer entertainment value, the last couple of weeks’ worth of Unicode mail archives is priceless.