
Vindication for the President’s internets? April 6, 2007

Posted by globalizer in Language, terminology.

And here we’ve all been making fun of Mr. Bush for his “internets” remark:
It seems he’s in good company 🙂

These protocols were obsoleted by:

  • RFC 1155 — Structure and identification of management information for TCP/IP-based internets
  • RFC 1156 — Management information base for network management of TCP/IP-based internets

[emphasis added]

I know, I know – there’s a difference between “the Internets” and “internets” as used in the RFC context – but still…


Freudian slip in the Washington Post? March 31, 2007

Posted by globalizer in Language, terminology.

I wonder how long this little gem will be allowed to stay up in the online version of the Washington Post:

The case against Kerik that federal prosecutors are preparing could generate uncomfortable political attention for Giuliani because it focuses on Kerik’s activities while the two men were in government together and were jointly running Giuliani-Kerik, which was paid millions of dollars for advising upstart companies, doing federal work and consulting with clients overseas (emphasis added).

Here are the results of a Google search for definitions of upstart; here, on the other hand, are the definitions of startup.

So you think that charset= value is going to help you? March 8, 2007

Posted by globalizer in encoding, Language, Localization, web applications.

I just ran across a real-life example of the pitfalls involved in relying on the charset value in HTML pages to tell you what the encoding is.

It is not unreasonable, of course, to assume that the encoding of the text actually corresponds to the value specified in that tag, it is just not very realistic.

Both Mark Davis from Google and Addison Phillips from Yahoo highlighted the fact that so many pages are either untagged or mistagged in their presentations at the most recent Unicode conference.

A recent question on Sun’s Java i18n forum about an “incorrect” conversion from gb2312 into Unicode made me suspicious, and wouldn’t you know it, the culprit was an incorrect charset value. This Chinese page is tagged as being encoded in gb2312, but it seems to really be in GBK. What makes this slightly tricky is that in a text using everyday Chinese the differences between those two encodings would be minimal: GBK is a superset of gb2312, and very few of the characters in such a text would be in GBK but not in gb2312. So a scan to detect the actual encoding might not even have made any difference.
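To make the gb2312/GBK point concrete, here is a minimal Java sketch (the class and method names are hypothetical) that checks whether a byte sequence is valid in a given charset, using a `CharsetDecoder` set to report errors instead of silently substituting replacement characters. Common characters such as 中文 have identical bytes in both encodings, while a character like the traditional form 體 (U+9AD4) exists in GBK but not in gb2312. The sketch assumes the JDK includes the GB2312 and GBK charsets, which is typical for full JDK installs but not required by the Java SE specification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class CharsetCheck {
    // Returns true if the bytes decode in the named charset without any
    // malformed or unmappable sequences (no replacement characters).
    static boolean decodesCleanly(byte[] bytes, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] everyday = "\u4e2d\u6587".getBytes(gbk); // 中文: same bytes in gb2312 and GBK
        byte[] rare = "\u9ad4".getBytes(gbk);           // 體: in GBK only

        System.out.println(decodesCleanly(everyday, "GB2312")); // true
        System.out.println(decodesCleanly(rare, "GB2312"));     // false
        System.out.println(decodesCleanly(rare, "GBK"));        // true
    }
}
```

A page mislabeled as gb2312 therefore only breaks when it happens to contain one of the GBK-only characters, which is exactly why the mistake can go unnoticed for a long time.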

Dutch, Danish, same difference February 16, 2007

Posted by globalizer in Danish, Denmark, Language.

Later update, October 11, 2009:

I just found one new, potential factor contributing to the confusion. I see that somebody found this blog post by searching for “are windmills dutch or danish”. And since the Dutch are famous for their historical windmills, while a Danish company is now the world’s number one maker of wind turbines, and Denmark today generates about 20% of its electricity with wind turbines, I can see how that could be a cause for some confusion, given that a lot of people are confused to begin with.

Update, October 19, 2007:

As a public service announcement to the many people who seem to find this post after doing searches like “Is Dutch and Danish the same language” and “difference between Dutch and Danish”:

No, those are two completely separate and different languages.

Dutch is spoken in the Netherlands (or Holland), while Danish is spoken in Denmark. They are both Germanic languages, and they are both spoken in small, flat countries located in the northwestern part of Europe – but they are still different. And while speakers of the Scandinavian languages (Danish, Norwegian and Swedish) are usually able to understand each other (with good will exhibited by both speaker and listener), Danish and Dutch are not closely enough related for that to be the case for those two languages. Single words may resemble each other, etc., but that’s about it.

Original post:

Just yesterday I was reminded once again how difficult people from the US find it to distinguish between Dutch and Danish. As I joined a conference call, a Danish colleague and I were initially the only ones on, so we conversed happily in Danish until a US colleague joined, at which point I said something like “sorry for babbling in Danish”. To which he responded, as expected: “Oh, that’s the language spoken in the Netherlands, right?”

I can’t count the number of times something like this has happened to me – or the other way around, of course, having people think Dutch is spoken in Denmark. A few years ago when I was on an extended translation verification test along with testers from about 15 other countries, our daily meetings with the development team became the source of a running bet between the Dutch tester and me about how many times his and my defects would be confused.

I have never quite understood the reason for this almost total inability to distinguish between the two language names; after all, the two words are not that similar, and the country names are totally different.

But there definitely seems to be some kind of blind spot with respect to the “D-word” in this context.

I would put those blankets in a safe January 28, 2007

Posted by globalizer in Danish, Language, Translation.

Warning: nothing technical about software globalization or localization in this post.

From a short human interest story about a guide horse in today’s Politiken.dk:

Når Panda har fri fra arbejde kan hun godt lide at putte, nappe i Edies guldtæpper eller lege med legetøj.

Approximate translation (although the use of the word “putte” in this connection is rather unclear – there is no context, so it could mean any number of things, including that the horse likes to play golf, and particularly to putt…):

When Panda is not working she likes to snuggle, nibble on Edie’s gold blankets or play with her toys.

I think it’s very tolerant of Edie to let her guide horse chew on her gold blankets – I would probably store such items in a safe and instead put some carpets on the floor 🙂

Hint for people who don’t speak Danish: “carpets” in Danish is “gulvtæpper”, and it is pronounced without the ‘v’ (it’s one of those very common silent letters in Danish), so a not-too-competent speller could easily be misled into using a ‘d’ instead, since ‘d’ in this position in a word is also often silent. And the ‘d’ in ‘guld’ (“gold” in English) is indeed silent – but the pronunciation differs from the pronunciation of ‘gulv’ (“floor” in English).


OK, in addition to being shaky when it comes to Danish spelling, Nina Cederberg (whose byline is on the story quoted above) is also not too familiar with the English language. I have now found the same AP story on CNN, and here’s the corresponding sentence in the original English:

At home, where she’s not working, Panda snuggles, naps on a carpet or plays with toys.

To translate naps into nappe in Danish (which, as my suggested translation shows, has ‘nibble’ as one of its many possible meanings in English, but which definitely cannot mean ‘to nap’) is a rather egregious example of unintended undersættelse (a play on oversættelse, the Danish word for translation).

So this post did actually end up being about translation after all…

Not merely prescriptivism for prescriptivism’s sake January 19, 2007

Posted by globalizer in Danish, Language, Localization, Translation.

I should probably clarify that when I complained about the non-standard construction of 2- and 3-word noun phrases in the Danish Netvibes translation, my main criticism was based on the potential for misunderstanding, not on the formal violation of “rules”.

Don’t get me wrong. I do believe that it makes good economic sense to establish and follow linguistic guidelines that give your applications and corporate web interface a polished, professional look, since inconsistent and non-standard language usage may impact your users’ opinion of your product quality in general.

However, there’s a big difference between grammar that is simply non-standard and grammar or orthography that introduces ambiguity or downright misunderstanding. In my daily work, for example, it is not unusual to hear people consistently conjugate irregular verbs in English like this:

We have went over this a number of times…

This always makes me wince (to say nothing of the by now almost ubiquitous “Send your responses back to John and I” and similar constructions), but I have no problem understanding what is meant.

The same cannot be said about the non-standard way of constructing noun phrases in Danish (separating each word with spaces instead of concatenating them into one word), however. In many cases this new practice (heavily influenced by English) totally changes the meaning of the phrase.

So at least in those specific cases there are good reasons beyond taste for nitpicking.

More about what’s wrong with ChoiceFormat January 16, 2007

Posted by globalizer in Java, Language, Localization, Translation.

Here I complained about ChoiceFormat and its limitations when it comes to localization. I also said I would try to suggest some alternatives, but I’m not quite there yet. This is just a small update to mention why many developers are not aware of the limitations and indeed think that they are doing everything right from an internationalization perspective:

Most tutorials and internationalization guidelines use examples from just English (as the source language) and maybe French or German as target languages, demonstrating how it is possible to create something that will work for closely related languages. They make no mention of the problems localizers may have with parsing the format, and they completely ignore potential problems with other types of languages. The Sun Java tutorial on internationalization is one example, this tech tip is another one.
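To illustrate the problem, here is a minimal Java sketch (class, method and field names are hypothetical) that puts the familiar English 0/1/many pattern, of the kind shown in the `MessageFormat` documentation, next to a hypothetical Polish `ChoiceFormat`. `ChoiceFormat` selects a form purely by numeric interval (the last limit less than or equal to the number), so it cannot express a rule like the Polish one, where 2–4 and 22–24 take “pliki” while 5–21 and 25 take “plików”.

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;
import java.util.Locale;

public class ChoiceFormatLimits {
    // The classic English pattern: zero, exactly one, and "more than one".
    static final String EN_PATTERN =
        "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";

    // Hypothetical Polish attempt with limits at 0, 1, 2 and 5
    // ("plik" = file; "pliki" and "plik\u00f3w" are the two plural forms).
    static final ChoiceFormat PL_FILES =
        new ChoiceFormat("0#plik\u00f3w|1#plik|2#pliki|5#plik\u00f3w");

    static String englishMessage(int count) {
        MessageFormat mf = new MessageFormat(EN_PATTERN, Locale.ENGLISH);
        return mf.format(new Object[] {count});
    }

    static String polishFiles(int count) {
        // Intervals only: every count >= 5 gets the same form, including 22.
        return PL_FILES.format(count);
    }

    public static void main(String[] args) {
        System.out.println(englishMessage(1)); // There is one file.
        System.out.println(englishMessage(3)); // There are 3 files.
        System.out.println(polishFiles(3));    // pliki (correct)
        System.out.println(polishFiles(12));   // plików (correct)
        System.out.println(polishFiles(22));   // plików (wrong: Polish uses "pliki")
    }
}
```

The English output looks perfect, which is precisely why a developer testing only English, French or German has no reason to suspect anything is wrong.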

At least one case where Danish is easier to disambiguate than English January 13, 2007

Posted by globalizer in Danish, Language, Localization, Translation.

In connection with my review of the Danish version of Netvibes I mentioned how noun phrases consisting of more than one noun are constructed by concatenating the nouns into one word – and how violating that rule can result in completely altering the meaning of the phrase.

Today Language Log conveniently demonstrates how for phrases like “basil leaves” or “fruit flies” there is no way to indicate which meaning you intend in English (are “leaves” and “flies” verbs or nouns?); while in Danish every single one of the two-word noun phrases would simply be written in one word (assuming you use the standard rules), leaving no room for doubt.

Crowd-translations (or community translations) revisited January 7, 2007

Posted by globalizer in Danish, Language, Localization, QA, Translation.

Back here I posted about the advantages and drawbacks of using volunteer translators for your projects. I used Netvibes to illustrate some of the potential pitfalls associated with this approach, and I should note that I did send feedback from the Netvibes web site to make them aware of my posting. I have not heard back, and the Danish translation has not changed – I will be keeping an eye on it to see if they react at all.

I also wanted to add a consideration I neglected to mention in my original post:

Since most Danes know English well enough to be able to use a site like Netvibes in that language (even more true for the segment of the population that is likely to use such a site than for the population in general), I suspect that they will be less tolerant of poor translations than people from countries with lower levels of competency in English. If it’s a choice between having access to a service with some linguistic warts and not being able to use the service, then you are probably not going to complain too much about the warts. If, on the other hand, you have a choice between using an English version and a Danish version with obvious cosmetic problems, I believe a fair number of Danish users will gravitate towards the English version.

That would not be a big deal if it didn’t also result in a more negative perception of the web site and the company behind it. I would love to see some empirical investigation of this question, along the lines of this one from Common Sense Advisory. That survey showed a preference for native language in a number of countries, but to answer my question the following parameters would have to be added:

  • survey a country like Denmark or the Netherlands, where a larger proportion of the population understands English at the level required to be comfortable using web sites in that language
  • ask about translation quality

The pros and cons of community translations January 1, 2007

Posted by globalizer in Danish, Language, Localization, QA, Translation.

I just found Netvibes over the Christmas holidays (I know, I know – hopelessly behind the cutting edge…). That prompted me to dig out this draft on community translation that I have had sitting around for a while. And I want to say up front that while I use Netvibes as an example of not quite successful localization into Danish, the site and its services seem very useful at first glance.

Alongside the increasing popularity of open source projects has come an increase in community translation projects – software translation performed not by paid translation professionals, but by volunteers. And this process has been used not just by open source software projects, but also by “regular” closed source projects and by well-known companies such as Google.

This raises an interesting question: why don’t all software companies take advantage of such “free translations”?

There are probably a number of answers, (more…)