
Really, Chrome? September 24, 2011

Posted by globalizer in Language, QA, Translation, Unicode.

With the terrible bloat in Firefox, I have recently been trying to get used to Chrome. I am having a really hard time, though. I appreciate the attempt to create an uncluttered interface, but please – within limits!

Once I finally managed to find the settings, I had to hunt around forever to find the setting for default encoding. I first looked under Basics, but no luck. Went on to Advanced Options, and found no specific setting there. The Languages and Spell checker button seemed the most likely, but no, I didn’t find it there either.

Where does it hide? Under “Customize Fonts…”, of course.

If I were using the English language UI, I might actually have thought that location far-fetched, but not totally outlandish. Since I am using the Danish language version, however, the connection is just completely impossible to make:

The Danish button says “Customize font sizes…”.  And while it is true that good translations cannot be word-for-word translations, in this case my advice to the Danish translator would be to stay a little closer to the source text.

Another case where the Danish translation would benefit from a few changes:

The first sentence (“Det ser ud som om, at du er flyttet” – roughly, “It looks as if you have moved”) is not wrong, but it would sound a lot more natural without the “at”. And the second one is just plain wrong – again, a superfluous “at”. Otherwise the translation looks pretty good. And my main question is actually not about the translation at all; all of this has just been throat clearing leading up to this:

Why on earth is the out-of-the-box default encoding still set to ISO-8859-1 for all the Western European languages, and to various other legacy encodings for languages using other scripts? With Unicode (UTF-8) having surpassed 50% of the web by now?
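Just to make the stakes concrete, here is a quick illustration (my own sketch, runnable in a modern browser or Node, nothing Chrome-specific): the UTF-8 bytes for the Danish letters “æøå”, decoded with an ISO-8859-1 default, come out as the classic mojibake.

var utf8Bytes = new TextEncoder().encode('æøå');           // UTF-8 bytes: C3 A6 C3 B8 C3 A5
var misread = new TextDecoder('iso-8859-1').decode(utf8Bytes);
console.log(misread);                                      // "Ã¦Ã¸Ã¥"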

A little attention to detail, guys? August 12, 2011

Posted by globalizer in China, Crazy stories, Silly stuff.

Come on, interpolspecialagent, couldn’t you at least spoof your email address? Do you seriously expect to fool anybody with this contact information:

INTERPOL SPECIAL INVESTIGATION AGENT.
MR. SCOTT L. EVERSON & ASSOCIATE
ADDRESS: Marsham Street 2. SW1P 4DF. London. United Kingdom
Email: interpolspecialagent@yahoo.cn

When you went to the effort of coming up with a reasonable-sounding UK address, why spoil it with a “yahoo.cn” email address?

I’ll admit that pretending to be investigating a Nigerian scam, and warning the recipient against communicating with anybody else to avoid being scammed, is somewhat inventive:

Last year, a meeting was held with the General Director of the Interpol and some other top officials in the United Kingdom concerning the online internet scam from Nigeria and so many other countries, in the conclusion of the meeting, I was ordered by the Interpol to fly down to Malaysia for special investigation concerning the delay of your payment, because in our record file, Malaysia was the country where your funds payment was channel to for a very long time.

You need to understand that my coming down to Malaysia is because of your funds transfer, and I have to accomplish the transfer before returning back to the United Kingdom, all the legal documentation for your funds are with me here in Malaysia, what i just need from you now is your corporation, you have to seize communication with any other person different from me to avoid been mislead.

Don’t allow anyone to deceive you, your funds is $10,000,000.00 (Ten Million United States Dollars Only) it was written inside the recording files of your funds.

Upon the receipt of this information, I will email you or call you and give you code on how to communicate with me and I will always keep you updated concerning the progress of the funds transfer, once again you are advised to seize all communication with any other office or person to avoid been mislead, and whenever you receive any message from anybody talking about your funds, kindly forward it to me so that i can make a proper investigation on it.

But then you go and spoil it again by signing off with this:

Yours Faithfully
Interpol Scott Everson
Special Investigation Agent, United Kingdom

“Interpol Scott Everson” – what a disappointing finish to an otherwise fairly entertaining effort.

Et tu, New York Times July 1, 2011

Posted by globalizer in Unicode.

Hmm, the Grey Lady is slipping. These are the current front page blurbs for articles inside:

Click to enhance, and you will see that the opinion piece, 4th from the left, is about a Dutch company hiring autistic workers. Unfortunately, this is the article in question, showing that not even the NYT is immune to the Dutch/Danish confusion.

Do Canadians have bigger pockets? April 14, 2011

Posted by globalizer in Unicode.

Or does David Pogue just wear suits with particularly narrow pockets? He seems to think that the PlayBook is about half an inch too wide to fit into the breast pocket of a jacket – and that

Whoever muffed that design spec should be barred from the launch party.

It’s interesting that he focused on that particular point, since a colleague of mine told me that this was exactly what sold a Canadian customer on the PlayBook beta he was comparing to Xooms and iPads a couple of days ago: it just fit into his pocket.

I can’t say I am familiar with the finer points of men’s wear, so I am left to wonder: is there a worldwide standard for the size of suit pockets? Or are there regional differences?

Calling Tex Texin, we need some research into the matter of pocket sizes!

“Security” run amuck – again February 21, 2011

Posted by globalizer in Unicode.

The “security questions” that US web sites seem to think are a great security feature have now officially jumped the shark. A run-of-the-mill online shopping site, vitacost.com, not only requires you to create an account before you can place an order (why?), they also require you to pick two “security questions” as part of the account setup.

They obviously haven’t quite understood what the purpose of the security questions is, however – they actually mask the input in the entry fields for the answers, and require you to “confirm” the answer, thus treating them exactly like password fields. So I now have 3 passwords for a web site I will obviously never use again, since it is way too much hassle. And if I were planning on using it again, I would of course have to write all this wonderful information down somewhere, since there’s no way I would remember it 3 days from now. This is obviously very secure…

And just to explain: I have to write the answers to the “security questions” down because the questions are always about something I have no real answers for: my mother’s middle name, the name of my high school mascot, etc.

Silverowlcreations customer service ftw December 11, 2010

Posted by globalizer in Unicode.

I am completely floored by the customer service I just received from Silver Owl Creations. Not only did I receive my order lightning fast, but I also received a refund on the shipping (which was extremely reasonable to begin with), with this explanation:

I’ve given you a slight refund on the shipping charge to make it match the actual cost of packaging and shipping.

That’s almost enough to bring tears to your eyes.

On top of that, the pieces are beautiful; I have a new appreciation for steampunk (literally, since I didn’t know that the concept existed until 2 days ago). Highly recommended.

Full Unicode repertoire in programming languages? October 29, 2010

Posted by globalizer in Programming languages, Programming practices, Unicode.

Would we be better off if we used a programming language that allowed “the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as ‘Dentistry symbol light down and horizontal with wave’ (0x23c7)”?

I am usually as gung-ho about Unicode as you can get, but I have to admit I’m a little wary about this. Mind you, it would presumably spur the adoption of UTF-8 as the default encoding in development environments on all platforms, something that’s long overdue. How can MacRoman still be the default encoding for text files in Eclipse on Macs??
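For what it’s worth, JavaScript is already part of the way there: Unicode letters are legal in identifiers today, it is only the non-letter symbols that are out. A quick sketch, nothing more:

var π = Math.PI;        // Greek letters are Unicode letters, so this is valid JavaScript
var Δt = 0.016;
console.log(2 * π * Δt);
// U+23C7 is a symbol, not a letter, so `var ⏇ = 1;` is still a syntax error.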

Via Computerworld.

More i18n pot holes to be filled by Google October 26, 2010

Posted by globalizer in Internationalization, JavaScript, Locales, Programming languages.

Yesterday I mentioned the big step forward that Google’s open-sourcing of phone and address libraries represents. For Java these libraries fill a major hole in an otherwise fairly smooth road, while for JavaScript they could be seen as the only smooth part of an otherwise almost completely unpaved i18n road.

As Ćirić and Shin, the authors of Google’s proposed i18n enhancements for JavaScript, (charitably) put it,

current EcmaScript I18N APIs do not meet the needs of I18N sufficiently

This has obviously been a problem for a very long time, but until Ajax gave JavaScript a second chance and the web browser became the dominant application delivery system, nobody thought it awful enough to fix properly. I remember discussions about this issue back in the 1990s, and at that time nobody in IBM was using JavaScript enough to squawk about the lack of support.

Well, things change. And with Google now being a serious player in the browser market, they seem to have found it important enough to propose a set of i18n APIs that would provide JavaScript with support similar to that found in languages like Java, covering

  • Locale support
  • Collation
  • Timezone handling
  • Number, date and time formatting
  • Number, date and time parsing
  • Message formatting

The proposal calls for using native data sources (ICU, glibc, native Windows calls), mainly because of the size of some of the data tables needed for collation, for instance. While not optimal, understandable.
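To give a feel for the shape of the thing, here is a minimal sketch of the kind of calls such an API could expose. I am assuming Intl-style names here; the actual surface in the proposal may well differ.

var price = new Intl.NumberFormat('da-DK', { style: 'currency', currency: 'DKK' }).format(1234.5);
var date  = new Intl.DateTimeFormat('da-DK', { year: 'numeric', month: 'long', day: 'numeric' }).format(new Date());
var order = new Intl.Collator('da').compare('å', 'z');     // positive: Danish collation puts å after z
console.log(price, date, order);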

The proposed message formatting is another variation of the plural and gender formatting capabilities that are all the rage these days. People who have read my previous posts on this topic will know that I am no fan of this type of formatting, and my most recent experiences with email templates using plural formatting have not changed my view. Exposing stuff like this in translatable files is just utter folly, IMHO:

var pattern = '{WHO} invited {NUM_PEOPLE, plural, offset:1 other {{PERSON} and # other people}} to {GENDER, select, female {her circle} other {his circle}}'
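For the record, here is roughly what that pattern is supposed to produce – the argument names and values below are made up purely for illustration:

var args = { WHO: 'Anna', NUM_PEOPLE: 4, PERSON: 'Bob', GENDER: 'female' };
// offset:1 makes # count NUM_PEOPLE minus 1, so the formatted result would read:
// "Anna invited Bob and 3 other people to her circle"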

I did hear support for this viewpoint at IUC34, and the suggestion that these strings should not be exposed in the translation files – instead, those files should contain the full set of “expanded” string variations (male/female pronouns, singular/plural cases).
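To spell out what that alternative would look like in practice, the translation file could simply carry one complete string per variation – the key names below are invented, purely for illustration:

var messages = {
  invited_one_female:   '{WHO} invited {PERSON} to her circle',
  invited_other_female: '{WHO} invited {PERSON} and {NUM_OTHERS} other people to her circle',
  invited_one_male:     '{WHO} invited {PERSON} to his circle',
  invited_other_male:   '{WHO} invited {PERSON} and {NUM_OTHERS} other people to his circle'
};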

But if that is the goal, I see very little point in using the message formatters in the first place. I guess it forces the developer to think about the variations, and it would keep the strings co-located in the translation files, but that’s about all.

That’s nitpicking, however, considering the huge step forward this would represent, with an experimental implementation targeted for Q4.

Kudos to Google: filling huge i18n gap October 25, 2010

Posted by globalizer in Android, Internationalization, Java, Locales.

I’ve been a little harsh on Google in some previous posts, so I’m happy to have some good – no, make that great – Google news from the recent IUC34 conference. Albeit a little late compared to the tweeting others have done :-)

Even though the internationalization community has made great progress towards more, and more uniform, locale data with the wide acceptance of the CLDR in recent years, we have been left with 2 big gaping holes: phone numbers and postal addresses. Up until now it has been practically impossible to implement correctly internationalized phone number and address formatting, parsing and validation, since the data and APIs have been unavailable.

Depending on the level of globalization awareness of the companies involved, this has resulted in implementations falling into one of these 3 broad categories (a small sketch of the first anti-pattern follows the list):

  1. Address and phone number fields are hard coded to work for only one locale and will reject as invalid everything else. This usually takes the form of making every single field required, and doing validation of every single field, even on web sites where the address and/or phone number of the user is not actually important (such as purchases of non-restricted software delivered electronically, or web sites with required registration to enable targeted ads). This of course also results in such companies collecting an amazing amount of absolute garbage. For instance, if you make “Zip code” a required field and validate it against a list of US zip codes, then you end up with an amazing percentage of your users living in the 90210 area – simply because that is the one US zip code people living outside the US have gotten drilled into them via exposure to the TV show.
  2. Support for a limited number of countries/regions (limited by the number of regions you have the bandwidth to gather data and implement support for – with each company reinventing the wheel every time, for every country)
  3. No validation (provide the user with one single address field, and assume that if the user wants you to be able to reach him/her at that address, he/she will fill it in with good data)
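The first category really is as crude as it sounds – my own sketch of the anti-pattern, but representative:

function isValidZip(zip) {
  return /^\d{5}(-\d{4})?$/.test(zip);   // any non-US postal code is "invalid"
}
isValidZip('90210');     // true - and suspiciously popular with users outside the US
isValidZip('SW1P 4DF');  // false - a perfectly good UK postcode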

As described in the IUC34 session (by Shaopeng Jia and Lara Rennie), collecting reliable and complete data to fill these holes was a major task (it’s no coincidence that nobody has done it before…):

  • There is no single source of data
  • Supposedly reliable data (ITU and individual country postal/telephone unions) turns out to be unreliable (data not updated, or new schemes not implemented on time)
  • Formats differ widely between countries/regions
  • Some countries even lack clear structure
  • Some countries (e.g., UK) use many different formats
  • Some countries use different formats depending on language/script being used
    • Chinese/Japanese/Korean addresses – start with biggest unit (country) if using ideographic script, but with smallest unit (street) if using Latin script

I have looked at these issues a few times in the past, and each time the team decided that we didn’t really need this information (translation: there was no way in hell we were going to be able to get the manpower to gather the information and implement a way to process it). Since Google does in fact have a business model that makes it very important to be able to parse these elements and format them correctly for display (targeted ads and Android, to name a couple of cases), it makes sense that they bit the bullet.

They deserve a lot of kudos, however, for also going ahead and open-sourcing both the data and the APIs that came out of that major undertaking.

Check it out:

According to the IUC34 presentation, the phone number APIs will let you format and validate numbers for 184 regions, and parse numbers for all regions. And the address APIs provide detailed validation for 38 regions, with layout and basic validation for all regions.
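I have not dug into the actual API surface yet, so take this as nothing more than a sketch of the kind of call I would expect to make – the names are invented, not the real API:

// phoneUtil stands in for whatever object the open-sourced library actually exposes
function checkAndFormat(phoneUtil, rawNumber, region) {
  var number = phoneUtil.parse(rawNumber, region);          // e.g. ('32 96 02 60', 'DK')
  return phoneUtil.isValidNumber(number)
      ? phoneUtil.format(number, 'INTERNATIONAL')
      : null;
}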

Weight of 1 web page: 8 micrograms October 21, 2010

Posted by globalizer in Unicode.

Who knew? The Web measures 8 feet by 8 feet by 20 feet, and it weighs 26,000 pounds. Which means that each page weighs 8 micrograms.

At least, that’s the result when you pack a copy of the Web into a shipping container.
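Running the arithmetic backwards (my own back-of-the-envelope, not a number from the talk), that weight implies a page count in the trillions:

console.log(26000 * 453592370 / 8);   // 26,000 lb in micrograms, divided by 8 µg per page: ~1.47e12 pages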

Just one of the fun things I learned in Brewster Kahle’s fascinating keynote address to IUC34.

Go check out archive.org, which has the slightly ambitious goal of “Universal access to all knowledge”. As part of that effort they take a snapshot of every accessible web page every 2 months, so you can use their “waybackmachine” to see what web sites looked like in the past.

But that’s only a small part of the effort: they also scan and digitize books, and archive audio, images, videos, etc.

Great resource, and great keynote #IUC34.

 
