
Kudos to Google: filling huge i18n gap October 25, 2010

Posted by globalizer in Android, Internationalization, Java, Locales.

I’ve been a little harsh on Google in some previous posts, so I’m happy to have some good – no, make that great – Google news from the recent IUC34 conference, albeit a little late compared to the tweeting others have done 🙂

Even though the internationalization community has made great progress towards more, and more uniform, locale data with the wide acceptance of the CLDR in recent years, we have been left with 2 big gaping holes: phone numbers and postal addresses. Up until now it has been practically impossible to implement correctly internationalized phone number and address formatting, parsing and validation, since the data and APIs have been unavailable.

Depending on the level of globalization awareness of the companies involved, this has resulted in implementations falling into one of these 3 broad categories:

  1. Address and phone number fields are hard coded to work for only one locale and will reject as invalid everything else. This usually takes the form of making every single field required, and doing validation of every single field, even on web sites where the address and/or phone number of the user is not actually important (such as purchases of non-restricted software delivered electronically, or web sites with required registration to enable targeted ads). This of course also results in such companies collecting an amazing amount of absolute garbage. For instance, if you make “Zip code” a required field and validate it against a list of US zip codes, then you end up with an amazing percentage of your users living in the 90210 area – simply because that is the one US zip code people living outside the US have gotten drilled into them via exposure to the TV show.
  2. Support for a limited number of countries/regions (limited by the number of regions you have the bandwidth to gather data and implement support for – with each company reinventing the wheel every time, for every country)
  3. No validation (provide the user with one single address field, and assume that if users want you to be able to reach them at that address, they will fill it in with good data)
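
To make the difference between these categories concrete, here is a minimal Python sketch. The three patterns are simplified illustrations chosen for the example, not real locale data, and the function names are hypothetical. The first function behaves like category 1; the second behaves like category 2, and falls back to category 3 (accept anything) for regions it has no data for.

```python
import re

# Illustrative patterns only (an assumption for this sketch); real postal
# code rules have many more cases and exceptions than these.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),
    "DK": re.compile(r"\d{4}"),
}


def validate_us_only(postal_code):
    """Category 1: every postal code is checked against the US format."""
    return POSTAL_PATTERNS["US"].fullmatch(postal_code.strip()) is not None


def validate_by_region(region, postal_code):
    """Category 2, with a category 3 fallback for regions without data."""
    pattern = POSTAL_PATTERNS.get(region.upper())
    if pattern is None:
        # No data for this region: accept the input rather than reject it.
        return True
    return pattern.fullmatch(postal_code.strip().upper()) is not None
```

With the US-only check, a Canadian user entering "K1A 0B1" is rejected and may well type 90210 just to get past the form; the per-region check accepts the real code but only for the handful of regions somebody took the time to gather data for.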

As described in the IUC34 session (by Shaopeng Jia and Lara Rennie), collecting reliable and complete data to fill these holes was a major task (it’s no coincidence that nobody has done it before…):

  • There is no single source of data
  • Supposedly reliable data (ITU and individual country postal/telephone unions) turns out to be unreliable (data not updated, or new schemes not implemented on time)
  • Formats differ widely between countries/regions
  • Some countries even lack clear structure
  • Some countries (e.g., UK) use many different formats
  • Some countries use different formats depending on language/script being used
    • Chinese/Japanese/Korean addresses – start with biggest unit (country) if using ideographic script, but with smallest unit (street) if using Latin script
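
The script-dependent ordering in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the script check is a rough test against a few Unicode blocks (Hangul is included because the session grouped Korean with Chinese and Japanese), and real address layout involves far more than reversing the order of the parts.

```python
def contains_cjk(text):
    """Rough check for Chinese, Japanese, or Korean script characters."""
    return any(
        "\u4e00" <= ch <= "\u9fff"      # CJK Unified Ideographs
        or "\u3040" <= ch <= "\u30ff"   # Hiragana and Katakana
        or "\uac00" <= ch <= "\ud7a3"   # Hangul syllables
        for ch in text
    )


def order_address_parts(parts):
    """Order address parts for display.

    `parts` is given smallest unit first (street, district, city, country).
    If any part uses CJK script, return the parts biggest unit first;
    otherwise keep the smallest-first order used with Latin script.
    """
    filled = [p for p in parts if p]
    if any(contains_cjk(p) for p in filled):
        return list(reversed(filled))
    return filled
```

The same logical address therefore comes out in opposite orders depending on how it was written: `["1-1 Chiyoda", "Chiyoda-ku", "Tokyo", "Japan"]` is returned unchanged, while `["千代田1-1", "千代田区", "東京都", "日本"]` comes back starting with the country.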

I have looked at these issues a few times in the past, and each time the team decided that we didn’t really need this information (translation: there was no way in hell we were going to be able to get the manpower to gather the information and implement a way to process it). Since Google does in fact have a business model that makes it very important to be able to parse these elements and format them correctly for display (targeted ads and Android, to name a couple of cases), it makes sense that they bit the bullet.

They deserve a lot of kudos, however, for also going ahead and open-sourcing both the data and the APIs that resulted from that major undertaking.

Check it out:

According to the IUC34 presentation, the phone number APIs will allow you to format and validate 184 regions, while they will parse all regions. And the address APIs provide detailed validation for 38 regions, with layout+basic validation for all regions.

Comments»

1. g1smd - January 26, 2011

There’s a new version of libphonenumber in recent days.

New countries added and existing data updated.

2. globalizer - January 26, 2011

Excellent, thanks for spreading the news.

