A cautionary tale about changing locale defaults
November 16, 2007. Posted by globalizer in Java, Locales.
In an earlier post I talked about some of the pitfalls involved in using short date formats in user interfaces, and I mentioned that the post was occasioned by an issue involving a change in the Danish date format in Java between JDK 1.4 and 1.5.
The change has its origin in this bug report filed back in 2003 against Java 1.4. The original poster muddied the waters from the beginning by stating that The Danish Standard is, basically, the same as the international standard (yyyy-mm-dd). That is certainly a debatable statement – since the official Danish body providing guidance in this area (Dansk Sprognævn) at most states that you can choose between the international standard and the traditional Danish date format (day-month-year).
It is also worth noting that Dansk Sprognævn actually throws a good portion of mud in the waters themselves when they describe the usage of the “international standard”, since they state that you can write the year using 2 digits:
Det er også muligt at skrive årstallet med to cifre:
031002 el. 03-10-02.
Translation: It is also possible to write the year using two digits: 031002 or 03-10-02.
This of course runs counter to the whole idea of the ISO 8601 standard – that it provides an unambiguous format for rendering dates.
Anyway, the original poster of the Java bug seems to have only asked for a change in the date format separators, not a change in the order from day-month-year to year-month-day:
The short and classic Danish short date format is 17/2-2001 or 17/2-01. The format 17.02.2001 can also be used, however.
This request also contains a misunderstanding – the 17/2-2001 format being requested is explicitly described by Dansk Sprognævn as being peculiar to handwritten material (Datoer med skråstreg hører især hjemme i håndskrift, i.e. dates with a slash belong mainly in handwriting). So a close reading of their statements would have led to the conclusion that the Danish date format is dd.mm.yyyy or dd.mm.yy, and that the only possible change to consider would be a change from hyphen to period as the separator.
Dansk Sprognævn actually specifically mentions the format with hyphens:
Også datoer med bindestreger, fx 2-10-03, kan bruges i håndskrift, men dårligt i maskinskrift.
Translation: Dates using hyphens, e.g. 2-10-03, may also be used in handwritten material, but are not well suited for typewriting.
I will get back to these statements later – as I think the wording indicates that these rules were written a loooong time ago, and might well be worth reconsidering.
In any case, the Sun developer handling this bug report actually started out verifying, very sensibly, what other platforms do:
As far as I can see from all my sources, Windows, Solaris and ICU all have the format as dd-MM-yyyy and dd-MM-yy.
If he had only left it at that, and rejected the bug, all would have been well. Unfortunately, that is obviously not what happened. Instead, somebody from Sun went to this URL and found a date format:
The bug refers to this as representing what is in Retskrivningsordbogen (the dictionary and grammar guide issued by Dansk Sprognævn), but that is actually not the case; the sort order/collation rules are taken from that publication, but the date format, number formats etc., are from “Dansk Standard” [*].
In any case, it’s important to note that the short date listed by Dansk Standard uses 4 digits to represent the year:
Numeric date: 1994-06-07
However, the change that Sun implemented is summarized in this diff:
< "dd-MM-yyyy", // medium date pattern
< "dd-MM-yy", // short date pattern
> "yyyy-MM-dd", // medium date pattern
> "yy-MM-dd", // short date pattern
There are at least 2 things that are really, really bad about this change:
- A short date pattern was created that is ambiguous – even though the only reason for possibly wanting to use a yyyy-mm-dd pattern is that it would be unambiguous. That is of course only true if you actually use the correct ISO 8601 pattern, not a 2-digit year.
- A Danish date format that has been used on practically all computer platforms for the last 20 years or more was changed in Java to something that will be completely confusing to all
Thus, in all Java applications that use a short date format and which are upgraded from Java 1.4 to Java 1.5, a date that used to display as 08-09-07 will suddenly display as 07-09-08.
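To make the flip concrete, here is a minimal sketch that formats the same date with the two short patterns from the diff. It hard-codes the patterns rather than asking the JRE for its default, so it behaves the same on any Java version; the class and method names are just illustrative.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DanishShortDateChange {
    // Short date patterns copied from the diff quoted above.
    static final String OLD_SHORT = "dd-MM-yy"; // Java 1.4
    static final String NEW_SHORT = "yy-MM-dd"; // Java 1.5

    // Formats a date with an explicit pattern, using the Danish locale.
    static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern, new Locale("da", "DK")).format(date);
    }

    public static void main(String[] args) {
        // September 8, 2007 (Calendar months are zero-based, hence the constant).
        Date date = new GregorianCalendar(2007, Calendar.SEPTEMBER, 8).getTime();
        System.out.println(format(OLD_SHORT, date)); // 08-09-07
        System.out.println(format(NEW_SHORT, date)); // 07-09-08
    }
}
```

The two outputs are built from the same three two-digit groups, just in reverse order, which is why nothing on the screen warns the reader that anything has changed.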
Many meetings that were called for September 8, 2007 would have had people thinking that the invitation had been sent reaaally early – since they would think that it was called for September 7, 2008.
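The same misreading can be reproduced in code. The sketch below parses the string 07-09-08 once with the new pattern and once with the old one, and prints each result with a four-digit year. The helper name and the fixed two-digit-year window starting at 2000 are my own assumptions, chosen so that the result does not depend on when the code is run.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class AmbiguousShortDate {
    // Parses text with the given short pattern and returns it as yyyy-MM-dd.
    static String reinterpret(String pattern, String text) throws ParseException {
        Locale danish = new Locale("da", "DK");
        SimpleDateFormat parser = new SimpleDateFormat(pattern, danish);
        // Assumption: two-digit years are read as 2000-2099.
        parser.set2DigitYearStart(new GregorianCalendar(2000, Calendar.JANUARY, 1).getTime());
        Date parsed = parser.parse(text);
        return new SimpleDateFormat("yyyy-MM-dd", danish).format(parsed);
    }

    public static void main(String[] args) throws ParseException {
        String shown = "07-09-08";
        // What the Java 1.5 application meant:
        System.out.println(reinterpret("yy-MM-dd", shown)); // 2007-09-08
        // What a reader used to day-month-year understands:
        System.out.println(reinterpret("dd-MM-yy", shown)); // 2008-09-07
    }
}
```

Both parses succeed without any error, which is the core of the problem: the string is valid under either convention.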
And, applications using Java now display dates in a format that is different from all other major programming platforms.
All of this implemented on the basis of one bug report, from one individual – which didn’t actually ask for the change that was implemented.
So, what should be done about this sorry state of affairs?
For one thing, any applications using Sun’s JRE should immediately stop using the short date format (it is ambiguous in most cases, so it has never been a good idea).
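One way to follow that advice is to stop relying on DateFormat.SHORT and spell out a pattern with a four-digit year. The following is only a sketch under my own assumptions: the dd-MM-yyyy pattern is one possible application-level choice, not something mandated by any of the sources above, and the class and method names are made up for the example. The second helper simply reports which short pattern the running JRE would pick for Danish, which varies with the Java version and its locale data.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class ExplicitDanishDate {
    static final Locale DANISH = new Locale("da", "DK");

    // Assumed application choice: traditional day-month-year order, 4-digit year.
    static String formatWithFullYear(Date date) {
        return new SimpleDateFormat("dd-MM-yyyy", DANISH).format(date);
    }

    // Returns the pattern behind DateFormat.SHORT for Danish on this JRE.
    // The result depends on the Java version, so no output is shown here.
    static String jreShortPattern() {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.SHORT, DANISH);
        if (fmt instanceof SimpleDateFormat) {
            return ((SimpleDateFormat) fmt).toPattern();
        }
        return "(not a SimpleDateFormat)";
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2007, Calendar.SEPTEMBER, 8).getTime();
        System.out.println(formatWithFullYear(date)); // 08-09-2007
        System.out.println("JRE short pattern for da-DK: " + jreShortPattern());
    }
}
```

With a four-digit year the position of the year is obvious at a glance, so the output stays readable even if the JRE default changes again.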
I also think that Sun should change the Danish default back to what it was before 1.5 (possibly changing the short format to use 4 digits for the year). I think the change from 1.4 to 1.5 was ill-considered and a big mistake – probably obvious from everything I have said so far. A change like that has huge ramifications, and I just can’t believe that a similar change for the US date format would have been implemented with as little prior discussion and attempt to consider the consequences.
And, I think that Dansk Sprognævn should revise their statements concerning acceptable date formats. The quote above about date formats using hyphens as separators being unsuitable for “type written” material seems to clearly indicate that it was written a long time ago (the Danish term actually harkens back to typewriter terminology) – otherwise the wording would probably have been different. It also does not seem to take into account the practice of all major programming platforms over the last 20 years to use precisely a hyphen as the separator in Danish dates. And since they are clearly of the descriptivist school (as opposed to prescriptivism), I don’t see why they wouldn’t simply note that the format with the hyphen is now the most prevalent one.
[*] It is actually somewhat difficult to say whether the standards of “Dansk Standard” or those of Dansk Sprognævn are the more “officially” sanctioned ones in this area. What can be said is that Dansk Sprognævn is not purely prescriptive – they take a more descriptive approach and try to state what common usage is in various areas. And I think it is also true to say that most Danes know and use Retskrivningsordbogen published by Dansk Sprognævn, while few if any Danes know about the rules of Dansk Standard in this area.