A plug for ICU4J January 9, 2007

Posted by globalizer in International requirements, Java, Unicode.

It never fails to amaze me how many Java developers either don’t know about ICU4J at all, or don’t realize what kinds of goodies it contains.

Just yesterday, somebody on Sun’s Java i18n forum asked how to prevent the entry of “double-byte/multibyte characters” in a J2EE application. The terminology is unfortunate, of course: in Unicode, no character is more or less “double-bytish” than any other character, and this is a Java forum after all. I still suspect that the OP may not completely understand the terminology involved.

In any case, as the small snippet of code I posted there shows (a pointer to the relevant ICU4J documentation did not seem to help), the UnicodeSet API allows you to check for a huge number of character properties:

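// JSP scriptlet; requires com.ibm.icu.text.UnicodeSet (ICU4J on the classpath)
String newname = request.getParameter("name");
if (newname == null) newname = "";  // getParameter returns null when the parameter is missing
int length = newname.length();
out.println("<p>" + "Length of name entry: " + length);
// Set of all characters whose East_Asian_Width property is Wide
UnicodeSet eaWideSet = new UnicodeSet("[:ea=Wide:]");
// containsNone is true when the string holds no character from the set
out.println("<p>" + "Name entry not contained in East Asian Wide Set? " + eaWideSet.containsNone(newname));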

With the power of regular expressions and Unicode character properties you should be able to test for almost anything.
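For comparison, here is a rough sketch of how far the JDK's own java.util.regex gets you without ICU4J. Note that this swaps in a different technique: as far as I know, the JDK regex engine has no East_Asian_Width property, so the sketch tests for scripts (Han, Hiragana, Katakana, Hangul) instead, which is only an approximation of the UnicodeSet check. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class EastAsianScriptCheck {
    // Matches any character whose Unicode script is Han, Hiragana, Katakana, or Hangul.
    // This is a script test, NOT an East_Asian_Width test (an assumption-laden
    // stand-in for ICU4J's "[:ea=Wide:]" set).
    private static final Pattern EAST_ASIAN_SCRIPT =
            Pattern.compile("[\\p{IsHan}\\p{IsHiragana}\\p{IsKatakana}\\p{IsHangul}]");

    // Returns true when the input contains no character from those scripts.
    public static boolean containsNoEastAsianScript(String input) {
        return !EAST_ASIAN_SCRIPT.matcher(input).find();
    }

    public static void main(String[] args) {
        // A plain Latin name
        System.out.println(containsNoEastAsianScript("Smith"));        // prints true
        // U+5C71 U+7530: the Japanese surname Yamada in Han characters
        System.out.println(containsNoEastAsianScript("\u5C71\u7530")); // prints false
    }
}
```

The script-based test is coarser than a property-based set: it says nothing about display width, and it is exactly the kind of case where UnicodeSet's property syntax is the more direct tool.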

