I used to think that sorting things was easy. It turns out collation is a really difficult problem, especially once you start mixing different scripts (Latin, Chinese, Cyrillic, etc.) and numeral systems (Western Arabic, Devanagari, Japanese, etc.) in the same list, not to mention locale-specific sorting irregularities like the German phonebook sort order.
The problem of sorting country names is particularly sensitive. When you want to display China as 中国 to Chinese speakers, where should it be sorted compared to Canada or Kâmpŭchea (Cambodia)?
Here we have an example list of countries, in the order I looked them up online, heh. For the sake of not messing with my blog, I avoided Right-to-Left country names for Egypt, Iran, or Israel.
- United States
- España
- 中国
- Deutschland
- Polska
- Россия
- भारत
If you have a “sort” feature in whatever software you’re using, hopefully it’s using the Unicode Collation Algorithm to sort the names. You would typically get something like this as a result:
- Deutschland
- España
- Polska
- United States
- Россия
- भारत
- 中国
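As a rough sketch of where that order comes from: Python's built-in `sorted()` compares strings by Unicode code point, which is not the Unicode Collation Algorithm, but for this particular list it happens to give the same result, because each name starts with a capital letter and the script blocks are laid out in the same order (Latin, Cyrillic, Devanagari, Han). The variable names here are just for illustration.

```python
# Country names from the example list, in their original (unsorted) order.
countries = [
    "United States",
    "España",
    "中国",
    "Deutschland",
    "Polska",
    "Россия",
    "भारत",
]

# sorted() on strings compares Unicode code points, not collation keys.
# It only matches the collated order here by coincidence of this data.
by_code_point = sorted(countries)

for name in by_code_point:
    print(name)
```

Don't rely on this for real data: by code point, a name like Österreich would land after United States, since Ö sits above every ASCII letter. A proper collator (ICU, for example) is what you'd want in production.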
Business Case Sort
Alphabetical sorting, Unicode or otherwise, may seem pretty arbitrary, especially if 95% of your customers come from three or four countries. One can always make the case for sorting country lists so that the most popular countries occupy the “top 5” or so of the list*. For many businesses this may mean a sort order of:
- United States
- 中国
- Россия
- Deutschland
- España
- Polska
- भारत
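The ordering above could be produced by pinning a few names to the top and letting everything else fall back to the default sort. This is only a sketch: the `pinned` list is my assumption about which three countries count as "top markets" in this example, and the fallback is plain code point order rather than real collation.

```python
countries = [
    "United States",
    "España",
    "中国",
    "Deutschland",
    "Polska",
    "Россия",
    "भारत",
]

# Hypothetical "top markets", in the order the business wants them shown.
pinned = ["United States", "中国", "Россия"]

def business_key(name):
    # Pinned names come first, ordered by their position in `pinned`.
    if name in pinned:
        return (0, pinned.index(name), "")
    # Everything else follows, ordered by code point.
    return (1, 0, name)

business_order = sorted(countries, key=business_key)

for name in business_order:
    print(name)
```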
That’s all good … if you want to confuse Indian and Polish visitors, don’t care about keyboard users, and want to take a big hit on your Russian branding.
* Instead of mucking with collation, a more usable solution is to autodetect which country a visitor is from and preselect it in dropdown lists, or highlight it as a choice outside of the sorted list.
ISO to the Rescue
In my research, I’ve found that a pretty good general solution, irrespective of the business case, is to sort things according to the ISO 3166-1 alpha-2 code. I know, I know, it’s lame and old and under fire constantly ... but it’s a fairly standard coding that technical people understand, native speakers understand, keyboard access is alright, and it’s considered safe on the culture-war front (other than being based on the Latin alphabet).
Our example above would be:
- 中国 (cn)
- Deutschland (de)
- España (es)
- भारत (in)
- Polska (pl)
- Россия (ru)
- United States (us)
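In code, this is about as simple as sorting gets: keep the display name next to its two-letter code and sort on the code. A minimal sketch, using only the codes listed above (the `iso_codes` mapping and variable names are mine):

```python
# Display name -> ISO 3166-1 alpha-2 code, as listed above.
iso_codes = {
    "United States": "us",
    "España": "es",
    "中国": "cn",
    "Deutschland": "de",
    "Polska": "pl",
    "Россия": "ru",
    "भारत": "in",
}

# Sort the display names by their code; the codes are plain ASCII,
# so the result does not depend on locale or collation tables.
by_iso = sorted(iso_codes, key=iso_codes.get)

for name in by_iso:
    print(f"{name} ({iso_codes[name]})")
```

Because the sort key is ASCII-only, the same input gives the same order on any platform, which is most of the appeal.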
Anywho, that’s just my suggestion for a starting point. Your business case may indeed support other sort orders for countries. But this one is reproducible and defensible, so that makes it good for programmers and business analysts alike.