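I am trying to sort a set of strings written in the Macedonian alphabet. I know how to do it, but the result is not what I expected. Here is my test program: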
import java.text.Collator;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

public class Main {
    // The 31 letters of the Macedonian alphabet, in alphabetical order
    private static final char[] ALPHABET_ARRAY = {
        'а', 'б', 'в', 'г', 'д', 'ѓ', 'е', 'ж', 'з', 'ѕ', 'и', 'ј', 'к', 'л', 'љ', 'м', 'н', 'њ', 'о', 'п', 'р', 'с', 'т', 'ќ', 'у', 'ф', 'х', 'ц', 'ч', 'џ', 'ш' };

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(new Locale("mk", "MK"));
        List<String> list = new LinkedList<>();
        for (int i = 0; i < ALPHABET_ARRAY.length; i++) {
            list.add("" + ALPHABET_ARRAY[i]);
        }
        list.sort(collator::compare);
        list.forEach(System.out::print);
    }
}
The letters in ALPHABET_ARRAY are in the correct alphabetical order, but the program prints
абвгѓдежзѕијкќлљмнњопрстуфхцчџш
but it should be:
абвгдѓежзѕијклљмнњопрстќуфхцчџш
Is there a problem with the Macedonian collator in Java, or am I doing something wrong?
Answer 0 (score: 4)
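The collator for the "mk_MK" locale is based on the sun.text.resources.mk.CollationData_mk resource (CollationData_mk.java source in jdk8u repo tagged jdk8u92-b14).
The collation rules in CollationData_mk explicitly place 'ѓ' after 'г' and 'ќ' after 'к', which is why the output differs from the alphabet order.
Since a RuleBasedCollator can be created with custom rules, the easiest way to get the desired sort order is to slightly modify the rules from CollationData_mk: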
// required imports: java.text.Collator, java.text.ParseException, java.text.RuleBasedCollator
public static Collator createMacedonianCollator() throws ParseException {
// The defaults are defined in the non-public sun.util.locale.provider.CollationRules
// and are used internally by sun.util.locale.provider.CollatorProviderImpl.
// We have no direct access to the proper defaults, so we simply comment out the entries that depend on them.
String DEFAULTRULES = "";
// Only the entries for ѓ and ќ are moved; everything else is left as is.
return new RuleBasedCollator( DEFAULTRULES +
//"& 9 < \u0482 " + // thousand sign
//"& Z " + // Arabic script sorts after Z's
"< \u0430 , \u0410" + // a
"< \u0431 , \u0411" + // be
"< \u0432 , \u0412" + // ve
"< \u0433 , \u0413" + // ghe
"; \u0491 , \u0490" + // ghe-upturn
"; \u0495 , \u0494" + // ghe-mid-hook
/*!!!moved after д/de!!!*/ //"; \u0453 , \u0403" + // gje
"; \u0493 , \u0492" + // ghe-stroke
"< \u0434 , \u0414" + // de
/*!!!moved AND relation strength changed!!!*/ "< \u0453 , \u0403" + // gje
"< \u0452 , \u0402" + // dje
"< \u0435 , \u0415" + // ie
"; \u04bd , \u04bc" + // che
"; \u0451 , \u0401" + // io
"; \u04bf , \u04be" + // che-descender
"< \u0454 , \u0404" + // uk ie
"< \u0436 , \u0416" + // zhe
"; \u0497 , \u0496" + // zhe-descender
"; \u04c2 , \u04c1" + // zhe-breve
"< \u0437 , \u0417" + // ze
"; \u0499 , \u0498" + // zh-descender
"< \u0455 , \u0405" + // dze
"< \u0438 , \u0418" + // i
"< \u0456 , \u0406" + // uk/bg i
"; \u04c0 " + // palochka
"< \u0457 , \u0407" + // uk yi
"< \u0439 , \u0419" + // short i
"< \u0458 , \u0408" + // je
"< \u043a , \u041a" + // ka
"; \u049f , \u049e" + // ka-stroke
"; \u04c4 , \u04c3" + // ka-hook
"; \u049d , \u049c" + // ka-vt-stroke
"; \u04a1 , \u04a0" + // bashkir-ka
/*!!!moved after т/te!!!*/ //"; \u045c , \u040c" + // kje
"; \u049b , \u049a" + // ka-descender
"< \u043b , \u041b" + // el
"< \u0459 , \u0409" + // lje
"< \u043c , \u041c" + // em
"< \u043d , \u041d" + // en
"; \u0463 " + // yat
"; \u04a3 , \u04a2" + // en-descender
"; \u04a5 , \u04a4" + // en-ghe
"; \u04bb , \u04ba" + // shha
"; \u04c8 , \u04c7" + // en-hook
"< \u045a , \u040a" + // nje
"< \u043e , \u041e" + // o
"; \u04a9 , \u04a8" + // ha
"< \u043f , \u041f" + // pe
"; \u04a7 , \u04a6" + // pe-mid-hook
"< \u0440 , \u0420" + // er
"< \u0441 , \u0421" + // es
"; \u04ab , \u04aa" + // es-descender
"< \u0442 , \u0422" + // te
"; \u04ad , \u04ac" + // te-descender
"< \u045b , \u040b" + // tshe
/*!!!moved AND relation strength changed!!!*/ "< \u045c , \u040c" + // kje
"< \u0443 , \u0423" + // u
"; \u04af , \u04ae" + // straight u
"< \u045e , \u040e" + // short u
"< \u04b1 , \u04b0" + // straight u-stroke
"< \u0444 , \u0424" + // ef
"< \u0445 , \u0425" + // ha
"; \u04b3 , \u04b2" + // ha-descender
"< \u0446 , \u0426" + // tse
"; \u04b5 , \u04b4" + // te tse
"< \u0447 , \u0427" + // che
"; \u04b7 ; \u04b6" + // che-descender
"; \u04b9 , \u04b8" + // che-vt-stroke
"; \u04cc , \u04cb" + // che
"< \u045f , \u040f" + // dzhe
"< \u0448 , \u0428" + // sha
"< \u0449 , \u0429" + // shcha
"< \u044a , \u042a" + // hard sign
"< \u044b , \u042b" + // yeru
"< \u044c , \u042c" + // soft sign
"< \u044d , \u042d" + // e
"< \u044e , \u042e" + // yu
"< \u044f , \u042f" + // ya
"< \u0461 , \u0460" + // omega
"< \u0462 " + // yat
"< \u0465 , \u0464" + // iotified e
"< \u0467 , \u0466" + // little yus
"< \u0469 , \u0468" + // iotified little yus
"< \u046b , \u046a" + // big yus
"< \u046d , \u046c" + // iotified big yus
"< \u046f , \u046e" + // ksi
"< \u0471 , \u0470" + // psi
"< \u0473 , \u0472" + // fita
"< \u0475 , \u0474" + // izhitsa
"; \u0477 , \u0476" + // izhitsa-double-grave
"< \u0479 , \u0478" + // uk
"< \u047b , \u047a" + // round omega
"< \u047d , \u047c" + // omega-titlo
"< \u047f , \u047e" + // ot
"< \u0481 , \u0480" // koppa
);
}
The rules can be simplified further to contain only the 31 base letters, without the accented variants.
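A minimal sketch of that simplification, assuming only the 31 letters from the question need a defined order. The class name MacedonianAlphabetCollator and its create() method are hypothetical and not part of the JDK. The rules are built from the alphabet string rather than copied from CollationData_mk, each uppercase letter is attached as a tertiary (case) variant of its lowercase form, and the letters are written as Unicode escapes so the result does not depend on the source file encoding:

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class: not part of the JDK or of the rules file above.
class MacedonianAlphabetCollator {
    // The 31 lowercase letters in alphabetical order.
    static final String ALPHABET =
            "\u0430\u0431\u0432\u0433\u0434\u0453\u0435\u0436"   // a be ve ghe de gje ie zhe
          + "\u0437\u0455\u0438\u0458\u043a\u043b\u0459\u043c"   // ze dze i je ka el lje em
          + "\u043d\u045a\u043e\u043f\u0440\u0441\u0442\u045c"   // en nje o pe er es te kje
          + "\u0443\u0444\u0445\u0446\u0447\u045f\u0448";        // u ef ha tse che dzhe sha

    static Collator create() {
        StringBuilder rules = new StringBuilder();
        for (char lower : ALPHABET.toCharArray()) {
            // "< x , X": primary difference from the previous letter,
            // with the uppercase form as a tertiary variant of the same letter.
            rules.append("< ").append(lower)
                 .append(" , ").append(Character.toUpperCase(lower)).append(' ');
        }
        try {
            return new RuleBasedCollator(rules.toString());
        } catch (ParseException e) {
            // The rules are generated from a fixed string, so this is not expected.
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        List<String> letters = new ArrayList<>();
        for (char c : ALPHABET.toCharArray()) {
            letters.add(String.valueOf(c));
        }
        Collections.reverse(letters);   // start from the wrong order
        letters.sort(create());
        // Reports whether sorting restored the alphabet order.
        System.out.println(String.join("", letters).equals(ALPHABET));
    }
}
```

With this approach, characters that do not appear in the rules (digits, Latin letters, other Cyrillic letters) get no tailored position, so the full rule set above is probably the better choice for mixed text.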