I want to modify ICU4J's Cyrillic-to-Latin transliteration so that it preserves spaces. The obvious approach was:
    @Test
    public void test1() {
        // Input: "'Eé " followed by the Cyrillic word "математика"
        String greek
                = "'E\u00E9 \u043c\u0430\u0442\u0435\u043c\u0430\u0442\u0438\u043a\u0430";
        String id1 = "Any-Latin; NFD; [^\\p{Alnum} ] Remove";
        String id2 = "Any-Latin; NFD";
        String latin1 = com.ibm.icu.text.Transliterator.getInstance(id1)
                .transform(greek);
        Assert.assertEquals("Ee matematika", latin1);
    }
But it fails (using ICU4J 54.1.1):

    junit.framework.ComparisonFailure: expected:<Ee[ ]matematika> but was:<Ee[]matematika> at junit.framework.Assert.assertEquals
I can apply the same regular expression in Java code with replaceAll, and that does work:
    @Test
    public void test2() {
        String greek
                = "'E\u00E9 \u043c\u0430\u0442\u0435\u043c\u0430\u0442\u0438\u043a\u0430";
        String id1 = "Any-Latin; NFD; [^\\p{Alnum} ] Remove";
        String id2 = "Any-Latin; NFD";
        String latin1 = com.ibm.icu.text.Transliterator.getInstance(id1)
                .transform(greek);
        Assert.assertEquals("Eematematika", latin1); // why not "Ee matematika"?
        String latin2 = com.ibm.icu.text.Transliterator.getInstance(id2)
                .transform(greek).replaceAll("[^\\p{Alnum} ]", "");
        Assert.assertEquals("Ee matematika", latin2);
    }
Replacing the space in the transliterator ID with \\x20 also works. Is this simply a bug in ICU4J, or is it somehow expected?
Answer 0 (score: 0)
It may be the toString() of the ReplaceableString that transform() produces:
    public String transform(String source) {
        return transliterate(source);
    }
    ...
    public final String transliterate(String text) {
        ReplaceableString result = new ReplaceableString(text);
        transliterate(result);
        return result.toString();
    }
Try converting the strings you get to UTF-16 code points and check whether there is a difference.
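One way to run that comparison is to print each string as a list of code points and look at the two lists side by side. This is only a minimal sketch using the JDK alone; `CodePointDump` and `dump` are hypothetical names chosen for illustration and are not part of ICU4J, and the two sample strings are simply the expected and actual values from the failure message above.

```java
import java.util.stream.Collectors;

class CodePointDump {
    // Formats every code point of s as U+XXXX, separated by single spaces.
    // (Hypothetical helper, not an ICU4J API.)
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String expected = "Ee matematika"; // value the test expects
        String actual = "Eematematika";    // value reported by the failing assertion
        System.out.println(dump(expected));
        System.out.println(dump(actual));
    }
}
```

If the space survives the transliteration, U+0020 shows up in the dump; if a different whitespace character or nothing at all is present, the difference is visible immediately.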