我正在使用Java 1.5,我需要规范化String
(像这样àèìòù
---> aeiou
)。我不能使用Normalizer因为是> 1.6
有什么想法吗?
我试过这个:
public String normalizeText(String text) {
text = normalizer(text);
text = text.replaceAll("\\p{InCombiningDiacriticalMarks}]", "");
return text;
}
public static String normalizer(String word) {
try {
int i;
Class<?> normalizerClass = Class.forName("java.text.Normalizer");
Class<?> normalizerFormClass = null;
Class<?>[] nestedClasses = normalizerClass.getDeclaredClasses();
for (i = 0; i < nestedClasses.length; i++) {
Class<?> nestedClass = nestedClasses[i];
if (nestedClass.getName().equals("java.text.Normalizer$Form")) {
normalizerFormClass = nestedClass;
}
}
assert normalizerFormClass.isEnum();
Method methodNormalize = normalizerClass.getDeclaredMethod(
"normalize",
CharSequence.class,
normalizerFormClass);
Object nfcNormalization = null;
Object[] constants = normalizerFormClass.getEnumConstants();
for (i = 0; i < constants.length; i++) {
Object constant = constants[i];
if (constant.toString().equals("NFC")) {
nfcNormalization = constant;
}
}
return (String) methodNormalize.invoke(null, word, nfcNormalization);
} catch (Exception ex) { return null; }
}
答案 0 :(得分:1)
制作自己的方法
如果你不能使用Normaliser
,那么使用Map
也会有一个很好的方法,你可以将所有可能的字母变体标准化。
HashMap<Character, Character> rep = new HashMap<>();
rep.put("à","a");
rep.put("è","e");
rep.put("ì","i");
rep.put("ò","o");
rep.put("ù","u");
// etc...
这很长很糟糕,所以从文本文件加载会更好。
已有答案
unicode表的镜像,从00c0到017f没有变音符号。
private static final String tab00c0 = "AAAAAAACEEEEIIII" +
"DNOOOOO\u00d7\u00d8UUUUYI\u00df" +
"aaaaaaaceeeeiiii" +
"\u00f0nooooo\u00f7\u00f8uuuuy\u00fey" +
"AaAaAaCcCcCcCcDd" +
"DdEeEeEeEeEeGgGg" +
"GgGgHhHhIiIiIiIi" +
"IiJjJjKkkLlLlLlL" +
"lLlNnNnNnnNnOoOo" +
"OoOoRrRrRrSsSsSs" +
"SsTtTtTtUuUuUuUu" +
"UuUuWwYyYZzZzZzF";
返回没有变音符号的字符串 - 7位近似值。
public static String removeDiacritic(String source) {
char[] vysl = new char[source.length()];
char one;
for (int i = 0; i < source.length(); i++) {
one = source.charAt(i);
if (one >= '\u00c0' && one <= '\u017f') {
one = tab00c0.charAt((int) one - '\u00c0');
}
vysl[i] = one;
}
return new String(vysl);
}