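How to convert accented characters in Java

Date: 2016-04-14 14:44:51

Tags: java regex string reflection normalization

I'm using Java 1.5 and I need to normalize a String (like this: àèìòù ---> aeiou). I can't use Normalizer because it is only available from Java 1.6 onwards. Any ideas?

I tried this: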

import java.lang.reflect.Method;

public String normalizeText(String text) {
    // Decompose accented letters into a base letter plus combining marks
    text = normalizer(text);
    // Remove the combining marks. The pattern needs "+" (one or more marks);
    // a stray "]" would only match a mark followed by a literal "]".
    text = text.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    return text;
}

// Calls java.text.Normalizer.normalize(word, Normalizer.Form.NFD) through
// reflection so that the code still compiles on Java 1.5. On 1.5 the class
// does not exist, Class.forName throws, and the word is returned unchanged.
public static String normalizer(String word) {
    try {
        Class<?> normalizerClass = Class.forName("java.text.Normalizer");
        Class<?> normalizerFormClass = Class.forName("java.text.Normalizer$Form");
        Method methodNormalize = normalizerClass.getDeclaredMethod(
                "normalize",
                CharSequence.class,
                normalizerFormClass);
        // NFD (decomposition) is required: NFC keeps "à" as one precomposed
        // character, so there would be no combining mark left to strip.
        Object nfdNormalization = null;
        for (Object constant : normalizerFormClass.getEnumConstants()) {
            if (constant.toString().equals("NFD")) {
                nfdNormalization = constant;
            }
        }
        return (String) methodNormalize.invoke(null, word, nfdNormalization);
    } catch (Exception ex) {
        // Normalizer is unavailable (Java 1.5) or the reflective call failed;
        // returning the input avoids a NullPointerException in normalizeText
        return word;
    }
}
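For comparison, on Java 6 or later no reflection is needed, because `java.text.Normalizer` can be called directly. A minimal sketch (the class name `AccentStripper` is hypothetical); it will not compile on Java 1.5:

```java
import java.text.Normalizer;

// Hypothetical helper class; requires Java 6+ (java.text.Normalizer was added in 1.6).
class AccentStripper {

    static String strip(String text) {
        // NFD splits each accented letter into a base letter plus combining marks,
        // e.g. U+00E0 (a with grave) becomes 'a' followed by U+0300 (combining grave accent).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Remove every character from the Combining Diacritical Marks block (U+0300..U+036F).
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}
```

Letters that have no canonical decomposition, such as ø, ł or ß, are not split by NFD and therefore pass through unchanged.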

1 answer:

Answer 0 (score: 1)

Write your own method

If you can't use Normalizer, a Map works well too: map every accented variant of a letter to its plain form.

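// Requires import java.util.HashMap;
// Java 1.5 has no diamond operator, so the type arguments are repeated
HashMap<Character, Character> rep = new HashMap<Character, Character>();
rep.put('à', 'a');   // char literals, not String literals: the map holds Characters
rep.put('è', 'e');
rep.put('ì', 'i');
rep.put('ò', 'o');
rep.put('ù', 'u');
// etc...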
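A complete version of this idea, which also applies the map to a string, might look like the sketch below. The class and method names are hypothetical, only the five vowels from the question are included, and it avoids post-1.5 features so it should also compile on Java 1.5:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; Java 5 compatible (no diamond operator, no String.isEmpty()).
class AccentMap {

    // Accented character -> plain character. Only a few entries are shown;
    // extend it with every variant you need.
    private static final Map<Character, Character> REP = new HashMap<Character, Character>();

    static {
        REP.put('\u00e0', 'a'); // a with grave
        REP.put('\u00e8', 'e'); // e with grave
        REP.put('\u00ec', 'i'); // i with grave
        REP.put('\u00f2', 'o'); // o with grave
        REP.put('\u00f9', 'u'); // u with grave
    }

    static String replaceAccents(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character plain = REP.get(c);
            // Characters without an entry are copied unchanged
            sb.append(plain != null ? plain.charValue() : c);
        }
        return sb.toString();
    }
}
```

Anything missing from the map (for example é here) is left as-is, so the result is only as good as the table.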

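This is long and ugly, so loading the mapping from a text file would be better.

Existing answer

I found the following answer on this page. It works; I have tested it:

A mirror of the Unicode table from U+00C0 to U+017F with the diacritics removed: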
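Loading the mapping from a text file could be sketched as follows. The file format (one `<accented>=<plain>` pair per line, `#` for comments) and the class name `AccentMapLoader` are assumptions, not an existing convention; the method takes a `Reader` so the caller controls the file's encoding:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader; the "<accented>=<plain>" line format is an assumption.
class AccentMapLoader {

    // Reads one mapping per line. Blank lines and lines starting with '#' are skipped.
    static Map<Character, Character> load(Reader in) throws IOException {
        Map<Character, Character> map = new HashMap<Character, Character>();
        BufferedReader reader = new BufferedReader(in);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.length() == 0 || line.startsWith("#")) {
                    continue;
                }
                // Expect exactly three characters: accented char, '=', plain char
                if (line.length() != 3 || line.charAt(1) != '=') {
                    throw new IOException("Bad mapping line: " + line);
                }
                map.put(line.charAt(0), line.charAt(2));
            }
        } finally {
            // try/finally instead of try-with-resources, which needs Java 7
            reader.close();
        }
        return map;
    }
}
```

A caller might open the (hypothetical) file with `AccentMapLoader.load(new InputStreamReader(new FileInputStream("accents.txt"), "UTF-8"))` and then use the returned map exactly like the hand-written one.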

private static final String tab00c0 = "AAAAAAACEEEEIIII" +
    "DNOOOOO\u00d7\u00d8UUUUYI\u00df" +
    "aaaaaaaceeeeiiii" +
    "\u00f0nooooo\u00f7\u00f8uuuuy\u00fey" +
    "AaAaAaCcCcCcCcDd" +
    "DdEeEeEeEeEeGgGg" +
    "GgGgHhHhIiIiIiIi" +
    "IiJjJjKkkLlLlLlL" +
    "lLlNnNnNnnNnOoOo" +
    "OoOoRrRrRrSsSsSs" +
    "SsTtTtTtUuUuUuUu" +
    "UuUuWwYyYZzZzZzF";

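The method returns the string without diacritics, as a 7-bit approximation. A few characters in the range (×, ÷, Ø, ø, ß, ð and þ) map to themselves, so the result is not strictly ASCII.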

public static String removeDiacritic(String source) {
    char[] vysl = new char[source.length()];   // result buffer
    char one;
    for (int i = 0; i < source.length(); i++) {
        one = source.charAt(i);
        // Only U+00C0..U+017F is covered; the table index is the offset from U+00C0
        if (one >= '\u00c0' && one <= '\u017f') {
            one = tab00c0.charAt((int) one - '\u00c0');
        }
        vysl[i] = one;   // characters outside the range are copied unchanged
    }
    return new String(vysl);
}
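To try the answer's code, the table and method can be wrapped in a class (the name `DiacriticTable` is hypothetical). The lookup index is the character's code point minus 0xC0: for example é is U+00E9, giving index 0x29 = 41, and position 41 of the table holds `e`.

```java
// Hypothetical wrapper around the answer's table and method, for trying it out.
class DiacriticTable {

    // Same table as in the answer: index i holds the plain form of code point 0xC0 + i
    private static final String tab00c0 = "AAAAAAACEEEEIIII" +
        "DNOOOOO\u00d7\u00d8UUUUYI\u00df" +
        "aaaaaaaceeeeiiii" +
        "\u00f0nooooo\u00f7\u00f8uuuuy\u00fey" +
        "AaAaAaCcCcCcCcDd" +
        "DdEeEeEeEeEeGgGg" +
        "GgGgHhHhIiIiIiIi" +
        "IiJjJjKkkLlLlLlL" +
        "lLlNnNnNnnNnOoOo" +
        "OoOoRrRrRrSsSsSs" +
        "SsTtTtTtUuUuUuUu" +
        "UuUuWwYyYZzZzZzF";

    static String removeDiacritic(String source) {
        char[] result = new char[source.length()];
        for (int i = 0; i < source.length(); i++) {
            char one = source.charAt(i);
            // Table lookup only inside U+00C0..U+017F; everything else is kept
            if (one >= '\u00c0' && one <= '\u017f') {
                one = tab00c0.charAt(one - '\u00c0');
            }
            result[i] = one;
        }
        return new String(result);
    }
}
```

With this table, `DiacriticTable.removeDiacritic("Łódź")` returns `"Lodz"`, while `"Smørrebrød"` comes back unchanged because ø maps to itself, and characters beyond U+017F (such as ș, U+0219) are never looked up.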