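Is there any way in Android, which (as far as I know) does not have java.text.Normalizer, to remove accents from a String? For example, "éàù" becomes "eau".
If possible, I would like to avoid parsing the String and checking every character!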
Answer 0 (score: 81)
java.text.Normalizer is available on Android (on recent versions, anyway). You can use it.
Edit: for reference, here is how to use Normalizer:
string = Normalizer.normalize(string, Normalizer.Form.NFD);
string = string.replaceAll("[^\\p{ASCII}]", "");
(pasted from the link in the comments below)
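Note that the replaceAll call above deletes every non-ASCII character, not only the accents, so a letter with no decomposition such as "ß" or "ø" simply disappears. Below is a minimal sketch of a narrower variant, assuming it is enough to strip the combining marks in the U+0300 to U+036F block that NFD splits off. The class and method names (AccentStripper, stripAccents) are made up for illustration, and the string literals use Unicode escapes only to keep the source ASCII-safe.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class AccentStripper {
    // Matches runs of characters from the "Combining Diacritical Marks" block.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String stripAccents(String input) {
        if (input == null) {
            return null;
        }
        // NFD splits an accented letter into base letter + combining mark(s).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove only the marks; every other character is left untouched.
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }
}
```

With this sketch, AccentStripper.stripAccents("\u00e9\u00e0\u00f9") (that is, "éàù") returns "eau", while "ß" is returned unchanged instead of being dropped.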
Answer 1 (score: 4)
I have adapted Rabi's solution to fit my needs; I hope it helps someone:
import java.util.HashMap;
import java.util.Map;

// Lookup table from accented character to plain replacement.
// Built lazily on first use (this lazy initialization is not thread-safe).
private static Map<Character, Character> MAP_NORM;

public static String removeAccents(String value)
{
    if (MAP_NORM == null || MAP_NORM.size() == 0)
    {
        MAP_NORM = new HashMap<Character, Character>();
        MAP_NORM.put('À', 'A');
        MAP_NORM.put('Á', 'A');
        MAP_NORM.put('Â', 'A');
        MAP_NORM.put('Ã', 'A');
        MAP_NORM.put('Ä', 'A');
        MAP_NORM.put('È', 'E');
        MAP_NORM.put('É', 'E');
        MAP_NORM.put('Ê', 'E');
        MAP_NORM.put('Ë', 'E');
        MAP_NORM.put('Í', 'I');
        MAP_NORM.put('Ì', 'I');
        MAP_NORM.put('Î', 'I');
        MAP_NORM.put('Ï', 'I');
        MAP_NORM.put('Ù', 'U');
        MAP_NORM.put('Ú', 'U');
        MAP_NORM.put('Û', 'U');
        MAP_NORM.put('Ü', 'U');
        MAP_NORM.put('Ò', 'O');
        MAP_NORM.put('Ó', 'O');
        MAP_NORM.put('Ô', 'O');
        MAP_NORM.put('Õ', 'O');
        MAP_NORM.put('Ö', 'O');
        MAP_NORM.put('Ñ', 'N');
        MAP_NORM.put('Ç', 'C');
        MAP_NORM.put('ª', 'A');
        MAP_NORM.put('º', 'O');
        MAP_NORM.put('§', 'S');
        MAP_NORM.put('³', '3');
        MAP_NORM.put('²', '2');
        MAP_NORM.put('¹', '1');
        MAP_NORM.put('à', 'a');
        MAP_NORM.put('á', 'a');
        MAP_NORM.put('â', 'a');
        MAP_NORM.put('ã', 'a');
        MAP_NORM.put('ä', 'a');
        MAP_NORM.put('è', 'e');
        MAP_NORM.put('é', 'e');
        MAP_NORM.put('ê', 'e');
        MAP_NORM.put('ë', 'e');
        MAP_NORM.put('í', 'i');
        MAP_NORM.put('ì', 'i');
        MAP_NORM.put('î', 'i');
        MAP_NORM.put('ï', 'i');
        MAP_NORM.put('ù', 'u');
        MAP_NORM.put('ú', 'u');
        MAP_NORM.put('û', 'u');
        MAP_NORM.put('ü', 'u');
        MAP_NORM.put('ò', 'o');
        MAP_NORM.put('ó', 'o');
        MAP_NORM.put('ô', 'o');
        MAP_NORM.put('õ', 'o');
        MAP_NORM.put('ö', 'o');
        MAP_NORM.put('ñ', 'n');
        MAP_NORM.put('ç', 'c');
    }

    // A null input yields an empty string rather than null.
    if (value == null) {
        return "";
    }

    StringBuilder sb = new StringBuilder(value);

    // Replace each mapped character in place; unmapped characters are kept.
    for(int i = 0; i < value.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if(c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }

    return sb.toString();
}
Answer 2 (score: 3)
This is probably not the most efficient solution, but it does the trick and works on all Android versions:
import java.util.HashMap;
import java.util.Map;

private static Map<Character, Character> MAP_NORM;

static { // Greek characters normalization
    MAP_NORM = new HashMap<Character, Character>();
    MAP_NORM.put('ά', 'α');
    MAP_NORM.put('έ', 'ε');
    MAP_NORM.put('ί', 'ι');
    MAP_NORM.put('ό', 'ο');
    MAP_NORM.put('ύ', 'υ');
    MAP_NORM.put('ή', 'η');
    MAP_NORM.put('ς', 'σ'); // final sigma is folded to regular sigma
    MAP_NORM.put('ώ', 'ω');
    // Note: accented capitals are mapped to lowercase base letters.
    MAP_NORM.put('Ά', 'α');
    MAP_NORM.put('Έ', 'ε');
    MAP_NORM.put('Ί', 'ι');
    MAP_NORM.put('Ό', 'ο');
    MAP_NORM.put('Ύ', 'υ');
    MAP_NORM.put('Ή', 'η');
    MAP_NORM.put('Ώ', 'ω');
}

public static String removeAccents(String s) {
    if (s == null) {
        return null;
    }

    StringBuilder sb = new StringBuilder(s);

    // Replace each mapped character in place; unmapped characters are kept.
    for(int i = 0; i < s.length(); i++) {
        Character c = MAP_NORM.get(sb.charAt(i));
        if(c != null) {
            sb.setCharAt(i, c.charValue());
        }
    }

    return sb.toString();
}
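Answer 3 (score: 2)
While Guillaume's answer does work, it strips all non-ASCII characters from the string. If you want to preserve them, try this code (where string is the string to simplify):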
// Convert input string to decomposed Unicode (NFD) so that the
// diacritical marks used in many European scripts (such as the
// "C WITH CIRCUMFLEX" → ĉ) become separate characters.
// Also use compatibility decomposition (K) so that characters,
// that have the exact same meaning as one or more other
// characters (such as "㎏" → "kg" or "ﾋ" → "ヒ"), match when
// comparing them.
string = Normalizer.normalize(string, Normalizer.Form.NFKD);

StringBuilder result = new StringBuilder();

int offset = 0, strLen = string.length();
while(offset < strLen) {
    int character = string.codePointAt(offset);
    offset += Character.charCount(character);

    // Only process characters that are not combining Unicode
    // characters. This way all the decomposed diacritical marks
    // (and some other not-that-important modifiers), that were
    // part of the original string or produced by the NFKD
    // normalizer above, disappear.
    switch(Character.getType(character)) {
        case Character.NON_SPACING_MARK:
        case Character.COMBINING_SPACING_MARK:
            // Some combining character found
            break;

        default:
            // Keep everything else, lowercased so that the result
            // is also case-insensitive.
            result.appendCodePoint(Character.toLowerCase(character));
    }
}

// Since we stripped all combining Unicode characters in the
// previous while-loop there should be no combining character
// remaining in the string and the composed and decomposed
// versions of the string should be equivalent. This also means
// we do not need to convert the string back to composed Unicode
// before returning it.
return result.toString();
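The same idea can be used for accent-insensitive and case-insensitive matching. The sketch below is a shorter variant, not the code above: it swaps the code point loop for the regular expression \p{M}, which also covers enclosing marks that the loop leaves in place. The names AccentInsensitive, key and matches are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class AccentInsensitive {
    // Matches runs of Unicode combining marks (general category M).
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Builds a comparison key: compatibility-decompose, drop marks, lowercase.
    static String key(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        return MARKS.matcher(decomposed).replaceAll("").toLowerCase(Locale.ROOT);
    }

    // True when both strings are equal once accents and case are ignored.
    static boolean matches(String a, String b) {
        return key(a).equals(key(b));
    }
}
```

Under these assumptions, matches("Crème Brûlée", "creme brulee") is true, and non-Latin letters survive: the key of the Greek "ά" is "α" rather than an empty string.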
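Answer 4 (score: 0)
Accented characters all lie outside 7-bit ASCII, with character code values greater than 127. So you could enumerate all the characters in the string and, whenever the code value is greater than 127, map the character back to the equivalent you want. There is no simple way to map accented characters back to their unaccented counterparts: you have to keep some kind of mapping in memory that translates those extended codes back to unaccented characters.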
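A minimal sketch of that lookup, assuming a small hand-written table stored as two parallel strings. The table is deliberately partial, the names AccentTable and removeAccents are made up for illustration, and the accented characters are written as Unicode escapes only to keep the source ASCII-safe.

```java
// Hypothetical helper, not part of any library.
final class AccentTable {
    // ACCENTED.charAt(i) is replaced by PLAIN.charAt(i).
    private static final String ACCENTED =
            "\u00e0\u00e1\u00e2\u00e3\u00e4"  // a with grave, acute, circumflex, tilde, diaeresis
          + "\u00e8\u00e9\u00ea\u00eb"        // e with grave, acute, circumflex, diaeresis
          + "\u00f9\u00fa\u00fb\u00fc"        // u with grave, acute, circumflex, diaeresis
          + "\u00e7";                         // c with cedilla
    private static final String PLAIN = "aaaaa" + "eeee" + "uuuu" + "c";

    static String removeAccents(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Plain ASCII (code <= 127) never needs a lookup.
            int index = c > 127 ? ACCENTED.indexOf(c) : -1;
            // Characters missing from the table are copied through unchanged.
            sb.append(index >= 0 ? PLAIN.charAt(index) : c);
        }
        return sb.toString();
    }
}
```

The indexOf scan is linear in the table size, which is fine for a few dozen entries; a HashMap or a char array indexed by code would suit a larger table better.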