Converting windows-1252 in Java

Time: 2016-02-05 11:28:53

Tags: java character-encoding

I have a Java string with this value:

=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A =C3=81 =C3=A2 =C3=A9 UHA a=C3==A7=C3=A3

I think it is encoded in windows-1252. I want to convert it to a readable string. I tried converting it with UTF-8, but that did not work. Can anyone help me?

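1 answer:

Answer 0: (score: 2)

The string contains characters encoded as Quoted-Printable.

The sequence =C3=A1 stands for the two bytes 0xC3 0xA1, which are the UTF-8 encoding of á. The text is therefore UTF-8, not windows-1252.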
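To make the charset point concrete, here is a minimal sketch that decodes the same two bytes both ways. The class name `EncodingDemo` and the helper `decode` are made up for illustration and are not part of the original answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: interpret the same raw bytes with a given charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // The two bytes that "=C3=A1" stands for.
        byte[] raw = {(byte) 0xC3, (byte) 0xA1};

        // Read as UTF-8, the pair forms one character (U+00E1).
        System.out.println(decode(raw, StandardCharsets.UTF_8));

        // Read as windows-1252, each byte becomes its own character
        // (U+00C3 followed by U+00A1), which is the typical mojibake.
        System.out.println(decode(raw, Charset.forName("windows-1252")));
    }
}
```

What the console shows depends on its own encoding, so comparing string lengths (1 versus 2) is a more reliable check than eyeballing the output.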

Here is a small snippet that shows the decoding.

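// Needs: import java.nio.charset.StandardCharsets;
String hexChars = "0123456789ABCDEF";
String s = "=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A"
        + " =C3=81 =C3=A2 =C3=A9 UHA a=C3=A7=C3=A3";
int stringIndex = 0;
int bytesIndex = 0;
// The decoded form is never longer than the encoded form.
byte[] bytes = new byte[s.length()];
while (stringIndex < s.length()) {
    // An escaped byte is '=' followed by two uppercase hex digits.
    // The bounds check avoids an exception on a trailing '='.
    if (s.charAt(stringIndex) == '='
            && stringIndex + 2 < s.length()
            && hexChars.indexOf(s.charAt(stringIndex + 1)) >= 0
            && hexChars.indexOf(s.charAt(stringIndex + 2)) >= 0) {
        int hex = hexChars.indexOf(s.charAt(stringIndex + 1));
        hex <<= 4;
        hex += hexChars.indexOf(s.charAt(stringIndex + 2));
        bytes[bytesIndex] = (byte) hex;
        stringIndex += 2; // skip the two hex digits
    } else {
        // Plain character: copy it through as a single byte.
        bytes[bytesIndex] = (byte) (s.charAt(stringIndex) & 0xFF);
    }
    stringIndex++;
    bytesIndex++;
}
// Interpret the collected bytes as UTF-8.
System.out.println("bytes = " + new String(bytes, 0, bytesIndex,
        StandardCharsets.UTF_8));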

Output

bytes = á à ç ã õ é Ú Á â é UHA açã

The snippet is for demonstration purposes only. Look for a library that can decode Quoted-Printable for you.
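Existing options include `QuotedPrintableCodec` from Apache Commons Codec and `MimeUtility.decode` from JavaMail. If adding a dependency is not possible, the following is a rough, standard-library-only sketch that also drops soft line breaks (an `=` at the end of a line) and accepts lowercase hex digits. The class name `QuotedPrintableSketch` is made up for illustration, and the sketch assumes the payload is UTF-8, as in the question. It is not a complete RFC 2045 decoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {
    // Hypothetical helper: decode Quoted-Printable text whose bytes are UTF-8.
    static String decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = s.length();
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            if (c == '=') {
                // Soft line break: '=' followed by CRLF or a bare LF is dropped.
                if (s.startsWith("\r\n", i + 1)) { i += 3; continue; }
                if (s.startsWith("\n", i + 1)) { i += 2; continue; }
                // Escaped byte: '=' followed by two hex digits.
                if (i + 2 < n) {
                    int hi = Character.digit(s.charAt(i + 1), 16);
                    int lo = Character.digit(s.charAt(i + 2), 16);
                    if (hi >= 0 && lo >= 0) {
                        out.write((hi << 4) | lo);
                        i += 3;
                        continue;
                    }
                }
            }
            // Anything else, including a stray '=', is copied as one byte.
            out.write(c & 0xFF);
            i++;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("=C3=A1 =C3=A0 UHA a=C3=A7=C3=A3"));
    }
}
```

Collecting bytes in a `ByteArrayOutputStream` avoids sizing an array up front, and decoding to a `String` only once at the end keeps multi-byte UTF-8 sequences intact.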