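I am trying to encode a UTF-8 string to Base64 and decode it again. In theory this should not be a problem, but the decoded output never shows the correct characters, only question marks (?).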
String original = "خهعسيبنتا";
B64encoder benco = new B64encoder();
String enc = benco.encode(original);
try
{
    String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
    PrintStream out = new PrintStream(System.out, true, "UTF-8");
    out.println("Original: " + original);
    prtHx("ara", original.getBytes());
    out.println("Encoded: " + enc);
    prtHx("enc", enc.getBytes());
    out.println("Decoded: " + dec);
    prtHx("dec", dec.getBytes());
} catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}
The console output is as follows:
Original: خهعسيبنتا
ara = 3F,3F,3F,3F,3F,3F,3F,3F,3F
Encoded: Pz8/Pz8/Pz8/
enc = 50,7A,38,2F,50,7A,38,2F,50,7A,38,2F
Decoded: ?????????
dec = 3F,3F,3F,3F,3F,3F,3F,3F,3F
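prtHx simply writes the hex values of the bytes to the output. Am I doing something obviously wrong here?
Andreas pointed to the correct solution by highlighting that getBytes() uses the platform default encoding (Cp1252 here), even though the source file itself is UTF-8. Once I switched to getBytes("UTF-8"), I could see that the encoded and decoded bytes really were different. Further investigation showed that the encode method itself was calling getBytes() without a charset. Changing that fixed the problem.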
try
{
    String enc = benco.encode(original);
    String dec = new String(benco.decode(enc.toCharArray()), "UTF-8");
    PrintStream out = new PrintStream(System.out, true, "UTF-8");
    out.println("Original: " + original);
    prtHx("ori", original.getBytes("UTF-8"));
    out.println("Encoded: " + enc);
    prtHx("enc", enc.getBytes("UTF-8"));
    out.println("Decoded: " + dec);
    prtHx("dec", dec.getBytes("UTF-8"));
} catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
}
System encoding: Cp1252
Original: خهعسيبنتا
ori = D8,AE,D9,87,D8,B9,D8,B3,D9,8A,D8,A8,D9,86,D8,AA,D8,A7
Encoded: 2K7Zh9i52LPZitio2YbYqtin
enc = 32,4B,37,5A,68,39,69,35,32,4C,50,5A,69,74,69,6F,32,59,62,59,71,74,69,6E
Decoded: خهعسيبنتا
dec = D8,AE,D9,87,D8,B9,D8,B3,D9,8A,D8,A8,D9,86,D8,AA,D8,A7
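The difference between the two runs can be reproduced in isolation. The following is only a minimal sketch, not code from the program above: the class name CharsetDemo and the helper hex are made up for illustration, ISO-8859-1 is used as a stand-in for Cp1252 because it is guaranteed to exist on every JVM (neither charset can represent Arabic letters), and the string is written with Unicode escapes so the result does not depend on how the source file is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // The same nine Arabic letters as in the question, as Unicode escapes
    // so the example is independent of the source file's encoding.
    static final String ORIGINAL =
            "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";

    // Hypothetical helper: comma-separated uppercase hex, similar to prtHx.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What the no-argument getBytes() would use on this machine.
        System.out.println("default = " + Charset.defaultCharset());
        // A single-byte Western charset cannot map Arabic letters, so
        // String.getBytes(Charset) substitutes '?' (0x3F) for each one.
        System.out.println("latin1  = "
                + hex(ORIGINAL.getBytes(StandardCharsets.ISO_8859_1)));
        // UTF-8 encodes each of these letters as two bytes.
        System.out.println("utf8    = "
                + hex(ORIGINAL.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The latin1 line should show nine 3F bytes, matching the first run, and the utf8 line should match the ori line of the second run.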
Thanks.
Answer 0 (score: 6)
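String#getBytes() encodes the characters using the platform's default charset. The actual encoding of the string literal "خهعسيبنتا" is "defined" in the Java source file (you choose a character encoding when you create or save the file). That is probably why ara is encoded as 0x3F bytes: a default charset such as Cp1252 has no mapping for Arabic letters, so each one is replaced with '?'.
Try this: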
out.println("Original: " + original);
prtHx("ara", original.getBytes("UTF-8"));
out.println("Encoded: " + enc);
prtHx("enc", enc.getBytes("UTF-8"));
out.println("Decoded: " + dec);
prtHx("dec", dec.getBytes("UTF-8"));
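If the custom encoder is not a requirement, the same round trip can be sketched with the standard java.util.Base64 class (Java 8 and later). This is an assumption-laden illustration rather than a drop-in replacement for B64encoder, whose internals are not shown in the question; the class name Base64RoundTrip and its two helper methods are invented here. The point is that the charset is named explicitly on both the encoding and the decoding side.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTrip {
    // Text -> UTF-8 bytes -> Base64 text. The charset is explicit,
    // so the platform default (e.g. Cp1252) never comes into play.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Base64 text -> bytes -> text, decoded with the same charset.
    static String decode(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same letters as in the question, as Unicode escapes so the
        // example does not depend on the source file's encoding.
        String original =
                "\u062E\u0647\u0639\u0633\u064A\u0628\u0646\u062A\u0627";
        String enc = encode(original);
        String dec = decode(enc);
        System.out.println("Encoded: " + enc);
        System.out.println("Round trip OK: " + original.equals(dec));
    }
}
```

With the UTF-8 bytes listed in the question, the encoded text should be the same 2K7Zh9i52LPZitio2YbYqtin that the fixed program printed.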