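I have a byte array of size 8. I convert it to a String using the code below.
When I convert the string back to a byte[] with getBytes, the result is absurd: a byte[] of length 16 in which only a few (2 or 3) bytes match the original array. Can someone tell me what is going wrong?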
The conversion code:
byte[] message = new byte[8];
// initialize message
printBytes("message: " + message.length + " = ", message);
try {
    String test = new String(message, "utf-8");
    System.out.println(test);
    byte[] f = test.getBytes("utf-8");
    Help.printBytes("test = " + f.length, f);
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
The printBytes function (the output was originally posted as an image):
public static void printBytes(String msg, byte[] b) {
    System.out.print(msg + " = ");
    for (int i = 0; i < b.length; i++) {
        System.out.print("" + String.format("%02X", b[i]));
    }
    System.out.println("\n");
}
Answer 0 (score: 6)
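Your original byte[] contained illegal byte sequences (that is, sequences that do not form valid UTF-8 characters). The String(byte[], String) constructor has unspecified behavior for such input, but in your implementation each bad sequence is replaced by the "�" character, \uFFFD, which takes three bytes in UTF-8. You seem to have four of them, which accounts for 12 of the bytes right there.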
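The effect can be reproduced with a small sketch. The class name RoundTripDemo, its method names, and the sample bytes are made up for illustration and are not from the question. It uses the Charset overloads, which are documented to replace malformed input with the replacement character, and shows two hedged alternatives: a strict validity check, and a lossless round trip through ISO-8859-1 for data that is really binary.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names and sample data are assumptions.
class RoundTripDemo {
    // Decode as UTF-8 and encode again. Each malformed byte becomes
    // U+FFFD, which is encoded as the three bytes EF BF BD.
    static byte[] viaUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // ISO-8859-1 maps every byte value 0x00-0xFF to exactly one char,
    // so decoding and re-encoding with it returns the original bytes.
    static byte[] viaLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Strict check: a decoder configured with REPORT throws instead of
    // silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] raw) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 'A', a lone continuation byte, a byte that never occurs in UTF-8, 'B'
        byte[] raw = {0x41, (byte) 0x80, (byte) 0xFF, 0x42};
        System.out.println(isValidUtf8(raw));                    // false
        System.out.println(viaUtf8(raw).length);                 // 8 = 1 + 3 + 3 + 1
        System.out.println(Arrays.equals(viaLatin1(raw), raw));  // true
    }
}
```

If the 8 bytes are arbitrary binary data rather than text, keeping them as a byte[] (or using an encoding such as Base64 or hex when a String is required) avoids the problem entirely.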
Answer 1 (score: -1)
new String(message, "utf-8");
This code tells the String object that your message is UTF-8 encoded.
test.getBytes("utf-8");
This code means: give me the bytes of the string, encoded as UTF-8. As a result, your string ends up double UTF-8 encoded.
Do the encoding only once.
String test = new String(message, "utf-8");
test.getBytes();
Example of a double-encoded string:
public class Test {
    public static void main(String[] args) {
        try {
            String message = "äöü";
            Test.printBytes("java internal encoded: = ", message.getBytes());
            Test.printBytes("utf-8 encoded: = ", message.getBytes("utf-8"));
            // get the string utf-8 encoded and create a new string with the
            // utf-8 encoded content
            message = new String(message.getBytes("utf-8"), "utf-8");
            Test.printBytes("test get bytes without charset: = ", message.getBytes());
            Test.printBytes("test get bytes with charset: = ", message.getBytes("utf-8"));
            System.out.println(message);
            System.out.println("double encoded: " + new String(message.getBytes("utf-8")));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void printBytes(String msg, byte[] b) {
        System.out.print(msg + " = ");
        for (int i = 0; i < b.length; i++) {
            System.out.print("" + String.format("%02X", b[i]));
        }
        System.out.println("\n");
    }
}
Output:
java internal encoded: = = E4F6FC
utf-8 encoded: = = C3A4C3B6C3BC
test get bytes without charset: = = E4F6FC
test get bytes with charset: = = C3A4C3B6C3BC
äöü
double encoded: Ã¤Ã¶Ã¼ <-- the java internal encoding is not converted to utf-8, it is double encoded
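The E4F6FC versus C3A4C3B6C3BC lines in this output come from getBytes() without an argument, which uses the platform default charset (a single-byte one on the machine that produced the output, and UTF-8 by default on recent JDKs). A minimal sketch that names the charsets explicitly gives the same bytes on any platform. CharsetDemo and its helper methods are hypothetical names, and ISO-8859-1 stands in for that machine's default charset as an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ISO-8859-1 is assumed as the stand-in for a
// single-byte platform default charset.
class CharsetDemo {
    // Uppercase hex dump, same formatting as printBytes in the thread.
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02X", x));
        }
        return sb.toString();
    }

    // Encode with UTF-8 but decode with ISO-8859-1: every UTF-8 byte
    // turns into its own char, which is what produces mojibake.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source-file encoding.
        String message = "\u00E4\u00F6\u00FC"; // äöü
        System.out.println(hex(message.getBytes(StandardCharsets.ISO_8859_1))); // E4F6FC
        System.out.println(hex(message.getBytes(StandardCharsets.UTF_8)));      // C3A4C3B6C3BC
        // Six chars instead of three; how they render depends on the console.
        System.out.println(misdecode(message).length());                        // 6
    }
}
```

Passing a Charset constant to both new String(...) and getBytes(...) also removes the need to catch UnsupportedEncodingException.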