Question

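I have a byte array of size 8. I convert it to a String using the code below.

When I then convert the String back to a byte[] with the getBytes method, the result is absurd: a byte[] of size 16 in which only a few (2 or 3) bytes match the original array. Can someone tell me what is going wrong?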

Conversion code:

byte[] message = new byte[8];
//initialize message
printBytes("message: " + message.length + " = ", message);
try {
    String test = new String(message, "utf-8");
    System.out.println(test);
    byte[] f = test.getBytes("utf-8");
    Help.printBytes("test = " + f.length, f);
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}

The printBytes function:

public static void printBytes(String msg, byte[] b){
    System.out.print(msg + " = ");
    for(int i = 0; i < b.length; i++){
        System.out.print("" + String.format("%02X", b[i]));
    }
    System.out.println("\n");
}

Answer 1

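Your original byte[] contains illegal byte sequences (that is, sequences that do not form valid UTF-8 characters). The behavior of the String(byte[], String) constructor is unspecified for such input, but in your implementation each bad byte is replaced by the "�" character, which is \uFFFD, a three-byte character in UTF-8. You appear to have four of these, which accounts for 12 of the bytes.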
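The effect described above can be reproduced with a small sketch. The input bytes here are hypothetical (the question does not show the real array): four bytes that can never appear in valid UTF-8, followed by four ASCII letters. The class and method names are made up for illustration. The ISO-8859-1 round trip is included only as a contrast and is not something the answers propose; it relies on that charset mapping every byte value to exactly one char.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementDemo {

    // Decode the bytes as UTF-8, then encode the resulting String as UTF-8 again.
    // Malformed input is replaced with U+FFFD by this constructor overload.
    static byte[] utf8RoundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Same round trip through ISO-8859-1, where every byte maps to one char.
    static byte[] latin1RoundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hex dump in the same style as the printBytes helper from the question.
    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte value : b) {
            sb.append(String.format("%02X", value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical data: 0xFF and 0xFE are never valid in UTF-8.
        byte[] message = {
            (byte) 0xFF, (byte) 0xFE, (byte) 0xFF, (byte) 0xFE,
            0x41, 0x42, 0x43, 0x44
        };

        byte[] viaUtf8 = utf8RoundTrip(message);
        byte[] viaLatin1 = latin1RoundTrip(message);

        System.out.println("original  (" + message.length + ") = " + toHex(message));
        System.out.println("via UTF-8 (" + viaUtf8.length + ") = " + toHex(viaUtf8));
        System.out.println("via Latin1(" + viaLatin1.length + ") = " + toHex(viaLatin1));
        System.out.println("Latin1 round trip lossless: " + Arrays.equals(message, viaLatin1));
    }
}
```

With this input, each of the four invalid bytes becomes the three-byte sequence EF BF BD, so the 8-byte array comes back as 16 bytes, and only the four ASCII bytes survive unchanged.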

Answer 2

new String(message, "utf-8");

This code tells the String object that your message is UTF-8 encoded.

test.getBytes("utf-8");

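This code means: give me the bytes of the string, encoded as UTF-8. The result is that your string will be double UTF-8 encoded.

The solution is to encode only once.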

String test = new String(message, "utf-8");
test.getBytes();

Example of a double-encoded string:

public class Test {

    public static void main(String[] args) {
        try {
            String message = "äöü";
            Test.printBytes("java internal encoded: = ", message.getBytes());
            Test.printBytes("utf-8 encoded: = ", message.getBytes("utf-8"));
            // get the string utf-8 encoded and create a new string with the
            // utf-8 encoded content
            message = new String(message.getBytes("utf-8"), "utf-8");
            Test.printBytes("test get bytes without charset: = ", message.getBytes());
            Test.printBytes("test get bytes with charset: = ", message.getBytes("utf-8"));
            System.out.println(message);
            System.out.println("double encoded: " + new String(message.getBytes("utf-8")));
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    public static void printBytes(String msg, byte[] b) {
        System.out.print(msg + " = ");
        for (int i = 0; i < b.length; i++) {
            System.out.print("" + String.format("%02X", b[i]));
        }
        System.out.println("\n");
    }

}

Output:

java internal encoded: =  = E4F6FC
utf-8 encoded: =  = C3A4C3B6C3BC
test get bytes without charset: =  = E4F6FC
test get bytes with charset: =  = C3A4C3B6C3BC

äöü
double encoded: Ã¤Ã¶Ã¼ <-- the java internal encoding is not converted to utf-8, it is double encoded

Java: problem with the String.getBytes method
