Java: problem with the String.getBytes method

Time: 2015-09-23 13:30:27

Tags: java

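I have a byte array of size 8, and I am converting it to a String with the code below.

When I then convert the String back to a byte[] with the getBytes method, the result makes no sense: I get a byte[] of size 16 that shares only a few (2 or 3) bytes with the original array. Can someone tell me where I am going wrong?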


Code:

byte[] message = new byte[8];
//initialize message
printBytes("message: " + message.length + " = ", message);
try {
    String test = new String(message, "utf-8");
    System.out.println(test);
    byte[] f = test.getBytes("utf-8");
    Help.printBytes("test = " + f.length, f);
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}

The printBytes function:

public static void printBytes(String msg, byte[] b){
    System.out.print(msg + " = ");
    for(int i = 0; i < b.length; i++){
        System.out.print("" + String.format("%02X", b[i]));
    }
    System.out.println("\n");
}

2 answers:

Answer 0: (score: 6)

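Your original byte[] contained illegal byte sequences (that is, sequences that do not form valid UTF-8 characters). The behavior of the String(byte[], String) constructor is unspecified in that case, but in your implementation those bad bytes are replaced with the "�" character, \uFFFD, which takes three bytes in UTF-8 (EF BF BD). You appear to have four of them, which accounts for 12 bytes right there. The other four bytes presumably decoded and re-encoded unchanged, giving 16 in total.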
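The effect can be reproduced with a minimal sketch. The class name, helper names, and sample bytes below are assumptions chosen for illustration (four ASCII bytes followed by four 0xFF bytes, which can never occur in UTF-8); they are not the asker's actual data. The sketch uses the `Charset` overloads, which are documented to always substitute the replacement character for malformed input, and adds a strict check with `CharsetDecoder` that reports malformed input instead of replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample bytes are assumptions.
class InvalidUtf8Demo {

    // Decode as UTF-8, then encode back to UTF-8. Each malformed byte
    // becomes U+FFFD, which encodes to the three bytes EF BF BD.
    static byte[] utf8RoundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Strict check: with REPORT, the decoder throws on malformed input
    // instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] input) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Same hex formatting as the printBytes helper in the question.
    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte value : b) {
            sb.append(String.format("%02X", value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] message = {0x41, 0x42, 0x43, 0x44,
                (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        byte[] result = utf8RoundTrip(message);
        System.out.println(message.length + " bytes in:  " + toHex(message));
        System.out.println(result.length + " bytes out: " + toHex(result));
        System.out.println("valid UTF-8? " + isValidUtf8(message));
    }
}
```

With this sample input, 8 bytes go in and 16 come out: the four ASCII bytes survive and each 0xFF turns into EF BF BD. If the eight bytes are arbitrary binary data rather than text, carrying them in a String through an encoding such as Base64 (`java.util.Base64`) or hex avoids the problem altogether.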

Answer 1: (score: -1)

new String(message, "utf-8");

This code tells the String object that your message is UTF-8 encoded.

test.getBytes("utf-8");

This code means: give me the bytes of the string, encoded as UTF-8. As a result, your string ends up double UTF-8 encoded.

Do the encoding only once.

String test = new String(message, "utf-8");
test.getBytes();
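For comparison, here is a minimal sketch of what the two calls do when the input is valid UTF-8: `new String(bytes, charset)` decodes bytes into characters, and `getBytes(charset)` encodes characters back into bytes. The class and method names are hypothetical. Note that `getBytes()` without an argument uses the platform's default charset, so its result depends on the machine; the sketch passes the charset explicitly on both sides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the name and sample bytes are assumptions.
class ValidRoundTripDemo {

    static byte[] decodeThenEncode(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8); // bytes -> chars
        return text.getBytes(StandardCharsets.UTF_8);                // chars -> bytes
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of "äöü": C3 A4 C3 B6 C3 BC
        byte[] original = {(byte) 0xC3, (byte) 0xA4, (byte) 0xC3,
                (byte) 0xB6, (byte) 0xC3, (byte) 0xBC};
        byte[] roundTripped = decodeThenEncode(original);
        System.out.println(Arrays.equals(original, roundTripped)); // prints "true"
    }
}
```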

Example of a double-encoded string:

public class Test {

    public static void main(String[] args) {
        try {
            String message = "äöü";
            Test.printBytes("java internal encoded: = ", message.getBytes());
            Test.printBytes("utf-8 encoded: = ", message.getBytes("utf-8"));
            // get the string utf-8 encoded and create a new string with the
            // utf-8 encoded content
            message = new String(message.getBytes("utf-8"), "utf-8");
            Test.printBytes("test get bytes without charset: = ", message.getBytes());
            Test.printBytes("test get bytes with charset: = ", message.getBytes("utf-8"));
            System.out.println(message);
            System.out.println("double encoded: " + new String(message.getBytes("utf-8")));
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    public static void printBytes(String msg, byte[] b) {
        System.out.print(msg + " = ");
        for (int i = 0; i < b.length; i++) {
            System.out.print("" + String.format("%02X", b[i]));
        }
        System.out.println("\n");
    }

}

Output:

java internal encoded: =  = E4F6FC
utf-8 encoded: =  = C3A4C3B6C3BC
test get bytes without charset: =  = E4F6FC
test get bytes with charset: =  = C3A4C3B6C3BC

äöü
double encoded: äöü <-- the java internal encoding is not converted to utf-8, it is double encoded
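The E4F6FC on the first output line suggests the program ran with a single-byte default charset such as ISO-8859-1 or windows-1252; that is an inference from the output, not something the answer states. Under that assumption, the last line comes from decoding UTF-8 bytes with the default charset. The sketch below reproduces that kind of mismatch with explicit charsets, so it does not depend on the platform default. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes ISO-8859-1 stands in for the
// single-byte default charset of the machine that produced the output.
class MojibakeDemo {

    // Encode as UTF-8, then decode those bytes with the WRONG charset.
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte value : b) {
            sb.append(String.format("%02X", value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = "\u00E4\u00F6\u00FC"; // "äöü", escaped to avoid source-encoding issues
        String garbled = decodeUtf8BytesAsLatin1(message);
        // 3 characters become 6: every UTF-8 byte is read as its own character.
        System.out.println(message.length() + " chars -> " + garbled.length() + " chars");
        System.out.println(toHex(message.getBytes(StandardCharsets.UTF_8)));
        System.out.println(toHex(garbled.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The first hex line is C3A4C3B6C3BC and the second is C383C2A4C383C2B6C383C2BC. The doubling only appears when the decoding charset differs from the one that produced the bytes; decoding and encoding with the same charset leaves valid text unchanged.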