Question

I'm getting some unexpected results from a simple test. After running the following:

byte[] bytes = {(byte)0x40, (byte)0xE2, (byte)0x56, (byte)0xFF, (byte)0xAD, (byte)0xDC};
String s = new String(bytes, Charset.forName("UTF-8"));
byte[] bytes2 = s.getBytes(Charset.forName("UTF-8"));

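bytes2 is a 14-element array that looks nothing like the original (bytes). Is there a way to do this kind of conversion and keep the original breakdown into bytes?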
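The 14 bytes can be traced to the Unicode replacement character. A minimal sketch, assuming the JDK's built-in UTF-8 charset (the class `RoundTripDemo` and helper `roundTrip` are hypothetical names): the `String(byte[], Charset)` constructor always replaces malformed input with the charset's replacement string, which for UTF-8 is U+FFFD, and each U+FFFD re-encodes as the three bytes EF BF BD.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Decode as UTF-8 (malformed input is silently replaced with U+FFFD), then re-encode
    static byte[] roundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x40, (byte) 0xE2, (byte) 0x56, (byte) 0xFF, (byte) 0xAD, (byte) 0xDC};
        String s = new String(bytes, StandardCharsets.UTF_8);
        // Print each decoded char; the invalid bytes show up as U+FFFD
        for (char c : s.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        // Every U+FFFD re-encodes as three bytes, so the array grows
        System.out.println(roundTrip(bytes).length);
    }
}
```

For the question's input the decoded string should be `@`, U+FFFD, `V`, then three more U+FFFD, which re-encodes to 1 + 3 + 1 + 3 × 3 = 14 bytes.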

Answer 1

Is there a way to do this kind of conversion and keep the original breakdown into bytes?

That doesn't look like valid UTF-8 to me, so I'm not surprised it didn't round-trip.

If you want to convert arbitrary binary data to text in a reversible way, use base64, e.g. via this public domain encoder/decoder.
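The answer points to a third-party public domain encoder; since Java 8 the JDK also ships `java.util.Base64`, which this sketch uses instead. The class `Base64RoundTrip` and its helpers `toText`/`fromText` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Encode arbitrary bytes as ASCII-only text
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode the text back to the exact original bytes
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x40, (byte) 0xE2, (byte) 0x56, (byte) 0xFF, (byte) 0xAD, (byte) 0xDC};
        String s = toText(bytes);
        System.out.println(s);
        // true: base64 is lossless for any byte sequence
        System.out.println(Arrays.equals(bytes, fromText(s)));
    }
}
```

Because the encoded form contains only ASCII characters, it survives conversion through any ASCII-compatible charset, and decoding returns the original six bytes (for this input the encoded text is `QOJW/63c`).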

Answer 2

This should do it:

public class Main
{

    /*
     * This method converts a String to an array of bytes
     */
    public void convertStringToByteArray()
    {

        String stringToConvert = "This String is 76 characters long and will be converted to an array of bytes";

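        // getBytes() with no argument uses the platform's default charset;
        // pass a Charset (e.g. StandardCharsets.UTF_8) for a predictable result
        byte[] theByteArray = stringToConvert.getBytes();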

        System.out.println(theByteArray.length);

    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {    
        new Main().convertStringToByteArray();
    }
}

Answer 3

Two things:

The byte sequence does not appear to be valid UTF-8:

 $ python
 >>> '\x40\xe2\x56\xff\xad\xdc'.decode('utf8')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
 UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 1: invalid continuation byte
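The same strict check can be done in Java. Unlike the `String(byte[], Charset)` constructor, which silently substitutes U+FFFD, a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws on malformed input. A minimal sketch; the class `StrictUtf8Check` and method `isValidUtf8` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    // Returns true only if the bytes form well-formed UTF-8
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // A MalformedInputException (a subclass) is thrown at the first bad sequence
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x40, (byte) 0xE2, (byte) 0x56, (byte) 0xFF, (byte) 0xAD, (byte) 0xDC};
        System.out.println(isValidUtf8(bytes));
    }
}
```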

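Even if it were valid UTF-8, decoding and then re-encoding can produce different bytes because of things like precomposed characters and other Unicode features.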
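To illustrate the precomposed-character case, here is a minimal sketch, assuming some step between decoding and re-encoding normalizes the text (Java's UTF-8 charset itself does not normalize; here `java.text.Normalizer` is called explicitly). The class `PrecomposedDemo` and helper `utf8` are hypothetical. The same visible character "é" has a decomposed form (e + U+0301 COMBINING ACUTE ACCENT) and a precomposed form (U+00E9) with different UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

public class PrecomposedDemo {
    // UTF-8 bytes of a string
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        // NFC composes the pair into the single code point U+00E9
        String precomposed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(Arrays.toString(utf8(decomposed)));  // three bytes
        System.out.println(Arrays.toString(utf8(precomposed))); // two bytes
    }
}
```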

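If you want to encode arbitrary binary data in a string such that you are guaranteed to get the same bytes back when you decode it, your best bet is something like base64.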

Java: String to byte array conversion
