Question

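I'm currently developing an application in which the user can edit a ByteBuffer either through a hex-editor interface or by editing the corresponding text in a JTextPane. My current problem is that JTextPane needs a String, so I have to convert the ByteBuffer to a String before displaying it. During that conversion, however, invalid characters are replaced by the charset's default replacement character. This discards the invalid values: when I convert the text back to a ByteBuffer, each invalid byte comes back as the bytes of the replacement character. Is there a simple way to preserve the byte values of invalid characters in the String? I have read the following Stack Overflow posts, but there people usually want to replace unprintable characters, whereas I need to preserve them.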

Java ByteBuffer to String

Java: Converting String to and from ByteBuffer and associated problems

Is there a simple way to do this, or do I need to track every change made in the text editor and apply it to the ByteBuffer?

The following code demonstrates the problem. It uses a byte[] instead of a ByteBuffer, but the problem is the same.

        byte[] temp = new byte[16];
        // 0x99 is a UTF-8 continuation byte; on its own it is not a valid UTF-8 sequence
        Arrays.fill(temp,(byte)0x99);

        System.out.println(Arrays.toString(temp));
        // Prints [-103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103]
        // -103 == 0x99

        System.out.println(new String(temp));
        // Prints ����������������
        // � (U+FFFD) is the default replacement character; new String(byte[])
        // decodes with the platform default charset (UTF-8 in this example)

        // This takes the byte[], converts it to a string, converts it back to a byte[]
        System.out.println(Arrays.toString(new String(temp).getBytes()));
        // I need this to print [-103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103, -103]
        // However, it prints
        //[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67]
        // Each -17, -65, -67 triple (0xEF 0xBF 0xBD) is the UTF-8 encoding of �
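The three bytes that repeat in the last output can be checked directly by encoding the replacement character itself. A minimal sketch; the class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementCharBytes {
    // Encodes U+FFFD, the character the decoder substitutes for invalid input
    static byte[] replacementBytes() {
        return "\uFFFD".getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints [-17, -65, -67], i.e. 0xEF 0xBF 0xBD
        System.out.println(Arrays.toString(replacementBytes()));
    }
}
```

Sixteen replacement characters at three bytes each account for the 48 bytes in the output.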

Answer 1

What do you think new String(temp).getBytes() does for you?

I can tell you that it does something bad:

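It converts temp to a String using the default encoding, which may be the wrong one and may lose information.
It converts the result back to a byte array, again using the default encoding.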
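A small sketch of both points, with the charset passed explicitly so the result does not depend on the platform (since JDK 18 the default is UTF-8 everywhere, per JEP 400; older JDKs derive it from the OS locale). The class and method names are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    // Decodes and re-encodes with the same explicit charset
    static byte[] roundTrip(byte[] input, Charset charset) {
        return new String(input, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] temp = new byte[16];
        Arrays.fill(temp, (byte) 0x99);
        // The no-argument String(byte[]) and getBytes() use this charset
        System.out.println("default charset: " + Charset.defaultCharset());
        byte[] back = roundTrip(temp, StandardCharsets.UTF_8);
        // 16 bytes in, 48 out: each 0x99 became U+FFFD, which re-encodes as 3 bytes
        System.out.println(temp.length + " -> " + back.length);
    }
}
```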

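To turn a byte[] into a String, you must always pass a Charset to the String constructor, or else use a decoder directly. Since you are working with buffers, you may find the decoder API a good fit.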
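A minimal sketch of the decoder route, assuming the bytes are meant to be UTF-8 and that invalid input should be detected rather than silently replaced. The class and method names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Returns the decoded text, or null if the buffer is not valid UTF-8
    static String decodeOrNull(ByteBuffer buffer) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // duplicate() leaves the caller's buffer position untouched
            CharBuffer chars = decoder.decode(buffer.duplicate());
            return chars.toString();
        } catch (CharacterCodingException e) {
            return null; // malformed or unmappable input was found
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeOrNull(ByteBuffer.wrap(new byte[] {'h', 'i'})));
        System.out.println(decodeOrNull(ByteBuffer.wrap(new byte[] {(byte) 0x99})));
    }
}
```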

To turn a String into a byte[], you must always call getBytes(Charset), so that you know you are using the right charset.
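If all that is needed is a lossless byte-to-text round trip, one charset that guarantees it is ISO-8859-1 used in both directions, because it maps every byte value 0x00 to 0xFF to the char with the same code. This sketch illustrates that idea and is not code from the answer. The trade-off is that bytes forming multi-byte UTF-8 text are displayed as several Latin-1 characters rather than one. The class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    static String toText(byte[] bytes) {
        // ISO-8859-1 maps each byte 0x00-0xFF to the char with the same value
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static byte[] toBytes(String text) {
        // Same charset in the other direction; chars above U+00FF would become '?'
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] temp = new byte[16];
        Arrays.fill(temp, (byte) 0x99);
        System.out.println(Arrays.equals(temp, toBytes(toText(temp)))); // true
    }
}
```

Any character the user types above U+00FF would still be lost on the way back, so the editor would need to restrict input to that range.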

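Based on the comments, I now suspect your problem is that you need to write code like the following to turn the bytes into a hex string for the UI. (And then the corresponding code to convert back.)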

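String getHexString(byte[] bytes) {
    final char[] digits = "0123456789ABCDEF".toCharArray();
    StringBuilder builder = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
        // High nibble: mask after shifting, since b is sign-extended for values >= 0x80
        builder.append(digits[(b >> 4) & 0x0F]);
        // Low nibble: keep only the lowest 4 bits
        builder.append(digits[b & 0x0F]);
    }
    return builder.toString();
}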
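For the reverse direction, here is a sketch of a parser that turns the hex text back into bytes. It assumes two hex digits per byte with no separators; parseHexString is a hypothetical name. On JDK 17 and later, java.util.HexFormat (HexFormat.of().formatHex(...) and parseHex(...)) handles both directions.

```java
import java.util.Arrays;

public class HexParse {
    // Hypothetical inverse of getHexString: "99FF" -> {(byte) 0x99, (byte) 0xFF}
    static byte[] parseHexString(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a hex digit
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [-103, 0, 127, -1]
        System.out.println(Arrays.toString(parseHexString("99007FFF")));
    }
}
```

Rejecting malformed input with an exception lets the hex editor refuse an edit instead of silently corrupting the buffer.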

Answer 2

UTF-8 in particular runs into errors, for example:

    byte[] bytes = {'a', (byte) 0xfd, 'b', (byte) 0xe5, 'c'};
    String s = new String(bytes, StandardCharsets.UTF_8);
    System.out.println("s: " + s);

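You need a CharsetDecoder. It can ignore (= drop) or replace the offending bytes, or, by default, report them as an error (the one-shot decode(ByteBuffer) method throws an exception).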
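A sketch of the three error actions applied to the bytes above. The class and method names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ErrorActions {
    // Decodes bytes as UTF-8, applying the given action to bad input
    static String decode(byte[] bytes, CodingErrorAction action) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = {'a', (byte) 0xfd, 'b', (byte) 0xe5, 'c'};
        System.out.println(decode(bytes, CodingErrorAction.IGNORE));  // abc
        System.out.println(decode(bytes, CodingErrorAction.REPLACE)); // a�b�c
        try {
            decode(bytes, CodingErrorAction.REPORT);
        } catch (CharacterCodingException e) {
            // REPORT surfaces the problem as a MalformedInputException
            System.out.println("REPORT: " + e);
        }
    }
}
```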

For the JTextPane we use HTML, so we can write the hex code of each offending byte inside a <span> that gives it a red background.

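    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
    // newDecoder() reports malformed input by default instead of replacing it
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Each invalid byte expands to a 64-char <span>, so size for the worst case
    CharBuffer charBuffer = CharBuffer.allocate(bytes.length * 64 + 16);
    charBuffer.append("<html>");
    for (;;) {
        // endOfInput = true, so a truncated sequence at the end is reported too
        CoderResult result = decoder.decode(byteBuffer, charBuffer, true);
        if (!result.isError()) {
            break; // UNDERFLOW: all input has been consumed
        }
        // Skip the offending bytes, writing each one as a red hex code
        for (int i = 0; i < result.length(); i++) {
            int b = 0xFF & byteBuffer.get();
            charBuffer.append(String.format(
                "<span style='background-color:red; font-weight:bold'> %02X </span>",
                b));
        }
    }
    decoder.flush(charBuffer);
    charBuffer.append("</html>");
    charBuffer.flip(); // limit = end of written text, position = 0
    String t = charBuffer.toString();
    System.out.println("t: " + t);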

The CharsetDecoder API isn't particularly pleasant to use, but the code above works with it.
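The HTML built above copies the decoded text into the markup unescaped, so valid bytes such as '<' or '&' would be interpreted as markup by the JTextPane. A minimal escaping helper follows, assuming it is applied to each valid run before appending. That requires decoding into a separate CharBuffer first, and this wiring is not shown. HtmlEscape and escape are hypothetical names:

```java
public class HtmlEscape {
    // Escapes the characters that are significant in HTML text and attribute values
    static String escape(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & c")); // a&lt;b &amp; c
    }
}
```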
