Question

I was playing around with String and its constructors and noticed some behavior I can't explain.

I created the following method:

public static String negate(String s) {
    byte[] b = s.getBytes();
    for (int i = 0; i < b.length; i++) {
        b[i] = (byte)(~b[i] + 1);
    }
    System.out.println(Arrays.toString(b));
    return new String(b);
}

It just takes the two's complement of each byte and returns a new String. When I call it like this:

System.out.println(negate("Hello"));

I get this output:

[-72, -101, -108, -108, -111]
�����
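Side note on the arithmetic: in two's complement, ~x + 1 is just -x, so the per-byte step in negate() is plain negation, and applying it twice always restores the original byte. The sketch below checks this for all 256 byte values; the class name TwosComplementCheck is hypothetical. It suggests the byte arithmetic itself is not what loses information.

```java
public class TwosComplementCheck {
    // Same per-byte operation as negate(): flip all bits, then add one.
    static byte twosComplement(byte b) {
        return (byte) (~b + 1);
    }

    public static void main(String[] args) {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            // ~b + 1 equals arithmetic negation in two's complement...
            if (twosComplement(b) != (byte) -b) {
                throw new AssertionError("differs from -b at " + b);
            }
            // ...and applying it twice restores the original byte
            // (even for -128, which maps to itself).
            if (twosComplement(twosComplement(b)) != b) {
                throw new AssertionError("not its own inverse at " + b);
            }
        }
        System.out.println("(byte)(~b + 1) == (byte)-b and is its own inverse for all byte values");
    }
}
```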

I guess that's fine, since there are no negative ASCII values. But when I nest the calls like this:

System.out.println(negate(negate("Hello")));

my output looks like this:

[-72, -101, -108, -108, -111]
[17, 65, 67, 17, 65, 67, 17, 65, 67, 17, 65, 67, 17, 65, 67]
ACACACACAC // 5 groups of 3 characters (1 ctrl-char and "AC")

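I expected the output to match the input string "Hello" exactly, but I got this instead. Why? The same thing happens with every other input string: after nesting, every character of the input becomes AC.

I went further and created a method that does the same thing, but works only on raw byte arrays: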

public static byte[] n(byte[] b) {
    for (int i = 0; i < b.length; i++) {
        b[i] = (byte)(~b[i] + 1);
    }
    System.out.println(Arrays.toString(b));
    return b;
}

Here the output is as expected. For

System.out.println(new String(n(n("Hello".getBytes()))));

I get

[-72, -101, -108, -108, -111]
[72, 101, 108, 108, 111]
Hello

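So I guess it has something to do with the way the String is created, since this only happens when I call negate on an instance that was built from already negated bytes?

I even went down the class hierarchy to look at the internal classes, but I couldn't find where this behavior comes from.

Also, the following paragraph in the String documentation might be an explanation:

The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.

Can anybody tell me why this happens and what exactly is going on here?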

Answer 1

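The problem is that you take the inverted bytes and try to interpret them as a valid byte stream in the default charset (remember, characters are not bytes). So, as the String constructor documentation you quoted says, the result is unspecified and may involve error correction, dropping invalid values, and so on. Of course, that is a lossy process, and reversing it won't get you back your original string.

If you get the bytes and negate them twice without converting the intermediate bytes to a string, you'll get the original result back.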
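The Javadoc of String(byte[]) recommends the CharsetDecoder class when more control over decoding is required. A minimal sketch, assuming UTF-8 is the charset you care about (the names StrictDecode and decodeStrict are hypothetical): configured with CodingErrorAction.REPORT, the decoder throws on invalid input instead of quietly substituting a placeholder, so the lossy step becomes visible.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Decodes bytes as UTF-8, but throws instead of silently substituting U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] negated = {-72, -101, -108, -108, -111}; // the negated bytes of "Hello"
        try {
            System.out.println(decodeStrict(negated));
        } catch (CharacterCodingException e) {
            // The bytes are lone UTF-8 continuation bytes, so decoding fails here.
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```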
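To make the difference concrete, here is a sketch comparing three round trips of "Hello" (class and method names are hypothetical): bytes only, through a UTF-8 String, and through an ISO-8859-1 String. The last path is an added illustration, not part of the original answer: ISO-8859-1 maps every one of the 256 byte values to exactly one char, so byte[] to String and back is lossless in that charset, whereas UTF-8 rejects the negated bytes and substitutes placeholders.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    // Same per-byte two's complement as in the question, applied to a copy.
    static byte[] negate(byte[] in) {
        byte[] out = in.clone();
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (~out[i] + 1);
        }
        return out;
    }

    // Negates, turns the intermediate bytes into a String with the given charset,
    // gets the bytes back out of that String, and negates again.
    static String viaString(String s, Charset cs) {
        String intermediate = new String(negate(s.getBytes(cs)), cs);
        return new String(negate(intermediate.getBytes(cs)), cs);
    }

    public static void main(String[] args) {
        byte[] original = "Hello".getBytes(StandardCharsets.UTF_8);
        // Bytes only: the double negation is exact.
        System.out.println(Arrays.equals(negate(negate(original)), original)); // true
        // Through a UTF-8 String: invalid bytes become U+FFFD, so data is lost.
        System.out.println(viaString("Hello", StandardCharsets.UTF_8).equals("Hello")); // false
        // Through ISO-8859-1: every byte maps to exactly one char, nothing is lost.
        System.out.println(viaString("Hello", StandardCharsets.ISO_8859_1).equals("Hello")); // true
    }
}
```

Note that the ISO-8859-1 trick only guarantees a lossless byte[] to String to byte[] trip; a String containing characters above U+00FF would still be damaged by getBytes in that charset.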

This example demonstrates the lossy nature of new String(/*invalid bytes*/):

String s = "Hello";
byte[] b = s.getBytes();
for (int i = 0; i < b.length; i++) {
    b[i] = (byte)(~b[i] + 1);
}
// Show the negated bytes
System.out.println(Arrays.toString(b));
String s2 = new String(b);
// Show the bytes of the string constructed from them; note they're not the same
System.out.println(Arrays.toString(s2.getBytes()));

On my system, which I believe defaults to UTF-8, I get:

[-72, -101, -108, -108, -111]
[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67]

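Note what happens when I take an invalid byte stream, make a string out of it, and then get the bytes of that string: they are not the bytes I started with.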
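Where the 15 bytes come from: the String(byte[], Charset) constructor is documented to always replace malformed input with the charset's default replacement, which for UTF-8 is U+FFFD. The sketch below passes UTF-8 explicitly, assuming that matches the default charset on the system above (the class name ReplacementDemo is hypothetical). Each of the five invalid bytes turns into one U+FFFD, and each U+FFFD re-encodes as three bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {
    public static void main(String[] args) {
        byte[] negated = {-72, -101, -108, -108, -111}; // the negated bytes of "Hello"
        String s2 = new String(negated, StandardCharsets.UTF_8);

        // Each invalid byte was replaced by U+FFFD REPLACEMENT CHARACTER.
        System.out.println(s2.length()); // 5
        System.out.println(s2.chars().allMatch(c -> c == 0xFFFD)); // true

        // U+FFFD takes three bytes in UTF-8: EF BF BD, i.e. -17, -65, -67 as Java bytes.
        System.out.println(Arrays.toString("\uFFFD".getBytes(StandardCharsets.UTF_8)));
    }
}
```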

Answer 2

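You "negate" a character and it becomes invalid. Then you get the placeholder � (U+FFFD). At that point everything is already corrupted. Then you "negate" that, and from each placeholder character you get a control character followed by AC.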
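The arithmetic behind "AC", as a small sketch (the class name PlaceholderNegation is hypothetical, and UTF-8 is assumed as the charset): the placeholder U+FFFD is encoded as the bytes -17, -65, -67; negating them gives 17, 65, 67, which are the control character DC1, 'A' and 'C'.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PlaceholderNegation {
    public static void main(String[] args) {
        // UTF-8 encoding of the placeholder U+FFFD: 0xEF 0xBF 0xBD
        byte[] placeholder = "\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(placeholder)); // [-17, -65, -67]

        // "Negate" each byte exactly as negate() in the question does.
        byte[] negated = new byte[placeholder.length];
        for (int i = 0; i < placeholder.length; i++) {
            negated[i] = (byte) (~placeholder[i] + 1);
        }
        System.out.println(Arrays.toString(negated)); // [17, 65, 67]

        // 17 is the control character DC1, 65 is 'A', 67 is 'C'.
        String decoded = new String(negated, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\u0011AC")); // true
    }
}
```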

"Negating" a String produces unexpected behavior
