Question

I have a string like "01030920316". When I convert this string to a long and then to bytes in Java, I get the following output:

output in java : Tag in bytes :  0, 0, 0, 0, 61, 114, -104, 124

When I do the same thing in C, I get this output instead:

output in C : Tag in bytes : 124,152,114,61,0,0,0,0

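I understand the difference between -104 and 152 here, which comes from signed versus unsigned bytes. But why are the zeros at the start in Java and at the end in C? Because of this behavior, I run into problems when these bytes reach the C program for validation.

Please explain where I am going wrong.

Java program: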

final byte[] tagBytes = ByteBuffer.allocate(8)
                .putLong(Long.parseLong("01030920316")).array();
System.out.println("Tag in bytes  >> " + Arrays.toString(tagBytes));

C program:

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

/** To access long long values as a byte array*/
typedef union uInt64ToByte__
{
    uint64_t m_Value;
    unsigned char m_ByteArray[8];

}uInt64ToByte;

int main()
{
    uInt64ToByte longLongToByteArrayUnion;
    longLongToByteArrayUnion.m_Value = atoll("01030920316");
    printf("%d,%d,%d,%d,%d,%d,%d,%d",longLongToByteArrayUnion.m_ByteArray[0],longLongToByteArrayUnion.m_ByteArray[1],longLongToByteArrayUnion.m_ByteArray[2],longLongToByteArrayUnion.m_ByteArray[3],longLongToByteArrayUnion.m_ByteArray[4],longLongToByteArrayUnion.m_ByteArray[5],longLongToByteArrayUnion.m_ByteArray[6],longLongToByteArrayUnion.m_ByteArray[7]);
    return 0;
}

Answer 1

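Output in Java: Tag in bytes: 0, 0, 0, 0, 61, 114, -104, 124

Java's ByteBuffer is big-endian by default, and Java bytes are signed, so byte values greater than 127 are displayed as negative numbers.

Output in C: Tag in bytes: 124,152,114,61,0,0,0,0

The C union exposes the value in the machine's native byte order, which is little-endian on x86/x64 systems. An unsigned char has a range of 0 to 255.

To produce the same output in Java as in C: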

final byte[] tagBytes = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder())
        .putLong(Long.parseLong("01030920316")).array();
int[] unsigned = new int[tagBytes.length];
for (int i = 0; i < tagBytes.length; i++)
    unsigned[i] = tagBytes[i] & 0xFF;
System.out.println("Tag in bytes  >> " + Arrays.toString(unsigned));

This prints:

Tag in bytes  >> [124, 152, 114, 61, 0, 0, 0, 0]
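
To see where these bytes come from, it can help to take the value apart by hand. The following is a minimal sketch, not part of the original answer; the class name `TagBytes` and its two helper methods are made up for illustration. It relies on the fact that 1030920316 is 0x3D72987C in hexadecimal, so the four low-order bytes are 0x3D (61), 0x72 (114), 0x98 (152) and 0x7C (124), and the four high-order bytes of the 64-bit value are zero.

```java
import java.util.Arrays;

public class TagBytes {
    // Hypothetical helper: most significant byte first (big-endian),
    // returned as unsigned values in the range 0..255.
    static int[] bigEndian(long value) {
        int[] out = new int[Long.BYTES];
        for (int i = 0; i < Long.BYTES; i++) {
            out[i] = (int) ((value >>> (8 * (Long.BYTES - 1 - i))) & 0xFF);
        }
        return out;
    }

    // Hypothetical helper: least significant byte first (little-endian),
    // returned as unsigned values in the range 0..255.
    static int[] littleEndian(long value) {
        int[] out = new int[Long.BYTES];
        for (int i = 0; i < Long.BYTES; i++) {
            out[i] = (int) ((value >>> (8 * i)) & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        long tag = Long.parseLong("01030920316"); // parsed as decimal 1030920316
        System.out.println(Long.toHexString(tag));            // 3d72987c
        System.out.println(Arrays.toString(bigEndian(tag)));    // [0, 0, 0, 0, 61, 114, 152, 124]
        System.out.println(Arrays.toString(littleEndian(tag))); // [124, 152, 114, 61, 0, 0, 0, 0]
    }
}
```

Because the shifts operate on the numeric value rather than on memory, this sketch gives the same result on any hardware, which is one way to agree on a byte order with the C side.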

Answer 2

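The bytes of the value are simply stored in a different order in Java and C. Keep in mind that an application written in C is native, while a Java application runs in the Java Virtual Machine. Java bytecode is platform-independent, which is why your Java code behaves the same on every OS and processor architecture. In C, on the other hand, the order in which the bytes are stored may differ (edit: between architectures).

Edit2: Suppose we have the number 109, which is 1101101 in binary. Why? 1 * 64 + 1 * 32 + 0 * 16 + 1 * 8 + 1 * 4 + 0 * 2 + 1 * 1 = 109. The leftmost bit is called the "most significant" bit because its weight is the largest (2 ^ 6 = 64), and the rightmost bit is called the "least significant" bit because its weight is the smallest (just 1).

109 is boring because it fits in a single byte. Let's take something bigger: 1000, which is 00000011 11101000 in binary. It is stored in two bytes (call them X and Y). We can save that number either as XY (big-endian) or as YX (little-endian). In big-endian, the first byte (the one at the lowest address) is the most significant byte. In little-endian, the first byte is the least significant byte. x86 is little-endian, while Java's ByteBuffer defaults to big-endian regardless of the hardware. That is why the outputs differ.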
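
The two-byte example above can be reproduced directly. This is only an illustrative sketch: the class name `ShortEndianDemo` and the `encode` helper are assumptions introduced here, while `ByteBuffer`, `ByteOrder` and `putShort` are standard `java.nio` APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ShortEndianDemo {
    // Hypothetical helper: serialize a 16-bit value with the given byte order.
    static byte[] encode(short value, ByteOrder order) {
        return ByteBuffer.allocate(Short.BYTES).order(order).putShort(value).array();
    }

    public static void main(String[] args) {
        short value = 1000; // 0x03E8, so X = 0x03 and Y = 0xE8
        // Java bytes are signed, so 0xE8 (232) is printed as -24.
        System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));    // [3, -24]  (XY)
        System.out.println(Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN))); // [-24, 3]  (YX)
    }
}
```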

Answer 3

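This is the difference between big-endian and little-endian.

When you convert a number to a byte array in C++, you see whether the underlying system is big-endian (the most significant byte of a multi-byte integer is stored first) or little-endian (the least significant byte is stored first).

Java, on the other hand, always uses big-endian in order to hide the byte order of the underlying system. This is part of Java's "write once, run anywhere" philosophy.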
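
As a rough illustration of that point, the standard `DataOutputStream` class also writes a long with the high byte first, whatever the host CPU is. The sketch below is not from the original answer; the class name `DataStreamDemo` and the `writeLong` wrapper are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class DataStreamDemo {
    // Hypothetical wrapper: serialize a long through DataOutputStream,
    // which is documented to write the high byte first.
    static byte[] writeLong(long value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Same value as in the question; the zero bytes come first.
        System.out.println(Arrays.toString(writeLong(1030920316L)));
        // [0, 0, 0, 0, 61, 114, -104, 124]
    }
}
```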

Answer 4

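C++ uses the native format for its types. Java uses a format defined by its standard, which corresponds to the native format on a Sparc but differs from that on a PC.

In general, for non-character types, there is no reason to assume that byte dumps will be the same on two different platforms, even if they contain the same value. (Depending on the platform, they may not even be the same size. I know of 32-, 36-, 48- and 64-bit longs in C++; in Java they are always 64 bits.)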

Answer 5

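First, why the order appears reversed: that is because the putLong method of class ByteBuffer puts the bytes into the array in big-endian order. If you want them in little-endian order, set the order on the ByteBuffer:

final byte[] tagBytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(Long.parseLong("01030920316")).array();

Second, why you get -104 in Java and 152 in C: that is because in C you use unsigned char, while in Java the type byte is signed, not unsigned. The content of the byte is actually the same, but when you interpret it as a signed integer it shows up as -104, and when you interpret it as an unsigned integer it shows up as 152.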
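
The signed and unsigned readings of the same bit pattern can be checked with a few lines. This is a small sketch with an invented class name (`SignedByteDemo`) and invented helpers; `Byte.toUnsignedInt` is a standard method available since Java 8.

```java
public class SignedByteDemo {
    // Hypothetical helper: reinterpret a signed Java byte as 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Hypothetical helper: narrow 0..255 to a signed Java byte.
    static byte toSigned(int unsigned) {
        return (byte) unsigned;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x98;                        // bit pattern 1001 1000
        System.out.println(b);                       // -104 (signed view)
        System.out.println(toUnsigned(b));           // 152  (unsigned view)
        System.out.println(Byte.toUnsignedInt(b));   // 152  (standard library equivalent)
        System.out.println(toSigned(152));           // -104
    }
}
```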

Answer 6

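Since Java's integer representation is not platform-dependent, I would take it as the reference when comparing. So I would rather write C code that accounts for the platform dependence of C's integer representation.

With that in mind, I suggest the following C code to produce the byte printout described by the OP: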

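#define _DEFAULT_SOURCE /* exposes htobe64() on glibc 2.19 and later */
#define _BSD_SOURCE     /* same, for older glibc versions */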

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

#if defined(__linux__)
#  include <endian.h>
#elif defined(__FreeBSD__) || defined(__NetBSD__)
#  include <sys/endian.h>
#elif defined(__OpenBSD__)
#  include <sys/types.h>
#  define be16toh(x) betoh16(x) /* -+ */
#  define be32toh(x) betoh32(x) /* -+--> not needed in this example */
#  define be64toh(x) betoh64(x) /* -+ */
#endif

int main()
{
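  uint64_t uint64 = htobe64((uint64_t) atoll("01030920316")); /* convert to big endian/network byte order */
  /* Walk the bytes in memory order. Shifting the converted value instead
     would make the printed order depend on the host byte order again. */
  const unsigned char *bytes = (const unsigned char *) &uint64;

  for (size_t i = 0; i < sizeof(uint64); ++i)
  {
    printf("%hhd, ", (signed char) bytes[i]); /* signed, to match Java's byte */
  }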

  printf("\n");

  return 0;
}

String to long conversion differs between C and Java. Why?