Question

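OK, I need to do some bit manipulation on a list of integers. The list can be long (256-4096 integers). The catch: I need to read the bits in groups of a variable length (for example 7 bits at a time) until I reach the end.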

I see two options:

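1. Convert each integer to bytes of length 8. Concatenate all the bytes. Then iterate over chunks of n bytes, where n comes from the least common multiple of the variable bit length and 8 bits. Do this until I reach the end.
   Example 1: 7 bits (variable) and 8 bits (one byte) = 56 bits = 7 bytes.
   Example 2: 3 bits (variable) and 8 bits (one byte) = 24 bits = 3 bytes.
2. Skip the conversion to bytes. Iterate over the list, left-shift a new temporary integer by 64 bits, then use the bitwise OR operator to insert the next integer into it. This creates one huge integer that I can iterate over, extracting the variable-length bit groups with the bitwise AND operator. I read that integers in Python 3 are unbounded, so no overflow will occur.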
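A minimal sketch of option 1, assuming every list entry is a signed 64-bit value. `read_groups` is a hypothetical helper name, not part of the scripts below, and trailing bits that do not fill a whole group are simply dropped here.

```python
from math import gcd

def read_groups(values, width):
    """Yield `width`-bit groups from a list of signed 64-bit ints (option 1)."""
    # Fixed-width two's complement bytes, so negative values stay 64 bits wide.
    data = b"".join(v.to_bytes(8, "big", signed=True) for v in values)
    chunk_bits = width * 8 // gcd(width, 8)  # least common multiple of width and 8
    chunk_bytes = chunk_bits // 8
    mask = (1 << width) - 1
    for start in range(0, len(data), chunk_bytes):
        chunk = data[start:start + chunk_bytes]  # the last chunk may be shorter
        number = int.from_bytes(chunk, "big")
        # Walk the chunk from its most significant group down to the least.
        for shift in range(len(chunk) * 8 - width, -1, -width):
            yield (number >> shift) & mask

groups_of_7 = list(read_groups([145249953336295681], 7))
print(groups_of_7)
```

Reading the chunk as an unsigned integer with `int.from_bytes` keeps the extraction free of sign issues, because every chunk is non-negative by construction.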

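I need the fastest approach, so I started writing two scripts to time them, but noticed that the second approach returns completely unexpected results.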

The following script block contains four examples.

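Example 1 shows the conversion part, as described in option 1 above. The difference between 1a and 1b is that I reversed long_list, so the negative integer comes first.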

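Example 2 shows the bit shifting needed for option 2. The difference between 2a and 2b is the same as between 1a and 1b. The strange thing is that the bit representation of 2b is just like that of 1b (as I would expect). The integer, however, differs from 1b.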

So why are the results different?

import numpy as np

long_list = [145249953336295681, -4503032244740276095]
long_list_reversed = [-4503032244740276095, 145249953336295681]

############ EXAMPLE 1a ###########

# The following code creates exactly what I am expecting:
trueresult = long_list[0].to_bytes(8, "big", signed=True) + long_list[1].to_bytes(8, "big", signed=True)
trueint = int.from_bytes(trueresult, "big")
truebitstring = np.binary_repr(int.from_bytes(trueresult, "big"), width=128)
print("Int", trueint)  # output as expected
print("Bits", truebitstring)  # output as expected
assert trueint & 0b11111111 == 0b10000001  # True as expected

############ EXAMPLE 1b ###########

# The following code creates exactly what I am expecting:
# The same as the above, but the integers are switched, so the first 64 bits appear last.
trueresult_reversed = long_list_reversed[0].to_bytes(8, "big", signed=True) + long_list_reversed[1].to_bytes(8, "big", signed=True)
trueint_reversed = int.from_bytes(trueresult_reversed, "big")
truebitstring_reversed = np.binary_repr(int.from_bytes(trueresult_reversed, "big"), width=128)
print("Int", trueint_reversed)  # output as expected
print("Bits", truebitstring_reversed)  # output as expected
assert trueint_reversed & 0b11111111 == 0b00000001  # True as expected

assert truebitstring == truebitstring_reversed[64:] + truebitstring_reversed[:64]  # True as expected

############ EXAMPLE 2a ###########

# The following code creates completely unexpected output. Should do the same as the first code block.
shiftint = long_list[0] << 64 | long_list[1]
shiftbitstring = np.binary_repr(shiftint, width=128)
print("Int", shiftint)  # output unexpected. should be same as 'trueint'
print("Bits", shiftbitstring)  # output unexpected, should be same as 'truebitstring'

############ EXAMPLE 2b ###########

# The following code creates completely unexpected output. Should do the same as the second code block.
# On top of that, it doesn't even compare to the third, like the second to the first (swapped integers).
shiftint_reversed = long_list_reversed[0] << 64 | long_list_reversed[1]
shiftbitstring_reversed = np.binary_repr(shiftint_reversed , width=128)
print("Int", shiftint_reversed)  # output both unexpected. should be same as 'trueint_reversed'
print("Bits", shiftbitstring_reversed)  # output expected, same as Example 1b! However, the integer above is NOT like in 1b!

Here is the output of the script:

Int 2679388715912901282319653733876646017
Bits 00000010000001000000100000010000001000000100000010000001000000011100000110000010000001000000100000010000001000000100000010000001
Int 257216083546552756177539452046611087617
Bits 11000001100000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000001
Int -4503032244740276095
Bits 11111111111111111111111111111111111111111111111111111111111111111100000110000010000001000000100000010000001000000100000010000001
Int -83066283374385707285835155385157123839
Bits 11000001100000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000010000001000000100000001

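If I wanted to represent visually what I am trying to achieve with the bit shifting, it would look like this. Imagine the lines starting with a colon as the integer in memory. The other lines are the operations applied to that memory.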

short_list = 00000010, 10000000

: 0
  << 8
: 00000000
  | 00000010
: 00000010
  << 8
: 00000010 00000000
  | 10000000
: 00000010 10000000
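The same picture as a small Python sketch, assuming 8-bit, non-negative entries as in short_list. The name `acc` is only illustrative.

```python
short_list = [0b00000010, 0b10000000]

acc = 0
for value in short_list:
    # Make room for 8 new bits, then drop the next value into them.
    acc = (acc << 8) | value

acc_bits = format(acc, "016b")
print(acc_bits)  # 0000001010000000
```

This works as drawn only because both entries are non-negative and fit in 8 bits.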

Answer 1

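Your 2a / 2b snippets can't decide whether they think Python ints are 64-bit or arbitrary precision. You are shifting the number by 64 bits as if the next number you OR in were exactly 64 bits wide, but it isn't.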
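One way to make the shift approach behave, sketched under the assumption that every entry is meant to be a 64-bit value: mask each integer down to its low 64 bits before ORing it in. `MASK64` and `pack64` are hypothetical names chosen for this illustration.

```python
MASK64 = (1 << 64) - 1

def pack64(values):
    """Concatenate signed 64-bit ints into one non-negative Python int."""
    acc = 0
    for v in values:
        # v & MASK64 keeps only the low 64 bits of v's two's complement form,
        # so a negative v cannot spill 1 bits into the part already packed.
        acc = (acc << 64) | (v & MASK64)
    return acc

long_list = [145249953336295681, -4503032244740276095]
packed = pack64(long_list)
packed_reversed = pack64(long_list[::-1])
print(packed)
print(packed_reversed)
```

With the mask in place, the result should agree with concatenating the `to_bytes` output and reading it back with `int.from_bytes`, as in Examples 1a and 1b.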

Python ints simulate an infinite-bit two's complement representation, and in infinite-bit two's complement, -6 is

...11111111111111111111111111111111111111111111111111111111111111111111111111111111111010

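with an endless trail of leading 1s extending to the left, somewhat like the 2-adic integers. That infinite trail of 1s is the reason for the big block of 1s you see in the third bit string.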
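A few quick checks that make the infinite trail of 1s visible, using only built-in int operations. The variable names are illustrative.

```python
a, b = 145249953336295681, -4503032244740276095

# The low 8 bits of -6 in two's complement.
low_byte = format(-6 & 0xFF, "08b")
print(low_byte)  # 11111010

# However far you shift a negative int to the right, 1s keep coming in.
far_shift = -6 >> 1000
print(far_shift)  # -1

# Example 2a: b is negative, so its leading 1s cover everything above bit 63
# and the OR swallows the shifted copy of a entirely.
example_2a = (a << 64) | b
print(example_2a == b)  # True
```

The last check matches the third "Int" line of the output above, which is just the negative list entry unchanged.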

So, again, Python ints are conceptually infinite-bit, but the representations you get from int.to_bytes and numpy.binary_repr are not. That is why those functions take a width argument.

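some_int.to_bytes(8, 'big', signed=True) produces a 64-bit (8-byte) two's complement representation of an int. Since each of your to_bytes calls produces a 64-bit byte string, concatenating those byte strings gives the result you expect.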
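A short illustration of that fixed-width behaviour, using the negative value from the question. The variable names are illustrative.

```python
value = -4503032244740276095

raw = value.to_bytes(8, "big", signed=True)
print(len(raw))   # 8 bytes, never more
print(raw.hex())  # c182040810204081

# Reading the same bytes back as signed recovers the original value;
# reading them as unsigned gives the non-negative 64-bit pattern instead.
round_trip = int.from_bytes(raw, "big", signed=True)
as_unsigned = int.from_bytes(raw, "big")
print(round_trip == value, as_unsigned)

# A value that does not fit in 64 signed bits is rejected, not truncated.
try:
    (1 << 63).to_bytes(8, "big", signed=True)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)
```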

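numpy.binary_repr(some_int, width=128) produces a 128-bit representation of an int. It uses two's complement for negative inputs, but for positive inputs it will produce output with a leading 1 if the input is large enough, even though that 1 would cause the output to be read as a negative number in two's complement. That is why 1b and 2b print the same bit string even though one integer is positive and the other is negative.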
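The relationship between the 1b and 2b integers can be checked without numpy. This sketch recomputes both values from the question's list; the names `signed_2b` and `unsigned_1b` are illustrative.

```python
a, b = 145249953336295681, -4503032244740276095

# Example 2b: shifting the negative entry gives a negative result.
signed_2b = (b << 64) | a

# Example 1b: concatenated bytes read back as an unsigned 128-bit value.
unsigned_1b = int.from_bytes(
    b.to_bytes(8, "big", signed=True) + a.to_bytes(8, "big", signed=True), "big"
)

# Same low 128 bits, hence the same 128-character bit string ...
same_bits = signed_2b & ((1 << 128) - 1) == unsigned_1b
# ... but the integers differ by exactly 2**128.
difference = unsigned_1b - signed_2b
print(same_bits, difference == 1 << 128)
```

In other words, the 2b integer is the 128-bit two's complement reading of the very bit pattern that 1b reads as unsigned.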

Bit manipulation with unexpected results
