Question

I am trying to convert a gmpy2.mpz to a numpy boolean array, but I can't get it quite right. (gmpy2: https://gmpy2.readthedocs.io)

import gmpy2
import numpy as np

x = gmpy2.mpz(int('1'*1000,2))

print("wrong conversion 1")
y = np.fromstring(gmpy2.to_binary(x), dtype=bool) # this is wrong
print(np.sum(y)) # this returns 127 instead of 1000

print("wrong conversion 2")
y = np.fromstring(gmpy2.to_binary(x), dtype=np.uint8)
print(y) # array([  1,   1, 255 ... 255], dtype=uint8)
y_bool = np.unpackbits(y)
slow_popcount = np.sum(y_bool, dtype=int)
print(slow_popcount) # 1002. should be 1000

print("Fudging an answer. This is wrong as well.")
y = np.fromstring(gmpy2.to_binary(x)[2:], dtype=np.uint8)
# is that slicing [2:] a slow operation?
y_bool = np.unpackbits(y)
print(np.sum(y_bool, dtype=int)) # 1000

More tests:

np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*64,2))), dtype=np.uint8)
# array([  1,   1, 255, 255, 255, 255, 255, 255, 255, 255], dtype=uint8)
np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*65,2))), dtype=np.uint8)
# array([  1,   1, 255, 255, 255, 255, 255, 255, 255, 255,   1], dtype=uint8)
np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*66,2))), dtype=np.uint8)
# array([  1,   1, 255, 255, 255, 255, 255, 255, 255, 255,   3], dtype=uint8)
np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*1024,2))), dtype=np.uint8)
# array([  1,   1, 255 ... 255], dtype=uint8)

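By the way, what I actually want is a fast way to get a list, array, or numpy array of the indices of all set bits of a gmpy2.mpz. I have 4,777,000 gmpy2.mpz values to convert; each is 760,000 bits long, with about 2,000 bits set to 1. The gmp library on this machine is compiled with Intel icc.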

Thanks.

Answer 1

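There are a few options. The function gmpy2.bit_scan1(x, n) returns the index of the first set bit whose index is >= n. It returns None when no such bit exists, which is what ends the loop below.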

>>> x = gmpy2.mpz(123456)
>>> bin(x)
'0b11110001001000000'
>>> n = 0
>>> while True:
...     n = gmpy2.bit_scan1(x, n)
...     if n is None:
...         break
...     print(n)
...     n = n + 1
... 
6
9
13
14
15
16
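The loop above can be wrapped in a small helper that collects the indices instead of printing them. This is only a sketch: the name set_bit_indices is made up for illustration, and it assumes a non-negative value (a negative mpz behaves like an infinite two's-complement number, so the scan would never end).

```python
import gmpy2


def set_bit_indices(x):
    """Return the indices of all set bits of a non-negative integer, lowest first.

    Hypothetical helper, not part of gmpy2; it only uses gmpy2.bit_scan1.
    """
    x = gmpy2.mpz(x)
    if x < 0:
        # Assumption: only non-negative values are supported here.
        raise ValueError("x must be non-negative")
    indices = []
    n = gmpy2.bit_scan1(x, 0)
    while n is not None:
        indices.append(n)
        # Continue scanning just past the bit we found.
        n = gmpy2.bit_scan1(x, n + 1)
    return indices


print(set_bit_indices(123456))  # [6, 9, 13, 14, 15, 16]
```

With roughly 2,000 set bits in 760,000, the number of bit_scan1 calls is proportional to the number of set bits rather than to the bit length.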

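gmpy2 also supports an integer type called xmpz. It is an experimental version of the mpz type. The main difference is that xmpz is mutable: in-place operations modify the value directly without creating a copy. This makes xmpz very useful for bit manipulation. For example, you can use slice notation to extract and modify bit positions.

The xmpz type also supports methods named iter_set, iter_clear, and iter_bits.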

>>> x_str='1'*8+'01'
>>> x_int=gmpy2.xmpz(x_str, 2)
>>> list(x_int.iter_set())
[0, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(x_int.iter_clear())
[1]
>>> list(x_int.iter_bits())
[True, False, True, True, True, True, True, True, True, True]
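Since the question asks for a numpy array of indices, the iter_set iterator can be fed straight into np.fromiter. A minimal sketch, assuming a non-negative value; the name set_bit_array is made up for illustration, and I have not benchmarked it on inputs of the size in the question.

```python
import gmpy2
import numpy as np


def set_bit_array(x):
    """Return the set-bit indices of a non-negative integer as an int64 array.

    Hypothetical helper: converts x to xmpz (a mutable copy) and consumes
    its iter_set() iterator with np.fromiter.
    """
    return np.fromiter(gmpy2.xmpz(x).iter_set(), dtype=np.int64)


idx = set_bit_array(gmpy2.mpz('1' * 8 + '01', 2))
print(idx)  # [0 2 3 4 5 6 7 8 9]
```

The indices count from the least significant bit, the same convention as bit_scan1.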

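I originally wrote the xmpz type to find out whether optimizing in-place operations would improve performance. Bit manipulation saw the biggest benefit. Here is a short, fast implementation of the Sieve of Eratosthenes.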

def sieve(limit=1000000):
    '''Returns a generator that yields the prime numbers up to limit.'''

    sieve_limit = gmpy2.isqrt(limit) + 1
    limit += 1
    # Mark bit positions 0 and 1 as not prime.
    bitmap = gmpy2.xmpz(3)
    # Process 2 separately. This allows us to use p+p for the step size
    # when sieving the remaining primes.
    bitmap[4 : limit : 2] = -1
    # Sieve the remaining primes.
    for p in bitmap.iter_clear(3, sieve_limit):
        bitmap[p*p : limit : p+p] = -1
    return bitmap.iter_clear(2, limit)

Answer 2

This works, but it is slow.

import gmpy2
import numpy as np
x_str = '1'*8+'01'
print(x_str)
x_int = int(x_str,2)
x_mpz = gmpy2.mpz(x_int)
x_01 = bin(int(x_mpz))[2:] # get rid of '0b'
x_bin = x_01.replace('1','\x01').replace('0','\x00')
x_np_bool = np.frombuffer(x_bin.encode('latin-1'), dtype=bool) # frombuffer replaces the deprecated binary mode of np.fromstring
x_1_index = np.where(x_np_bool)[0]
print(x_1_index)
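A possibly faster variant of the same idea skips the text representation and goes through raw bytes instead. This is a sketch under a few assumptions: the value is non-negative, NumPy is at least 1.17 (for the bitorder argument of np.unpackbits), and the name set_bits_via_bytes is made up for illustration. It uses only int.to_bytes, so it does not depend on the header layout of gmpy2.to_binary.

```python
import numpy as np


def set_bits_via_bytes(x):
    """Return the set-bit indices of a non-negative integer as a numpy array.

    Hypothetical helper. Accepts a Python int or anything int() can convert,
    such as a gmpy2.mpz.
    """
    n = int(x)
    if n < 0:
        # Assumption: only non-negative values are supported here.
        raise ValueError("x must be non-negative")
    # Always use at least one byte so that zero maps to a single 0x00 byte.
    nbytes = max(1, (n.bit_length() + 7) // 8)
    raw = np.frombuffer(n.to_bytes(nbytes, "little"), dtype=np.uint8)
    # Little-endian bytes plus little-endian bit order within each byte
    # means array position i corresponds to bit i of the integer.
    bits = np.unpackbits(raw, bitorder="little")
    return np.flatnonzero(bits)


print(set_bits_via_bytes(int('1' * 8 + '01', 2)))  # [0 2 3 4 5 6 7 8 9]
```

Note the different convention: this counts from the least significant bit, like bit_scan1, whereas the string-based code above counts positions from the most significant digit and prints [0 1 2 3 4 5 6 7 9] for the same input.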

Efficiently converting gmpy2.mpz to a numpy boolean array
