Suppose I have a 2D (or n-dimensional) boolean array like this:
import numpy as np
arr = np.array([[1, 0, 0, 0, 1, 0],
                [0, 1, 1, 1, 0, 1],
                [1, 1, 1, 1, 1, 0]], dtype='bool')
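I would like to create a space-efficient binary representation of the array so that I can store it in a database and retrieve it later. How can I do this?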
Answer 0 (score: 0)
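numpy.packbits can be used to produce a one-bit-per-element binary representation of a boolean array, and numpy.unpackbits can be used to do almost the reverse. Unfortunately, it is an imperfect inverse: packbits discards all information about the array's shape (and even its dimensionality) and pads the output with zero bits up to a whole number of bytes. We therefore also need to store the array's shape explicitly so that the binary representation can be decoded.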
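To make the padding and the lost shape concrete, here is a minimal sketch using the array from the question. The variable names `packed` and `unpacked` are only illustrative.

```python
import numpy as np

# A 3x6 boolean array: 18 bits of information.
arr = np.array([[1, 0, 0, 0, 1, 0],
                [0, 1, 1, 1, 0, 1],
                [1, 1, 1, 1, 1, 0]], dtype=bool)

# With the default axis=None, packbits flattens the input first.
packed = np.packbits(arr)
# unpackbits returns a 1-D uint8 array whose length is a multiple of 8.
unpacked = np.unpackbits(packed)

print(packed.shape)    # (3,)  -> 18 bits rounded up to 3 bytes
print(unpacked.shape)  # (24,) -> 6 trailing padding bits, original shape gone
```

Only the first 18 unpacked bits carry data, and nothing in `packed` says they once formed a 3x6 grid.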
import numpy as np

def serialize_boolean_array(array: np.ndarray) -> bytes:
    """
    Takes a numpy.ndarray with boolean values and converts it to a
    space-efficient binary representation (one bit per element).
    """
    return np.packbits(array).tobytes()

def deserialize_boolean_array(serialized_array: bytes, shape: tuple) -> np.ndarray:
    """
    Inverse of serialize_boolean_array. The shape must be supplied
    separately because packbits does not record it.
    """
    num_elements = int(np.prod(shape))
    packed_bits = np.frombuffer(serialized_array, dtype='uint8')
    # Drop the zero padding bits, then restore the shape and boolean dtype.
    result = np.unpackbits(packed_bits)[:num_elements]
    return result.reshape(shape).astype(bool)

# Array to serialize
arr = np.array([[1, 0, 0, 0, 1, 0],
                [0, 1, 1, 1, 0, 1],
                [1, 1, 1, 1, 1, 0]], dtype='bool')
# Serialized representation. You need to store the shape as well as arr_bytes
# in order to be able to deserialize this later. If you know the
# dimensionality of your array in advance (e.g. you know it's 2D), you might
# want to store the shape in several columns in your database - e.g. width
# and height. Otherwise, you could serialize it with
#   shape_bytes = np.array(shape, dtype='int64').tobytes()
# and deserialize it later with
#   shape = tuple(np.frombuffer(shape_bytes, dtype='int64'))
# (the explicit dtype keeps both sides in agreement on every platform).
arr_bytes, shape = serialize_boolean_array(arr), arr.shape
# Deserialize:
arr2 = deserialize_boolean_array(arr_bytes, shape)
np.array_equal(arr, arr2) # True
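
If a single database column is preferred, one option is to prepend the shape to the packed bits so that the payload describes itself. The following is a minimal sketch under stated assumptions, not part of the answer above: the helper names `pack_with_shape` and `unpack_with_shape` and the header layout (one byte for the number of dimensions, then each dimension as a little-endian unsigned 64-bit integer) are hypothetical choices made for illustration.

```python
import math
import struct

import numpy as np

def pack_with_shape(array: np.ndarray) -> bytes:
    """Hypothetical helper: shape header followed by the packed bits."""
    array = np.asarray(array, dtype=bool)
    # Assumed header layout: 1 byte ndim, then ndim little-endian uint64 sizes.
    header = struct.pack(f'<B{array.ndim}Q', array.ndim, *array.shape)
    return header + np.packbits(array).tobytes()

def unpack_with_shape(payload: bytes) -> np.ndarray:
    """Hypothetical inverse of pack_with_shape."""
    ndim = payload[0]
    shape = struct.unpack_from(f'<{ndim}Q', payload, 1)
    offset = 1 + 8 * ndim  # first byte after the header
    num_elements = math.prod(shape)
    packed = np.frombuffer(payload, dtype=np.uint8, offset=offset)
    # Drop the zero padding bits, then restore the shape and boolean dtype.
    bits = np.unpackbits(packed)[:num_elements]
    return bits.reshape(shape).astype(bool)

arr = np.array([[1, 0, 0, 0, 1, 0],
                [0, 1, 1, 1, 0, 1],
                [1, 1, 1, 1, 1, 0]], dtype=bool)

payload = pack_with_shape(arr)
restored = unpack_with_shape(payload)
print(len(payload))                   # 20 = 1 + 2*8 header bytes + 3 data bytes
print(np.array_equal(arr, restored))  # True
```

The fixed header costs a few bytes per array, which is negligible for large arrays but can dominate for very small ones. In that case, separate width and height columns as described in the comments above may be the better fit.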