Serializing a multidimensional boolean NumPy array to binary

Time: 2018-01-28 13:10:53

Tags: python numpy

Suppose I have a 2D (or n-dimensional) boolean array like this:

import numpy as np
arr = np.array([[1, 0, 0, 0, 1, 0],
                [0, 1, 1, 1, 0, 1],
                [1, 1, 1, 1, 1, 0]], dtype='bool')

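I want to create a space-efficient binary representation of it to store in a database and retrieve later. How can I do this?

1 answer:

Answer 0: (score: 0)

numpy.packbits can be used to produce a one-bit-per-element binary representation of a boolean array, and numpy.unpackbits can be used to do almost the reverse. Unfortunately, it is an imperfect inverse, because packbits discards all information about the array's shape (and even its dimensionality) and pads the result with zero bits to reach a whole number of bytes. We therefore also need to store the array's shape explicitly in order to be able to parse the binary representation.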
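As a small illustration of why the inverse is imperfect, the sketch below packs a made-up 2x3 array and unpacks it again. The array `small` is a hypothetical example, not taken from the question; only `np.packbits` and `np.unpackbits` with their default arguments are used.

```python
import numpy as np

# Hypothetical example array: 2 x 3, so 6 bits in total.
small = np.array([[1, 0, 1],
                  [0, 1, 1]], dtype=bool)

packed = np.packbits(small)        # flattens, then packs 8 bits per byte
unpacked = np.unpackbits(packed)   # always a flat uint8 array of whole bytes

print(packed)          # [172]  (bits 10101100: six data bits + two padding zeros)
print(unpacked)        # [1 0 1 0 1 1 0 0]
print(unpacked.shape)  # (8,)   the original (2, 3) shape is gone
```

The round trip returns eight flat uint8 values rather than a 2x3 boolean array, which is why the functions that follow take the shape as a separate argument and trim the padding.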

import numpy as np

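def serialize_boolean_array(array: np.ndarray) -> bytes:
    """
    Takes a numpy.ndarray with boolean values and converts it to a
    space-efficient binary representation (one bit per element).
    """
    return np.packbits(array).tobytes()

def deserialize_boolean_array(serialized_array: bytes, shape: tuple) -> np.ndarray:
    """
    Inverse of serialize_boolean_array. The shape must be supplied
    separately because the packed bytes do not record it.
    """
    # int() because np.prod returns a float for an empty shape tuple
    num_elements = int(np.prod(shape))
    packed_bits = np.frombuffer(serialized_array, dtype='uint8')
    # Drop the zero padding bits, then restore the bool dtype and the shape
    result = np.unpackbits(packed_bits)[:num_elements]
    return result.astype(bool).reshape(shape)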

# Array to serialize
arr = np.array([[1, 0, 0, 0, 1, 0],
                [0, 1, 1, 1, 0, 1],
                [1, 1, 1, 1, 1, 0]], dtype='bool')

# Serialized representation. You need to store the shape as well as arr_bytes
# in order to be able to deserialize this later. If you know the
# dimensionality of your array in advance (e.g. you know it's 2D), you might
# want to store the shape in several columns in your database - e.g. width
# and height. Otherwise, you could serialize it with
#     shape_bytes = np.array(shape, dtype='int64').tobytes()
# and deserialize it later with
#     shape = tuple(np.frombuffer(shape_bytes, dtype='int64'))
arr_bytes, shape = serialize_boolean_array(arr), arr.shape

# Deserialize:
arr2 = deserialize_boolean_array(arr_bytes, shape)

np.array_equal(arr, arr2)  # True
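
If a single database column is preferable to storing the shape separately, one option is to prefix the packed bits with a small header that records the shape. The following is only a sketch under that assumption: `pack_with_shape` and `unpack_with_shape` are hypothetical helper names, and the header layout (number of dimensions followed by each dimension, all as little-endian unsigned 64-bit integers) is an arbitrary choice made here for illustration.

```python
import math
import struct

import numpy as np

def pack_with_shape(array: np.ndarray) -> bytes:
    """Hypothetical helper: packed bits prefixed with the array's shape."""
    array = np.asarray(array, dtype=bool)
    # Header: ndim, then each dimension, as little-endian uint64 values,
    # so the byte layout does not depend on the platform's default int.
    header = struct.pack(f'<Q{array.ndim}Q', array.ndim, *array.shape)
    return header + np.packbits(array).tobytes()

def unpack_with_shape(blob: bytes) -> np.ndarray:
    """Hypothetical inverse of pack_with_shape."""
    (ndim,) = struct.unpack_from('<Q', blob, 0)
    shape = struct.unpack_from(f'<{ndim}Q', blob, 8)
    num_elements = math.prod(shape)
    # The packed bits start right after the (1 + ndim) header integers.
    packed = np.frombuffer(blob, dtype=np.uint8, offset=8 * (1 + ndim))
    return np.unpackbits(packed)[:num_elements].astype(bool).reshape(shape)

arr = np.array([[1, 0, 0, 0, 1, 0],
                [0, 1, 1, 1, 0, 1],
                [1, 1, 1, 1, 1, 0]], dtype='bool')

blob = pack_with_shape(arr)
restored = unpack_with_shape(blob)

print(len(blob))                      # 27 (24 header bytes + 3 data bytes)
print(np.array_equal(arr, restored))  # True
```

For small arrays the fixed-size header dominates the payload, so storing width and height in their own columns, as described in the comments above, may still be the more compact choice when the dimensionality is known in advance.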