Question

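numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape.

Is there a way to serialize a numpy array to JSON format while preserving this information?

Note: The arrays may contain ints, floats or bools. It's reasonable to expect a transposed array.

Note 2: This is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.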
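To make the problem concrete, here is a minimal sketch of what gets lost. It assumes a small int32 array and uses tobytes(), the current name for tostring(); the variable names are only illustrative.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

# tobytes() emits only the raw buffer: no dtype, no shape.
raw = a.tobytes()

# Reading it back needs the dtype supplied by hand and yields a flat array.
flat = np.frombuffer(raw, dtype=np.int32)
print(flat.shape)  # (6,)

# The shape has to be carried separately and reapplied.
restored = flat.reshape(2, 3)
print(np.array_equal(a, restored))  # True
```

Any serialization format that should survive on its own therefore has to carry at least the dtype and the shape next to the data.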

Answer 1

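pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird tuple dtypes. Endianness issues are probably the most important; you don't want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the more convenient option, though save has its own benefits.

The pickle option:

import pickle
import numpy as np

a = np.arange(6).reshape(2, 3)  # any NumPy array
serialized = pickle.dumps(a, protocol=0)  # protocol 0 is printable ASCII
deserialized_a = pickle.loads(serialized)

numpy.save uses a binary format, and it needs to write to a file, but you can get around that with an in-memory file (io.BytesIO in Python 3):

import io
import json
import numpy

a = numpy.arange(6).reshape(2, 3)  # any NumPy array
memfile = io.BytesIO()
numpy.save(memfile, a)
memfile.seek(0)
serialized = json.dumps(memfile.read().decode('latin-1'))
# latin-1 maps byte n to unicode code point n

And to deserialize:

memfile = io.BytesIO()
memfile.write(json.loads(serialized).encode('latin-1'))
memfile.seek(0)
a = numpy.load(memfile)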
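A minimal sketch of the endianness point, assuming the explicit byte-order dtype strings '<i4' and '>i4' as a stand-in for a little-endian and a big-endian machine:

```python
import io
import numpy as np

little = np.array([1], dtype='<i4')  # explicitly little-endian int32
raw = little.tobytes()               # four bytes: 01 00 00 00

# Misreading the same bytes as big-endian gives a very different number.
misread = np.frombuffer(raw, dtype='>i4')
print(misread)  # [16777216]

# numpy.save writes the byte order into the file header, so loading is safe.
buf = io.BytesIO()
np.save(buf, little)
buf.seek(0)
loaded = np.load(buf)
print(loaded)  # [1]
```

The raw buffer alone is ambiguous, while the saved file describes itself, which is why the raw-bytes route needs extra metadata.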

Answer 2

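EDIT: As one can read in the comments on the question, this solution deals with "normal" numpy arrays (floats, ints, bools ...) and not with multi-type structured arrays.

Solution for serializing a numpy array of any dimension and data type

As far as I know, you cannot simply serialize a numpy array of any data type and any dimension... but you can store its data type, shape and contents in a list representation and then serialize that using JSON.

Imports needed:

import json
import base64
import numpy

For encoding you could use (nparray is some numpy array of any data type and any dimensionality):

json.dumps([str(nparray.dtype), base64.b64encode(nparray.tobytes()).decode('ascii'), nparray.shape])

After this you get a JSON dump (string) of your data, containing a list representation of its data type and shape as well as the array's data/contents base64-encoded.

For decoding, this does the job (encStr is the encoded JSON string, loaded from somewhere):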

# get the encoded json dump
enc = json.loads(encStr)

# build the numpy data type
dataType = numpy.dtype(enc[0])

# decode the base64 encoded numpy array data and create a new numpy array with this data & type
dataArray = numpy.frombuffer(base64.b64decode(enc[1]), dataType)

# if the array had more than one data set it has to be reshaped
if len(enc) > 2:
    dataArray = dataArray.reshape(enc[2])   # reshape returns a new array, so reassign it

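JSON dumps are efficient and cross-compatible for many reasons, but plain JSON leads to unexpected results if you want to store and load numpy arrays of any type and any dimension.

This solution stores and loads numpy arrays regardless of their type or dimension and also restores them correctly (data type, dimension, ...).

I tried several solutions months ago, and this was the only efficient, versatile one I came across.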
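The encode and decode steps above can be wrapped into a pair of functions. This is a sketch rather than a definitive implementation: the names encode_array and decode_array are hypothetical, and the transposed test array is chosen because the question mentions transposed input.

```python
import base64
import json

import numpy as np


def encode_array(arr):
    """Hypothetical helper: pack dtype, base64 data and shape into a JSON string."""
    # tobytes() copies the data in C order, so non-contiguous arrays work too.
    data = base64.b64encode(arr.tobytes()).decode('ascii')
    return json.dumps([str(arr.dtype), data, arr.shape])


def decode_array(enc_str):
    """Hypothetical helper: rebuild the array from the JSON string."""
    dtype_name, data, shape = json.loads(enc_str)
    flat = np.frombuffer(base64.b64decode(data), dtype=np.dtype(dtype_name))
    # frombuffer returns a read-only view; copy() gives a writable array.
    return flat.reshape(shape).copy()


original = np.arange(6, dtype=np.float32).reshape(2, 3).T  # transposed, non-contiguous
restored = decode_array(encode_array(original))
print(restored.shape, restored.dtype)
print(np.array_equal(original, restored))
```

Going through tobytes() instead of handing the array straight to base64 is what keeps transposed arrays working.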

Answer 3

I found the code in Msgpack-numpy helpful. https://github.com/lebedov/msgpack-numpy/blob/master/msgpack_numpy.py

I modified the serialised dict slightly and added base64 encoding to reduce the serialised size.

By using the same interface as json (providing load(s), dump(s)), you can provide a drop-in replacement for json serialisation.

The same logic can be extended to add any automatic non-trivial serialisation, such as datetime objects.

EDIT: I've written a generic, modular parser that does this and more. https://github.com/someones/jaweson

My code is as follows:

np_json.py

from json import *
import json
import numpy as np
import base64

def to_json(obj):
    if isinstance(obj, (np.ndarray, np.generic)):
        if isinstance(obj, np.ndarray):
            return {
                # tobytes() replaces the deprecated tostring(); decode to str so json can emit it
                '__ndarray__': base64.b64encode(obj.tobytes()).decode('ascii'),
                'dtype': obj.dtype.str,
                'shape': obj.shape,
            }
        elif isinstance(obj, (np.bool_, np.number)):
            return {
                '__npgeneric__': base64.b64encode(obj.tobytes()).decode('ascii'),
                'dtype': obj.dtype.str,
            }
    if isinstance(obj, set):
        return {'__set__': list(obj)}
    if isinstance(obj, tuple):
        return {'__tuple__': list(obj)}
    if isinstance(obj, complex):
        return {'__complex__': obj.__repr__()}

    # Let the base class default method raise the TypeError
    raise TypeError('Unable to serialise object of type {}'.format(type(obj)))

def from_json(obj):
    # check for numpy
    if isinstance(obj, dict):
        if '__ndarray__' in obj:
            # frombuffer replaces the deprecated binary mode of fromstring
            return np.frombuffer(
                base64.b64decode(obj['__ndarray__']),
                dtype=np.dtype(obj['dtype'])
            ).reshape(obj['shape'])
        if '__npgeneric__' in obj:
            return np.frombuffer(
                base64.b64decode(obj['__npgeneric__']),
                dtype=np.dtype(obj['dtype'])
            )[0]
        if '__set__' in obj:
            return set(obj['__set__'])
        if '__tuple__' in obj:
            return tuple(obj['__tuple__'])
        if '__complex__' in obj:
            return complex(obj['__complex__'])

    return obj

# over-write the load(s)/dump(s) functions
def load(*args, **kwargs):
    kwargs['object_hook'] = from_json
    return json.load(*args, **kwargs)

def loads(*args, **kwargs):
    kwargs['object_hook'] = from_json
    return json.loads(*args, **kwargs)

def dump(*args, **kwargs):
    kwargs['default'] = to_json
    return json.dump(*args, **kwargs)

def dumps(*args, **kwargs):
    kwargs['default'] = to_json
    return json.dumps(*args, **kwargs)

You should then be able to do the following:

import numpy as np
import np_json as json

np_data = np.zeros((10,10), dtype=np.float32)
new_data = json.loads(json.dumps(np_data))
assert (np_data == new_data).all()
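The datetime extension mentioned in this answer could look roughly like the sketch below. The '__datetime__' marker key and the ISO 8601 string format are assumptions made for illustration; they are not taken from np_json.py or from jaweson.

```python
import json
from datetime import datetime


def to_json(obj):
    # Hypothetical marker key, following the '__ndarray__' naming pattern.
    if isinstance(obj, datetime):
        return {'__datetime__': obj.isoformat()}
    raise TypeError('Unable to serialise object of type {}'.format(type(obj)))


def from_json(obj):
    # object_hook receives every decoded JSON object (dict), innermost first.
    if '__datetime__' in obj:
        return datetime.fromisoformat(obj['__datetime__'])
    return obj


stamp = datetime(2020, 1, 2, 3, 4, 5)
encoded = json.dumps({'when': stamp}, default=to_json)
decoded = json.loads(encoded, object_hook=from_json)
print(decoded['when'] == stamp)  # True
```

In the module above, the same two branches would simply be added to the existing to_json and from_json functions.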

Answer 4

Msgpack has the best serialization performance: http://www.benfrederickson.com/dont-pickle-your-data/

Use msgpack-numpy. See https://github.com/lebedov/msgpack-numpy

Install it:

pip install msgpack-numpy

Then:

import msgpack
import msgpack_numpy as m
import numpy as np

x = np.random.rand(5)
x_enc = msgpack.packb(x, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)

Answer 5

If it needs to be human readable and you know that this is a numpy array:

import json
import numpy as np

a = np.random.normal(size=(50, 120, 150))
a_reconstructed = np.asarray(json.loads(json.dumps(a.tolist())))
print(np.allclose(a, a_reconstructed))
print((a == a_reconstructed).all())

Maybe not the most efficient as the array sizes grow larger, but it works for smaller arrays.
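One thing to watch with the tolist() route: the nested lists keep the shape, but the dtype is not recorded. A minimal sketch, assuming a float32 input and a small dict layout with 'dtype' and 'data' keys (the key names are only illustrative):

```python
import json

import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)

# The shape survives through the nested lists, but the dtype falls back to the default.
plain = np.asarray(json.loads(json.dumps(a.tolist())))
print(plain.shape, plain.dtype)  # (2, 3) float64

# Storing the dtype name next to the data restores it on load.
payload = json.dumps({'dtype': str(a.dtype), 'data': a.tolist()})
loaded = json.loads(payload)
typed = np.asarray(loaded['data'], dtype=loaded['dtype'])
print(typed.dtype)  # float32
```

The output stays human readable, and one extra field is enough to round-trip ints, floats and bools with their original width.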

Answer 6

Try traitschema: https://traitschema.readthedocs.io/en/latest/

"Create serializable, type-checked schema using traits and Numpy. A typical use case involves saving several Numpy arrays of varying shape and type."

Answer 7

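This wraps the pickle-based answer by @user2357112 for easier JSON integration.

The code below encodes the pickle as base64. It will handle numpy arrays of any type/size without needing to remember what they were. It will also handle other arbitrary objects that can be pickled.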

import numpy as np
import json
import pickle
import codecs

class PythonObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        return {
            '_type': str(type(obj)),
            'value': codecs.encode(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL), "base64").decode('latin1')
            }

class PythonObjectDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        json.JSONDecoder.__init__(self, object_hook=self.object_hook, *args, **kwargs)

    def object_hook(self, obj):
        if '_type' in obj:
            try:
                return pickle.loads(codecs.decode(obj['value'].encode('latin1'), "base64"))
            except KeyError:
                return obj
        return obj


# Create arbitrary array
originalNumpyArray = np.random.normal(size=(3, 3))
print(originalNumpyArray)

# Serialization
numpyData = {
   "array": originalNumpyArray
   }
encodedNumpyData = json.dumps(numpyData, cls=PythonObjectEncoder)
print(encodedNumpyData)

# Deserialization
decodedArrays = json.loads(encodedNumpyData, cls=PythonObjectDecoder)
finalNumpyArray = decodedArrays["array"]

# Verify
print(finalNumpyArray)
print(np.allclose(originalNumpyArray, finalNumpyArray))
print((originalNumpyArray==finalNumpyArray).all())

Answer 8

Try numpy-serializer:

Download

pip install numpy-serializer

Usage

import numpy_serializer as ns
import numpy as np

a = np.random.normal(size=(50,120,150))
b = ns.to_bytes(a)
c = ns.from_bytes(b)
assert np.array_equal(a,c)

Answer 9

Try using numpy.array_repr or numpy.array_str.
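A short sketch of what these two functions return, assuming NumPy's default print options. Both produce display strings, and large arrays are summarised with an ellipsis, so the text only holds the full contents for small arrays.

```python
import numpy as np

small = np.array([[1, 2], [3, 4]])
print(np.array_repr(small))  # repr-style text, starts with "array("
print(np.array_str(small))   # str-style text, brackets only

# With default print options, arrays above 1000 elements are abbreviated.
large = np.arange(2000)
print('...' in np.array_repr(large))  # True
```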

How can I serialize a numpy array while preserving matrix dimensions?
