Question

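I have a precomputed numpy array that takes up just under 9.5 GB. I have saved it both as an npy file and, using h5py, as an hdf5 file. I can read the array from either format in the interpreter, but when I read it while actually running my module I get a MemoryError: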

File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/dataset.py", line 440, in __getitem__
    arr = numpy.ndarray(mshape, new_dtype, order='C')
MemoryError

This happens whether I save and read the array as an npy file or as an hdf5 file.

I tried numpy.memmap so that the array could stay on disk instead of in RAM, but it does not seem to read the array correctly:

>>> import numpy as np
>>> zz=np.load('VGG16_l19_val.npy')
>>> zz.dtype
dtype('float64')
>>> zz.shape
(50000, 25088)
# So, I've read in the array using np.load and know its dtype and shape

>>> from numpy import unravel_index
>>> unravel_index(zz.argmax(), zz.shape)
(41232, 8208)
>>> zz[41232,8208]
937.5606689453125
# I now know the max value of zz and where it occurs

>>> zz2=np.memmap('VGG16_l19_val.npy', mode='r', dtype=np.float64, shape=(50000,25088))
>>> zz2.dtype
dtype('float64')
>>> zz2.shape
(50000, 25088)
# I've read a memmap version of the array and have the correct dtype and shape, but ...

>>> zz2[41232,8208]
0.0
>>> zz2.max()
memmap(8.447400968892931e+252)
>>>
# It doesn't appear that zz2 == zz

What am I misunderstanding about np.memmap? Can I use it to read this numpy array?
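One thing worth checking, sketched here on a tiny stand-in array rather than the real file: np.save writes a header (magic string, dtype, shape) before the data, while np.memmap treats the file as raw bytes starting at offset 0 unless told otherwise. That alone could explain the mismatched values above. The file name `demo.npy` and the array contents are hypothetical, the offset calculation assumes the file holds a single array written by np.save, and the code targets a current Python 3 / NumPy rather than the Python 2.7 in the traceback.

```python
import os
import tempfile

import numpy as np

# Hypothetical small stand-in for the 9.4 GB file.
path = os.path.join(tempfile.mkdtemp(), "demo.npy")
small = np.arange(12, dtype=np.float64).reshape(3, 4)
np.save(path, small)

# Raw memmap from byte 0: the .npy header bytes are interpreted as float64 data.
raw = np.memmap(path, mode="r", dtype=np.float64, shape=small.shape)
print(np.array_equal(raw, small))

# Let NumPy parse the header and memory-map only the data region.
mapped = np.load(path, mmap_mode="r")
print(np.array_equal(mapped, small))

# Equivalent manual fix: skip the header by passing its length as the offset.
# Assumes the file contains exactly one array, so header = file size - data size.
header_len = os.path.getsize(path) - small.nbytes
fixed = np.memmap(path, mode="r", dtype=np.float64, shape=small.shape,
                  offset=header_len)
print(np.array_equal(fixed, small))
```

If this is the cause, `np.load('VGG16_l19_val.npy', mmap_mode='r')` should give the same values as `zz` without the shape and dtype having to be passed by hand.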

If not, what should I do, other than splitting the array and saving it across several files?
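One alternative to splitting the file might be to keep it as a single npy file and work through it a block of rows at a time, so that only one block is in RAM at once. The sketch below is a rough illustration on a small random array, not a tested solution for the real data: the helper `argmax_in_blocks`, the file name `demo_blocks.npy`, and the block size are all made up for the example, and it assumes the array contains no NaN values.

```python
import os
import tempfile

import numpy as np


def argmax_in_blocks(arr, block_rows):
    """Return (max value, (row, col)), scanning arr block_rows rows at a time."""
    best_val = -np.inf
    best_idx = None
    for start in range(0, arr.shape[0], block_rows):
        # Copy just this slice of rows into RAM; the rest stays on disk.
        block = np.array(arr[start:start + block_rows])
        r, c = np.unravel_index(int(block.argmax()), block.shape)
        if block[r, c] > best_val:
            best_val = float(block[r, c])
            best_idx = (start + int(r), int(c))
    return best_val, best_idx


# Hypothetical small stand-in for the large array.
path = os.path.join(tempfile.mkdtemp(), "demo_blocks.npy")
data = np.random.RandomState(0).rand(1000, 64)
np.save(path, data)

# mmap_mode="r" maps the file read-only instead of loading it all at once.
mapped = np.load(path, mmap_mode="r")
value, index = argmax_in_blocks(mapped, block_rows=128)
print(value, index)
```

The same loop shape should carry over to other per-row work; peak memory is then governed by `block_rows` rather than by the size of the whole array.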

Why can I read the array without any problem in the interpreter or in pdb, but not inside a module, where reading it always raises a MemoryError?

How to read a 9.4 GB numpy array without a MemoryError

0 answers: