Question

If I try to run:

np.empty(shape= (108698,200,1000))

in my Jupyter notebook, it raises an error:

MemoryError                               Traceback (most recent call last)
<ipython-input-35-0aedb09803e9> in <module>()
      1 import numpy as np
      2 #np.empty(shape=(108698-0,200,1000))
----> 3 np.empty(shape= (108698,200,1000))
      4 #np.empty(shape=(end-start,n_words,embedding_size))

But when I try to run

np.empty(shape= (84323,200,1000))

it executes without any error.

Is there a way to run

np.empty(shape= (108698,200,1000))

without increasing the machine's RAM?

Answer 1

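No. It depends on what else you are already running, but once you have reached the maximum memory you can allocate, you cannot create more. For example, with 64-bit NumPy at 8 bytes per entry, the array would take about 174 GB in total, which is too much. If you know your data entries are mostly zeros and you are willing to use something other than a plain NumPy array, you can look into sparse arrays. A sparse array stores only the nonzero elements and their position indices, which can save space.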
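The 174 GB figure can be checked without allocating anything. The sketch below assumes float64, which is the default dtype of np.empty; estimated_nbytes is a hypothetical helper name, not a NumPy function.

```python
import math
import numpy as np

def estimated_nbytes(shape, dtype=np.float64):
    """Bytes an array of this shape and dtype would need, without allocating it."""
    return math.prod(shape) * np.dtype(dtype).itemsize

# The shape that raised MemoryError and the shape that succeeded in the question.
failing = estimated_nbytes((108698, 200, 1000))
working = estimated_nbytes((84323, 200, 1000))

print(f"{failing / 1e9:.1f} GB")  # 173.9 GB
print(f"{working / 1e9:.1f} GB")  # 134.9 GB
```

Comparing these numbers with the memory actually free on the machine shows quickly whether an allocation has any chance of succeeding.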

Answer 2

You can use a memory-mapped file to work with arrays that do not fit in memory. NumPy has a facility for this: numpy.memmap.

For example:

x = np.memmap('test.bin', mode='w+', shape=(108698,200,1000))

However, on 32-bit Python, the file is still limited to 2 GB.
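A slightly fuller sketch of the same idea, using a small stand-in shape and a temporary file so it runs quickly. The float32 dtype is an assumption chosen for illustration; np.memmap defaults to uint8 when no dtype is given, so the one-liner above would create a file of roughly 21.7 GB.

```python
import os
import tempfile
import numpy as np

# Small stand-in for (108698, 200, 1000) so the sketch finishes instantly.
shape = (10, 200, 1000)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.bin")

    # mode="w+" creates (or overwrites) the file at the full array size.
    x = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
    x[0, :5, 0] = 1.0      # writes go to the mapped file, not to a RAM copy
    x.flush()              # push pending changes to disk
    file_size = os.path.getsize(path)
    del x                  # release the mapping

    # Reopen read-only; only the pages actually touched are loaded.
    y = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
    total = float(y[0, :, 0].sum())
    del y

print(file_size, total)
```

The array lives on disk, so the limit becomes free disk space rather than RAM, at the cost of slower access.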

Answer 3

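No upper bound is defined for shape, but the overall size of the array is limited by numpy.intp, which is typically int32 or int64.

You can use sparse matrices from SciPy, or restrict the dtype of the large (108698, 200, 1000) array to int8, which cuts the requirement to one byte per element (about 21.7 GB) and should work if that much memory is available.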
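Both points can be inspected directly. This is a rough sketch: it compares the element count against the maximum of np.intp, which is one reading of the limit described above, and it shows what int8 buys and what it costs.

```python
import math
import numpy as np

shape = (108698, 200, 1000)
n_elements = math.prod(shape)

# np.intp is the integer type NumPy uses for indexing; its width follows the platform.
max_index = np.iinfo(np.intp).max
within_index_limit = n_elements <= max_index  # platform-dependent result

# One byte per element with int8 versus eight with float64.
bytes_int8 = n_elements * np.dtype(np.int8).itemsize
bytes_float64 = n_elements * np.dtype(np.float64).itemsize

# int8 can only hold whole numbers in this range.
int8_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)

print(within_index_limit, bytes_int8, bytes_float64, int8_range)
```

The narrower dtype only helps if the data really fits in that range; floating-point values such as embeddings would need rescaling or a float dtype.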

Answer 4

There is no upper limit. We can (roughly) estimate the amount of memory an ndarray needs:

>>> arr = np.empty(shape=(100, 10, 1000), dtype='uint8')
>>> hr_size(arr.nbytes)
'976.6K'

For an ndarray with 1 million elements (each 'uint8' element takes one byte), we need '976.6K' of memory.

For an ndarray with shape=(84323, 200, 1000) and dtype='uint8':

>>> hr_size(84323*200*1000)
'15.7G'

We need more than 15G.

Finally, for an ndarray with shape=(108698, 200, 1000) and dtype='uint8':

>>> hr_size(108698*200*1000)
'20.2G'

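We need more than 20G.

If the dtype is 'int64', the estimated amount of memory is 8 times larger. The same applies to float64, the default dtype of np.empty, which also takes 8 bytes per element.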
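Answer 4 calls hr_size without showing its definition. The following is a minimal sketch of such a helper, assuming 1024-based units; it is a hypothetical stand-in that reproduces the '976.6K', '15.7G', and '20.2G' figures, not necessarily the answerer's own code.

```python
def hr_size(num_bytes):
    """Format a byte count with binary (1024-based) unit suffixes."""
    size = float(num_bytes)
    for unit in ("", "K", "M", "G", "T"):
        if abs(size) < 1024.0:
            return f"{size:.1f}{unit}"
        size /= 1024.0
    return f"{size:.1f}P"

print(hr_size(100 * 10 * 1000))         # 976.6K
print(hr_size(84323 * 200 * 1000))      # 15.7G
print(hr_size(108698 * 200 * 1000))     # 20.2G
print(hr_size(108698 * 200 * 1000 * 8)) # 8-byte elements, e.g. int64 or float64
```

Because the helper only needs a byte count, it can be fed shape products directly, so the estimate never has to allocate the array.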

Is there a maximum size defined for numpy arrays?
