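I'm trying to create efficient broadcast arrays in numpy, e.g. an array of shape=[1000,1000,1000] that holds only 1000 distinct elements, repeated 1e6 times. This can be done with np.lib.stride_tricks.as_strided and np.broadcast_arrays.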
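A minimal sketch of both routes, assuming float64 data so the byte strides are fixed at 8. It uses np.broadcast_to, a single-array relative of np.broadcast_arrays (available since NumPy 1.10), in place of np.broadcast_arrays. Both results are zero-stride views of the same 8000-byte buffer.

```python
import numpy as np

# 1000 distinct values: 8000 bytes of real data.
x = np.arange(1000, dtype=np.float64)
shape = (1000, 1000, 1000)

# as_strided: axis 0 walks through x; axes 1 and 2 have stride 0, so every
# step along them revisits the same element instead of touching new memory.
a = np.lib.stride_tricks.as_strided(
    x, shape=shape, strides=(x.itemsize, 0, 0), writeable=False
)

# broadcast_to: the same zero-stride layout, produced by NumPy's broadcasting
# rules from a (1000, 1, 1) view of x. The result is read-only.
b = np.broadcast_to(x[:, np.newaxis, np.newaxis], shape)

print(a.strides, b.strides)            # (8, 0, 0) (8, 0, 0)
print(a.nbytes / 1e9, "GB logical")    # 8.0 GB logical, only x is stored
print(a[5, 123, 456], b[5, 123, 456])  # 5.0 5.0
```

Making both views read-only is a deliberate choice: writing to one element of a zero-stride view would silently change a whole plane of the logical array.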
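However, I can't verify that no duplication happens in memory. This is crucial, because tests that actually duplicate the arrays in memory crash my machine without leaving a traceback.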
I've tried checking the size of the arrays with .nbytes, but that doesn't seem to correspond to the actual memory usage:
>>> import numpy as np
>>> import resource
>>> initial_memuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> pagesize = resource.getpagesize()
>>>
>>> x = np.arange(1000)
>>> memuse_x = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of x = {0} MB".format(x.nbytes/1e6))
Size of x = 0.008 MB
>>> print("Memory used = {0} MB".format((memuse_x-initial_memuse)*resource.getpagesize()/1e6))
Memory used = 150.994944 MB
>>>
>>> y = np.lib.stride_tricks.as_strided(x, [1000,10,10], strides=x.strides + (0, 0))
>>> memuse_y = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of y = {0} MB".format(y.nbytes/1e6))
Size of y = 0.8 MB
>>> print("Memory used = {0} MB".format((memuse_y-memuse_x)*resource.getpagesize()/1e6))
Memory used = 201.326592 MB
>>>
>>> z = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
>>> memuse_z = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of z = {0} MB".format(z.nbytes/1e6))
Size of z = 80.0 MB
>>> print("Memory used = {0} MB".format((memuse_z-memuse_y)*resource.getpagesize()/1e6))
Memory used = 0.0 MB
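So .nbytes reports the size of the "theoretical" array, but apparently not its actual size in memory. The resource check is a bit awkward, because it looks as if something is being loaded and cached (perhaps?), so that the first strided array takes up some memory but later ones don't.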
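Part of the awkwardness with resource is that ru_maxrss is the peak resident set size, a high-water mark that never goes down, so differences between readings only capture growth of the peak. It is also reported in kilobytes on Linux (bytes on macOS), not in pages, so multiplying it by getpagesize() inflates the figures. A sketch using the standard-library tracemalloc module instead, assuming NumPy 1.13 or later (which reports its data-buffer allocations to tracemalloc):

```python
import tracemalloc

import numpy as np

tracemalloc.start()

before, _ = tracemalloc.get_traced_memory()
x = np.arange(1000, dtype=np.float64)
after_x, _ = tracemalloc.get_traced_memory()
z = np.lib.stride_tricks.as_strided(x, (1000, 100, 100), strides=x.strides + (0, 0))
after_z, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

x_cost = after_x - before   # should include x's 8000-byte data buffer
z_cost = after_z - after_x  # only the new array object and bookkeeping
print(f"x.nbytes = {x.nbytes}, traced while creating x: {x_cost} bytes")
print(f"z.nbytes = {z.nbytes}, traced while creating z: {z_cost} bytes")
```

The exact byte counts depend on the Python and NumPy versions, but creating z should cost on the order of hundreds of bytes, nowhere near its 80 MB nbytes.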
tl;dr: How do you determine the actual size in memory of a numpy array or array view?
Answer (score: 4)
One way is to check the array's .base attribute, which references the object from which the array "borrows" its memory. For example:
import numpy as np

x = np.arange(1000)
print(x.flags.owndata) # x "owns" its data
# True
print(x.base is None) # its base is therefore 'None'
# True
a = x.reshape(100, 10) # a is a reshaped view onto x
print(a.flags.owndata) # it therefore "borrows" its data
# False
print(a.base is x) # its .base is x
# True
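To see which everyday operations give views and which allocate new buffers, a small sketch can compare .flags.owndata and .base. In current NumPy, a view of a view records the memory-owning array directly as its .base, so the transposed reshape below still reports x.

```python
import numpy as np

x = np.arange(1000, dtype=np.float64)

# Operations that return views: no new data buffer, .base points at x.
views = {
    "slice": x[::2],
    "reshape": x.reshape(100, 10),
    "reshape + transpose": x.reshape(100, 10).T,
}
# Operations that allocate a fresh buffer: the result owns its data.
copies = {
    "copy": x.copy(),
    "arithmetic": x + 1,
}

for name, arr in views.items():
    print(f"{name:20} owndata={arr.flags.owndata} base_is_x={arr.base is x}")
for name, arr in copies.items():
    print(f"{name:20} owndata={arr.flags.owndata} base_is_None={arr.base is None}")
```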
Using np.lib.stride_tricks:
b = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
print(b.flags.owndata)
# False
print(b.base)
# <numpy.lib.stride_tricks.DummyArray object at 0x7fb40c02b0f0>
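Besides following .base, you can check directly that a strided array reuses x's buffer by comparing data addresses or by asking np.may_share_memory. A sketch, where data_address is a hypothetical helper and y is built the same way as in the question:

```python
import numpy as np

x = np.arange(1000, dtype=np.float64)
y = np.lib.stride_tricks.as_strided(x, (1000, 10, 10), strides=x.strides + (0, 0))
y_copy = y.copy()  # materialises all 100,000 elements in a new buffer

def data_address(arr):
    # Address of the first element, as exposed by the array interface protocol.
    return arr.__array_interface__["data"][0]

print(data_address(y) == data_address(x))  # True: y starts at x's first byte
print(np.may_share_memory(y, x))           # True: the memory bounds overlap
print(np.may_share_memory(y_copy, x))      # False: the copy has its own buffer
print(y_copy.flags.owndata, y_copy.nbytes) # True 800000
```

np.may_share_memory only compares memory bounds, so it is cheap even for very large zero-stride views; a False answer is definitive, while True means "possibly overlapping".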
Here, b.base is a numpy.lib.stride_tricks.DummyArray instance, which looks like this:
class DummyArray(object):
    """Dummy object that just exists to hang __array_interface__ dictionaries
    and possibly keep alive a reference to a base array.
    """

    def __init__(self, interface, base=None):
        self.__array_interface__ = interface
        self.base = base
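The mechanism DummyArray relies on can be reproduced in a few lines. In this sketch, InterfaceHolder is a hypothetical stand-in, not a NumPy class: np.asarray wraps the buffer described by an object's __array_interface__ without copying and records that object as the new array's .base. This is why b.base is a DummyArray and the real array sits one level further down.

```python
import numpy as np

class InterfaceHolder:
    """Hypothetical stand-in for DummyArray: exposes an __array_interface__
    dict and keeps a reference to the array that owns the memory."""

    def __init__(self, interface, base):
        self.__array_interface__ = interface
        self.base = base

x = np.arange(1000, dtype=np.float64)
iface = dict(x.__array_interface__)   # copy, including x's data pointer
iface["shape"] = (1000, 10, 10)
iface["strides"] = (x.itemsize, 0, 0)

holder = InterfaceHolder(iface, base=x)
v = np.asarray(holder)   # wraps the existing buffer, no copy is made

print(v.shape, v.strides)   # (1000, 10, 10) (8, 0, 0)
print(v.base is holder)     # True
print(v.base.base is x)     # True
```

The holder's .base reference keeps x alive for as long as v exists, which is the "keep alive a reference to a base array" part of DummyArray's docstring.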
So we can check b.base.base:
print(b.base.base is x)
# True
Once you have the underlying array, its .nbytes attribute should accurately reflect the amount of memory it occupies.
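In principle, you can take a view of an array view, or create a strided array from another strided array. Assuming your view or strided array is ultimately backed by another numpy array, you can follow its .base attribute recursively. Once you reach an object whose .base is None, you have found the underlying object from which your array borrows its memory: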
def find_base_nbytes(obj):
    if obj.base is not None:
        return find_base_nbytes(obj.base)
    return obj.nbytes
As expected, for the arrays x, y and z from the question:
print(find_base_nbytes(x))
# 8000
print(find_base_nbytes(y))
# 8000
print(find_base_nbytes(z))
# 8000
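The recursive function assumes the chain ends at a numpy array. If it ends at some other buffer, for example the bytes object behind np.frombuffer, that object has no .base attribute and the recursion raises AttributeError. A possible iterative variant, owner_nbytes (a hypothetical name), sketched under the assumption that the final owner is either an ndarray or an object supporting the buffer protocol:

```python
import numpy as np

def owner_nbytes(arr):
    """Return the size in bytes of the object that owns arr's memory.

    Sketch only: assumes the final owner is either an ndarray or an object
    supporting the buffer protocol (bytes, bytearray, mmap.mmap, ...).
    """
    obj = arr
    # Follow .base links iteratively: view -> DummyArray -> ndarray -> buffer.
    while getattr(obj, "base", None) is not None:
        obj = obj.base
    if isinstance(obj, np.ndarray):
        return obj.nbytes
    # Not an ndarray: measure the raw buffer the object exposes.
    return memoryview(obj).nbytes

# A strided view over an array that itself borrows from a bytes object.
raw = bytes(8 * 100)                        # 800 bytes, owned by `raw`
arr = np.frombuffer(raw, dtype=np.float64)  # 100 float64 values, no copy
wide = np.lib.stride_tricks.as_strided(arr, (100, 50), strides=(8, 0))

print(wide.nbytes)         # 40000: logical size
print(owner_nbytes(wide))  # 800: memory actually held by `raw`
```

Like find_base_nbytes, this reports the size of the whole owning buffer, even if the view only touches part of it, which is what you want when the question is how much memory is really being held.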