Question

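I'm working with large tensors, so NumPy memory allocations for temporary tensors are starting to significantly affect execution time, and the code sometimes raises memory allocation errors on these intermediate steps. Below are two approaches I came up with for indexing one tensor with the int values of another (i.e., result_ijk = a[i, b[i, j], k]). Even though the second one seems more memory efficient, creating this enormous index matrix and iterating over all of its values (even in parallel) still feels weird to me, and it hits memory limits quite often: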

import numpy as np

def test():
    i, j, k, l = 10, 20, 30, 40  # in reality, they're like 1e3..1e6
    a = np.random.rand(i, j, k)
    b = np.random.randint(0, j, size=i*l).reshape((i, l))
    # goal: c[i, l, k] = a[i, b[i, l], k]; shape(c) = (10, 40, 30)
    tmp = a[:, b, :]  # <= shape (i, i, l, k): i*i*l*k floats of additional memory allocated (!) crazy
    c1 = np.diagonal(tmp, axis1=0, axis2=1).transpose([2, 0, 1])
    print(c1.shape)
    # another approach:
    ii, ll = np.indices((i, l))  # <= 2*i*l of temporary ints allocated
    tmp2 = b[ii, ll]  # i*l of ints allocated, slow ops
    c2 = a[ii, tmp2]  # slow ops over tensor
    print(c2.shape)
    print(np.allclose(c1, c2))

test()

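- Any suggestions on how to optimize this type of n-dim fancy indexing code?

If I were to use this ~vectorized code in Theano, would it also allocate all of those temporary buffers, or could it somehow manage to build them "on the fly"? Is there any package that performs this kind of indexing in a lazy or more efficient manner, without allocating these ii-like tensors?

(Note: I need to take gradients over it in the end, so I can't use fancy jit compilers like numba :( )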

Answer 1

Some approaches:

import numpy as np
from numba import jit

i, j, k, l = [100] * 4
a = np.random.randint(0, 5, (i, j, k))
b = np.random.randint(0, j, (i, l))

def test1():
    # goal: c[i, l, k] = a[i, b[i, l], k]; shape(c) = (i, l, k)
    tmp = a[:, b, :]  # <= shape (i, i, l, k): i*i*l*k additional elements allocated (!) crazy
    c1 = np.diagonal(tmp, axis1=0, axis2=1).transpose([2, 0, 1])
    return c1

def test2():
    ii, ll = np.indices((i, l))  # <= 2*i*l of temporary ints allocated
    tmp2 = b[ii, ll]  # i*l of ints allocated, slow ops
    c2 = a[ii, tmp2]  # slow ops over tensor
    return c2

def test3():
    # plain Python loops: one row of length k copied per (ii, ll) pair
    c3 = np.empty((i, l, k), dtype=a.dtype)
    for ii in range(i):
        for ll in range(l):
            c3[ii, ll] = a[ii, b[ii, ll]]
    return c3

# same loops as test3, compiled by numba
test4 = jit(test3)

And the corresponding benchmarks:

In [54]: %timeit test1()
1 loop, best of 3: 720 ms per loop

In [55]: %timeit test2()
100 loops, best of 3: 7.79 ms per loop

In [56]: %timeit test3()
10 loops, best of 3: 43.7 ms per loop

In [57]: %timeit test4()
100 loops, best of 3: 4.99 ms per loop

This seems to suggest (see @Eelco Hoogendoorn's comment) that your second approach is nearly optimal for large sizes, while the first one is a poor choice.
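Before trusting any timing, it helps to confirm that the approaches really compute the same thing. The sketch below does that on small, made-up sizes. It also tries `np.take_along_axis`, which is not part of the question or of this answer; it is a swapped-in alternative that I assume is available (NumPy 1.15 or later), and I have not benchmarked it.

```python
import numpy as np

rng = np.random.default_rng(0)
i, j, k, l = 4, 5, 6, 7  # small illustrative sizes, not the real ones
a = rng.random((i, j, k))
b = rng.integers(0, j, size=(i, l))

# Approach 1: gather for every pair of first-axis positions, keep the diagonal.
# a[:, b, :] has shape (i, i, l, k); the diagonal over axes 0 and 1 lands last.
c1 = np.diagonal(a[:, b, :], axis1=0, axis2=1).transpose(2, 0, 1)

# Approach 2: explicit (i, l) index grids.
ii, ll = np.indices((i, l))
c2 = a[ii, b[ii, ll]]

# Alternative (assumption, not from the thread): take_along_axis with the
# indices given a trailing length-1 axis so they broadcast over k.
c3 = np.take_along_axis(a, b[:, :, None], axis=1)

print(c1.shape, np.array_equal(c1, c2), np.array_equal(c2, c3))
```

This only checks equivalence. Which variant is fastest for your shapes is something to measure with `%timeit` as above.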

As for numba, you can use it for just this part of the code and apply the gradient in a non-"jitted" function.
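One way to read that suggestion, as a rough sketch: keep the gather as a black-box forward step (which could be the jitted loop) and write its gradient by hand. Since c[i, l, k] = a[i, b[i, l], k] only copies values, the gradient with respect to `a` is a scatter-add of the upstream gradient. The function names `gather_forward` and `gather_backward` are hypothetical, and plain NumPy stands in for the jitted part here.

```python
import numpy as np

def gather_forward(a, b):
    """c[i, l, k] = a[i, b[i, l], k]. A jitted loop could replace this body."""
    rows = np.arange(a.shape[0])[:, None]  # shape (i, 1)
    return a[rows, b]

def gather_backward(grad_c, b, a_shape):
    """Gradient w.r.t. a, given grad_c = d(loss)/dc with shape (i, l, k)."""
    grad_a = np.zeros(a_shape, dtype=grad_c.dtype)
    rows = np.arange(a_shape[0])[:, None]
    # np.add.at accumulates when b repeats an index within a row;
    # grad_a[rows, b] += grad_c would silently drop the repeats.
    np.add.at(grad_a, (rows, b), grad_c)
    return grad_a

a = np.arange(24.0).reshape(2, 3, 4)
b = np.array([[0, 0, 2], [1, 1, 1]])
c = gather_forward(a, b)
grad_a = gather_backward(np.ones_like(c), b, a.shape)
print(grad_a[:, :, 0])  # per (i, j): how many times b[i, :] selected j
```

How you hook this into an autodiff framework depends on the framework; the sketch only shows the math of the backward step.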

Answer 2

You only need to allocate an array of integers of length i to get your desired result.

Broadcasting takes care of duplicating values on the fly as needed, without having to allocate memory for them.
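A minimal sketch of what this could look like, assuming the same shapes as in the question. The name `i_idx` is my own choice for illustration. The length-i index gets a trailing axis of size 1 so that it broadcasts against `b`, which has shape (i, l).

```python
import numpy as np

rng = np.random.default_rng(0)
i, j, k, l = 10, 20, 30, 40
a = rng.random((i, j, k))
b = rng.integers(0, j, size=(i, l))

i_idx = np.arange(i)  # the only index array allocated here: i integers

# (i, 1) broadcasts against (i, l), so the indexed result has shape (i, l, k)
# and satisfies c[p, q] == a[p, b[p, q]].
c = a[i_idx[:, None], b]

print(c.shape)
```

`i_idx[:, None]` is a view of `i_idx`, so no (i, l) grid of row indices is created explicitly.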

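If you time it for large arrays, you'll find it is only marginally faster than your second approach: as others have noted, the intermediate index arrays will be several orders of magnitude smaller than your overall computation, so optimizing them has little effect on the total runtime or memory footprint.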
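To put rough numbers on that, here is a small accounting sketch using the benchmark sizes from the other answer (all dimensions 100) and assuming float64 data. The two large sizes are computed arithmetically rather than allocated, and the variable names are mine.

```python
import numpy as np

i, j, k, l = 100, 100, 100, 100
itemsize = np.dtype(np.float64).itemsize  # assumption: float64 data

ii, ll = np.indices((i, l))
grid_bytes = ii.nbytes + ll.nbytes      # index grids of the second approach
arange_bytes = np.arange(i).nbytes      # single length-i index array
result_bytes = i * l * k * itemsize     # the (i, l, k) result itself
first_tmp_bytes = i * i * l * k * itemsize  # a[:, b, :] in the first approach

print("index grids   :", grid_bytes)
print("arange index  :", arange_bytes)
print("result        :", result_bytes)
print("first-approach temporary:", first_tmp_bytes)
```

The index grids scale with i*l while the result scales with i*l*k, so the gap between them grows with k. The temporary of the first approach is i times larger than the result, which is what makes it a poor choice.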

NumPy n-dimensional fancy indexing on large tensors - memory efficiency
