Numpy: collapse columns according to a list

Time: 2016-09-11 06:15:29

Tags: python numpy vectorization numpy-broadcasting

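In NumPy, I have a d x n array A and a list L of length n, describing where I want each column of A to end up in matrix B. The idea is that column i of matrix B is the sum of all columns of A whose corresponding value in L is i.

I can do this with a for loop:

import numpy as np
A = np.arange(15).reshape(3,5)
L = [0,1,2,1,1]
n_cols = 3
B = np.zeros((len(A), n_cols))
# assume I know the desired number of columns,
# which is also one more than the maximum value of L
for i, a in enumerate(A.T):
    B[:, L[i]] += a

I was wondering if there is a way to do this by slicing array A (or in some other vectorized manner)?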
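To make the intended result concrete, here is the loop from the question run on its own example inputs. The expected B in the comment was worked out by hand: column 1 of B collects columns 1, 3 and 4 of A (the positions where L equals 1), while columns 0 and 2 of B are just copies of columns 0 and 2 of A.

```python
import numpy as np

A = np.arange(15).reshape(3, 5)
L = [0, 1, 2, 1, 1]
n_cols = 3  # one more than max(L)

B = np.zeros((len(A), n_cols))
for i, a in enumerate(A.T):
    B[:, L[i]] += a  # add column i of A into column L[i] of B

# B now holds [[0, 8, 2], [5, 23, 7], [10, 38, 12]] (as floats)
print(B)
```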

2 answers:

Answer 0 (score: 3)

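You are collapsing columns of A, using L to select those columns. You are also updating the columns of the output array according to the unique elements of L.

Thus, you can use np.add.reduceat for a vectorized solution, like so -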

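import numpy as np
L = np.asarray(L)  # L.argsort() needs an ndarray, not a plain list
sidx = L.argsort()  # column order that puts equal labels next to each other
col_idx, grp_start_idx = np.unique(L[sidx], return_index=True)  # labels present, and where each group starts
B_out = np.zeros((len(A), n_cols))
B_out[:, col_idx] = np.add.reduceat(A[:, sidx], grp_start_idx, axis=1)  # sum each contiguous group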
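Wrapped as a self-contained function (collapse_reduceat is a hypothetical name, not from the answer), the same sort-then-reduceat idea can be checked directly against the question's example. This is a sketch assuming L holds non-negative integer labels smaller than n_cols; the stable sort is an optional choice that keeps columns inside each group in their original order.

```python
import numpy as np

def collapse_reduceat(A, L, n_cols):
    """Column i of the result is the sum of the columns j of A with L[j] == i."""
    L = np.asarray(L)
    sidx = L.argsort(kind="stable")  # permutation grouping equal labels together
    # each label that occurs, and the index in L[sidx] where its run starts
    labels, starts = np.unique(L[sidx], return_index=True)
    B = np.zeros((A.shape[0], n_cols))
    # reduceat sums A[:, sidx] over [starts[k], starts[k+1]) along axis 1
    B[:, labels] = np.add.reduceat(A[:, sidx], starts, axis=1)
    return B  # labels that never occur in L keep all-zero columns

A = np.arange(15).reshape(3, 5)
print(collapse_reduceat(A, [0, 1, 2, 1, 1], 3))  # same B as the loop in the question
```

Because np.unique reports only the labels that actually occur, every run passed to reduceat is non-empty, and unused output columns are simply left at zero.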

Runtime test -

In [129]: def org_app(A,n_cols):
     ...:     B = np.zeros((len(A), n_cols)) 
     ...:     for i, a in enumerate(A.T):
     ...:         B[:, L[i]] += a
     ...:     return B
     ...: 
     ...: def vectorized_app(A,n_cols):
     ...:     sidx = L.argsort()
     ...:     col_idx, grp_start_idx = np.unique(L[sidx],return_index=True)
     ...:     B_out = np.zeros((len(A), n_cols))
     ...:     B_out[:,col_idx] = np.add.reduceat(A[:,sidx],grp_start_idx,axis=1)
     ...:     return B_out
     ...: 

In [130]: # Set up inputs with a sizeable number of columns and fewer rows,
     ...: # so that the memory bandwidth needed for a huge number of
     ...: # row elements doesn't become the bottleneck
     ...: d,n_cols = 10,5000
     ...: A = np.random.rand(d,n_cols)
     ...: L = np.random.randint(0,n_cols,(n_cols,))
     ...: 

In [131]: np.allclose(org_app(A,n_cols),vectorized_app(A,n_cols))
Out[131]: True

In [132]: %timeit org_app(A,n_cols)
10 loops, best of 3: 33.3 ms per loop

In [133]: %timeit vectorized_app(A,n_cols)
100 loops, best of 3: 1.87 ms per loop

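When the number of rows is comparable to the number of columns in A, the high memory-bandwidth requirement of the vectorized approach would offset any noticeable speedup.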

Answer 1 (score: 1)

Is this iteration over the columns of B equivalent (untested)?

for I in range(B.shape[1]):
    # L must be an ndarray here, so that L == I is an elementwise boolean mask
    B[:, I] = A[:, L == I].sum(axis=1)

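It loops over the columns of B rather than those of A, so there are fewer iterations when B has fewer columns. But more importantly, it may offer other vectorization insights.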

(Edit) Tested: this works, but it is much slower.
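A self-contained version of this mask-per-output-column loop, under the hypothetical name collapse_masked, is sketched below. It assumes labels in L are integers in range(n_cols); converting L with np.asarray is what makes L == i a boolean mask rather than a single False.

```python
import numpy as np

def collapse_masked(A, L, n_cols):
    L = np.asarray(L)  # ndarray, so that L == i is an elementwise boolean mask
    B = np.zeros((A.shape[0], n_cols))
    for i in range(n_cols):  # one iteration per output column
        # columns of A labelled i; an all-False mask selects nothing and sums to 0
        B[:, i] = A[:, L == i].sum(axis=1)
    return B

A = np.arange(15).reshape(3, 5)
print(collapse_masked(A, [0, 1, 2, 1, 1], 3))
```

Each iteration builds an n-long mask and copies the selected columns, which is consistent with it being slower than the original loop in practice even though it iterates fewer times.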

==========

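scipy.sparse computes column sums with a matrix product against an array of 1s. Can we do the same here? Make a C array with 1s in the right columns: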

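def my2(A, L):
    L = np.asarray(L)
    n = L.shape[0]          # number of columns of A
    n_cols = L.max() + 1    # number of columns of B
    # indicator matrix: C[j, L[j]] = 1, so A.dot(C)[:, i] sums the columns j with L[j] == i
    C = np.zeros((n, n_cols), int)
    C[np.arange(n), L] = 1
    return A.dot(C)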

This is about 7x faster than your loop, and a little faster than @Divakar's reduceat code.

==========

In [126]: timeit vectorized_app(A,L)
The slowest run took 8.16 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 99.7 µs per loop
In [127]: timeit val2 = my2(A,L)
The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 81.6 µs per loop
In [128]: timeit org1(A,n_cols)
1000 loops, best of 3: 623 µs per loop
In [129]: d,n_cols
Out[129]: (10, 100)
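The 0/1 matrix used by my2 can also be built in one step by broadcasting L against the output column indices. The sketch below (collapse_matmul is a hypothetical name) assumes integer labels in range(n_cols); it is the same matrix-product technique, only a different way of constructing C.

```python
import numpy as np

def collapse_matmul(A, L, n_cols):
    L = np.asarray(L)
    # C[j, i] is 1 exactly when L[j] == i: an (n, 1) column compared against (n_cols,)
    C = (L[:, None] == np.arange(n_cols)).astype(A.dtype)
    # (A @ C)[:, i] = sum over j of A[:, j] * C[j, i], i.e. the columns j with L[j] == i
    return A @ C

A = np.arange(15).reshape(3, 5)
print(collapse_matmul(A, [0, 1, 2, 1, 1], 3))
```

A dense C has n * n_cols entries, almost all zero, so for large inputs a scipy.sparse matrix is the more memory-friendly way to hold the same indicator.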