How can I efficiently apply the same operation to every block of a block matrix in numpy?

Asked: 2017-09-14 01:45:55

Tags: python arrays numpy

I have a large 2d array as follows:

B = [B_0, B_1, B_2, ..., B_n]

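where B_0, B_1, ..., B_n have the same number of rows but different numbers of columns, and n may be very large. I also have another 1d array idx with shape (n+1,), such that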

B_i = B[:, idx[i]:idx[i+1]]

and idx[-1] (the last element of idx) is the total number of columns of B.

I want to perform the same matrix operation on every B_i, for example:

B_i.T @ B_i

Or, with another 2d array:

D = [[D_0], [D_1], ..., [D_n]]
where D_0, D_1, ..., D_n have the same number of columns (equal to the number of rows of B) but different numbers of rows, and

D_i = D[idx[i]:idx[i+1], :]

I want to compute D_i @ B_i.

So my question is: how can I implement this in Python without a for loop?

Here is an example:

import numpy as np
from timeit import default_timer as timer
# Prepare the test data
n = 1000000  # the number of small matrices

idx = np.zeros(n+1, dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)

B = np.random.rand(3, idx[-1])

# Computation
start = timer()
C = []
for i in range(n):
    B_i = B[:, idx[i]:idx[i+1]]
    C_i = B_i.T@B_i
    C.append(C_i)
end = timer()
print('Total time:', end - start)

2 Answers:

Answer 0 (score: 1)

If I add this to your code:

print(B.shape)
print(idx)
print([x.shape for x in C])

Bnn = np.zeros((n, 3, idx[-1]))
for i in range(n):
    s = np.s_[idx[i]:idx[i+1]]
    Bnn[i,:,s] = B[:, s]
Bnn = Bnn.reshape(3*n,-1)
Cnn = Bnn.T @ Bnn
print(Bnn.shape, Cnn.shape)
print(Cnn.sum(), sum([x.sum() for x in C]))

and change to n=5, I get

2115:~/mypy$ python3 stack46209231.py 
(3, 31)    # B shape
[ 0  9 17 18 25 31]
[(9, 9), (8, 8), (1, 1), (7, 7), (6, 6)]  # shapes of C elements
(15, 31) (31, 31)     # shapes of diagonalized B and C
197.407879357 197.407879357   # C sums from the 2 routes

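So my idea is to build a block-diagonal version of B and perform the dot product with that. For moderately sized arrays this should be faster, although the iteration that builds Bnn takes time, as does extracting the blocks from Cnn.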

But Bnn and Cnn will get very large, bog down in memory swapping, and eventually cause memory errors.
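The block-extraction step mentioned above can be sketched as follows. This is only an illustration on a tiny problem, not part of the original answer: the seed, the small n, and the names `widths`, `blocks`, and `expected` are assumptions made for the example. It builds the dense block-diagonal array directly in 2-D, then checks that the diagonal blocks of Cnn equal the per-block products.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
n = 4                            # kept small: the dense array has 3*n rows and idx[-1] columns
widths = rng.integers(1, 5, size=n)
idx = np.concatenate(([0], np.cumsum(widths)))
B = rng.random((3, idx[-1]))

# Dense block-diagonal layout: block i sits in rows 3*i:3*i+3, columns idx[i]:idx[i+1]
Bnn = np.zeros((3 * n, idx[-1]))
for i in range(n):
    Bnn[3 * i:3 * i + 3, idx[i]:idx[i + 1]] = B[:, idx[i]:idx[i + 1]]

Cnn = Bnn.T @ Bnn

# The diagonal blocks of Cnn are the wanted B_i.T @ B_i; everything else is zero
blocks = [Cnn[idx[i]:idx[i + 1], idx[i]:idx[i + 1]] for i in range(n)]
expected = [B[:, idx[i]:idx[i + 1]].T @ B[:, idx[i]:idx[i + 1]] for i in range(n)]

print(all(np.allclose(b, e) for b, e in zip(blocks, expected)))
```

Note that both building Bnn and slicing out the blocks still iterate in Python, so this mainly shows why the checksums of the two routes agree.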

With the sparse block_diag function, turning B into a sparse matrix is quite easy:

from scipy import sparse

Blist = [B[:, idx[i]:idx[i+1]] for i in range(n)]
Bs = sparse.block_diag(Blist, format='bsr')
print(repr(Bs))
Cs = Bs.T@Bs
print(repr(Cs))
print(Cs.sum())

and a sample run:

2158:~/mypy$ python3 stack46209231.py 
(3, 20)
[ 0  1  5  9 17 20]
[(1, 1), (4, 4), (4, 4), (8, 8), (3, 3)]
(15, 20) (20, 20)
94.4190125992 94.4190125992
<15x20 sparse matrix of type '<class 'numpy.float64'>'
    with 60 stored elements (blocksize = 1x1) in Block Sparse Row format>
<20x20 sparse matrix of type '<class 'numpy.float64'>'
    with 106 stored elements (blocksize = 1x1) in Block Sparse Row format>

The shapes and checksums match.

For n = 10000, Bnn is too large for my memory. Creating the sparse Bs is slow, but the matrix multiplication is fast.
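If the individual C_i are needed rather than just the checksum, they can be sliced back out of the sparse product. The following is a minimal sketch under assumptions that are not in the answer above: it uses CSR format instead of BSR (CSR supports 2-D slicing), a small n, and the illustrative names `widths` and `C_blocks`.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
n = 5
widths = rng.integers(1, 10, size=n)
idx = np.concatenate(([0], np.cumsum(widths)))
B = rng.random((3, idx[-1]))

Blist = [B[:, idx[i]:idx[i + 1]] for i in range(n)]
Bs = sparse.block_diag(Blist, format='csr')

# Convert the product to CSR so it can be sliced by row and column ranges
Cs = (Bs.T @ Bs).tocsr()

# Pull each diagonal block back out as a small dense array
C_blocks = [Cs[idx[i]:idx[i + 1], idx[i]:idx[i + 1]].toarray() for i in range(n)]

print([c.shape for c in C_blocks])
```

The extraction is again a Python-level loop, so for very large n it may cost more than the sparse multiplication itself; whether that trade-off pays off would need to be timed on the real data.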

Answer 1 (score: 0)

This can be done with map and a lambda function; see the following code:

import numpy as np
from timeit import default_timer as timer
# Prepare the test data
n = 1000000  # the number of small matrices

idx = np.zeros(n+1, dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)

B = np.random.rand(3, idx[-1])
D = np.random.rand(idx[-1], 3)

BB = np.hsplit(B, idx[1:-1])
DD = np.vsplit(D, idx[1:-1])

CC = list(map(lambda x: x[0]@x[1], zip(DD, BB)))
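To check that the split-based approach yields the intended D_i @ B_i, a small comparison against explicit slicing can help. This is a sketch on a tiny problem; the seed, the small n, and the names `widths` and `reference` are assumptions added for the check, and a list comprehension stands in for the map/lambda call, which it is equivalent to here.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
n = 6
widths = rng.integers(1, 10, size=n)
idx = np.concatenate(([0], np.cumsum(widths)))
B = rng.random((3, idx[-1]))
D = rng.random((idx[-1], 3))

# Split at the interior boundaries idx[1:-1], which gives exactly n pieces each
BB = np.hsplit(B, idx[1:-1])
DD = np.vsplit(D, idx[1:-1])

# Same computation as list(map(lambda x: x[0] @ x[1], zip(DD, BB)))
CC = [d @ b for d, b in zip(DD, BB)]

# Reference result using the slicing from the question
reference = [D[idx[i]:idx[i + 1], :] @ B[:, idx[i]:idx[i + 1]] for i in range(n)]

print(all(np.allclose(c, r) for c, r in zip(CC, reference)))
```

Keep in mind that map (like the comprehension) still calls the matrix product once per block from Python, so it tidies up the loop rather than removing the per-block overhead.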