Question

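I have a large 2D array like this:

B = [B_0, B_1, ..., B_{n-1}]

where B_0, B_1, ..., B_{n-1} have the same number of rows but different numbers of columns, and n may be very large. I also have a 1D array idx of shape (n+1,) such that

B_i = B[:, idx[i]:idx[i+1]]

and idx[-1] (the last element of idx) is the total number of columns of B.

I want to apply the same matrix operation to every B_i, for example:

B_i.T @ B_i

or, with another 2D array:

D = [[D_0], [D_1], ..., [D_{n-1}]]

where D_0, D_1, ..., D_{n-1} have the same number of columns (equal to the number of rows of B) but different numbers of rows,

D_i = D[idx[i]:idx[i+1], :]

I want to compute D_i @ B_i.

So my question is: how can I implement this in Python without a for loop?

Here is an example: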

import numpy as np
from timeit import default_timer as timer
# Prepare the test data
n = 1000000  # number of small matrices

idx = np.zeros(n+1, dtype=int)  # np.int was removed in NumPy 1.24; use the builtin int
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)

B = np.random.rand(3, idx[-1])

# Computation
start = timer()
C = []
for i in range(n):
    B_i = B[:, idx[i]:idx[i+1]]
    C_i = B_i.T@B_i
    C.append(C_i)
end = timer()
print('Total time:', end - start)

Answer 1

If I add this to your code:

print(B.shape)
print(idx)
print([x.shape for x in C])

Bnn = np.zeros((n, 3, idx[-1]))
for i in range(n):
    s = np.s_[idx[i]:idx[i+1]]
    Bnn[i,:,s] = B[:, s]
Bnn = Bnn.reshape(3*n,-1)
Cnn = Bnn.T @ Bnn
print(Bnn.shape, Cnn.shape)
print(Cnn.sum(), sum([x.sum() for x in C]))

and change n to 5, I get

2115:~/mypy$ python3 stack46209231.py 
(3, 31)    # B shape
[ 0  9 17 18 25 31]
[(9, 9), (8, 8), (1, 1), (7, 7), (6, 6)]  # shapes of C elements
(15, 31) (31, 31)     # shapes of diagonalized B and C
197.407879357 197.407879357   # C sums from the 2 routes

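So my idea is to build a block-diagonal version of B and perform the dot product with that. For modestly sized arrays this should be faster, although the iteration that creates Bnn takes time, as does extracting the blocks from Cnn.

But Bnn and Cnn get very large; they will bog down in memory swapping and eventually raise a memory error.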
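To put rough numbers on that, here is a minimal sketch that estimates the dense footprint of Bnn and Cnn for float64 data. The helper name dense_footprint_gib is made up for this illustration, and the block widths are drawn at random the same way as in the question, so the exact figures vary with the seed.

```python
import numpy as np

def dense_footprint_gib(n_blocks, n_rows, total_cols, itemsize=8):
    """Estimate memory (GiB) of the dense Bnn and Cnn arrays from this answer.

    Hypothetical helper for illustration only. Bnn has shape
    (n_blocks * n_rows, total_cols); Cnn has shape (total_cols, total_cols).
    """
    bnn_bytes = n_blocks * n_rows * total_cols * itemsize
    cnn_bytes = total_cols * total_cols * itemsize
    gib = 1024 ** 3
    return bnn_bytes / gib, cnn_bytes / gib

# Block widths drawn as in the question: integers from 1 to 9.
rng = np.random.default_rng(0)
n = 10_000
widths = rng.integers(1, 10, size=n)
total_cols = int(widths.sum())

bnn_gib, cnn_gib = dense_footprint_gib(n, 3, total_cols)
print(f"total columns: {total_cols}")
print(f"Bnn: {bnn_gib:.1f} GiB, Cnn: {cnn_gib:.1f} GiB")
```

Both arrays grow roughly quadratically with n, while only the diagonal blocks hold useful values.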

With the block_diag function, turning B into a sparse block-diagonal matrix is quite easy:

from scipy import sparse

Blist = [B[:, idx[i]:idx[i+1]] for i in range(n)]
Bs = sparse.block_diag(Blist, format='bsr')
print(repr(Bs))
Cs = Bs.T@Bs
print(repr(Cs))
print(Cs.sum())

and a sample run

2158:~/mypy$ python3 stack46209231.py 
(3, 20)
[ 0  1  5  9 17 20]
[(1, 1), (4, 4), (4, 4), (8, 8), (3, 3)]
(15, 20) (20, 20)
94.4190125992 94.4190125992
<15x20 sparse matrix of type '<class 'numpy.float64'>'
    with 60 stored elements (blocksize = 1x1) in Block Sparse Row format>
<20x20 sparse matrix of type '<class 'numpy.float64'>'
    with 106 stored elements (blocksize = 1x1) in Block Sparse Row format>

and the shapes and checksums match.

For n = 10000, Bnn is too large for my memory. Creating the sparse Bs is slow, but the matrix multiplication is fast.
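The individual C_i blocks still have to be pulled out of the sparse product. A minimal sketch of one way to do that, assuming CSR format for convenient slicing (the code above used 'bsr'), and checking the result against the plain loop:

```python
import numpy as np
from scipy import sparse

# Small test case, built the same way as in the question.
rng = np.random.default_rng(0)
n = 5
idx = np.zeros(n + 1, dtype=int)
idx[1:] = rng.integers(1, 10, size=n)
idx = np.cumsum(idx)
B = rng.random((3, idx[-1]))

# Sparse block-diagonal B and the product, converted to CSR for slicing.
Blist = [B[:, idx[i]:idx[i + 1]] for i in range(n)]
Bs = sparse.block_diag(Blist, format="csr")
Cs = (Bs.T @ Bs).tocsr()

# Each C_i is the i-th diagonal block of Cs.
C_blocks = [Cs[idx[i]:idx[i + 1], idx[i]:idx[i + 1]].toarray() for i in range(n)]

# Reference result from the straightforward loop.
C_loop = [B_i.T @ B_i for B_i in Blist]
all_match = all(np.allclose(a, b) for a, b in zip(C_blocks, C_loop))
print(all_match)
```

Note that the extraction step is itself a Python-level loop over the blocks, so it adds to the cost of building Bs.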

Answer 2

This can be done with map and a lambda function; see the following code:

import numpy as np
from timeit import default_timer as timer
# Prepare the test data
n = 1000000  # number of small matrices

idx = np.zeros(n+1, dtype=int)  # np.int was removed in NumPy 1.24; use the builtin int
idx[1:] = np.random.randint(1, 10, size=n)
idx = np.cumsum(idx)

B = np.random.rand(3, idx[-1])
D = np.random.rand(idx[-1], 3)

BB = np.hsplit(B, idx[1:-1])
DD = np.vsplit(D, idx[1:-1])

CC = list(map(lambda x: x[0]@x[1], zip(DD, BB)))
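Note that map still calls the lambda once per block at Python level. As a different technique from the one above, here is a sketch that groups blocks of equal width and multiplies each group with a single batched matmul, so the Python loop runs only over the distinct widths (at most 9 here). The function name batched_block_products and its return layout are assumptions made for this illustration.

```python
import numpy as np

def batched_block_products(D, B, idx):
    """Compute D_i @ B_i for all blocks, batching blocks of equal width.

    Hypothetical helper for illustration. Returns a dict mapping each
    width w to (sel, C_w): sel holds the block indices with that width,
    and C_w[k] equals D_i @ B_i for i = sel[k].
    """
    widths = np.diff(idx)
    results = {}
    for w in np.unique(widths):
        sel = np.flatnonzero(widths == w)        # blocks with this width
        cols = idx[sel][:, None] + np.arange(w)  # (m, w) column indices
        B_w = B[:, cols].transpose(1, 0, 2)      # (m, rows, w)
        D_w = D[cols]                            # (m, w, rows)
        results[int(w)] = (sel, D_w @ B_w)       # batched matmul: (m, w, w)
    return results

# Test data built as above, with a smaller n.
rng = np.random.default_rng(0)
n = 1000
idx = np.zeros(n + 1, dtype=int)
idx[1:] = rng.integers(1, 10, size=n)
idx = np.cumsum(idx)
B = rng.random((3, idx[-1]))
D = rng.random((idx[-1], 3))

results = batched_block_products(D, B, idx)

# Spot-check one block against the direct product.
sel, C_w = next(iter(results.values()))
i = sel[0]
print(np.allclose(C_w[0], D[idx[i]:idx[i + 1]] @ B[:, idx[i]:idx[i + 1]]))
```

The results stay stacked per width rather than being scattered back into one list, since rebuilding a list of n arrays would reintroduce a loop over every block. For B_i.T @ B_i, the same grouping works with B_w.transpose(0, 2, 1) @ B_w.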
