Question

I originally posted a question about computing logsumexp efficiently, which can be found here:

How to efficiently compute logsumexp of upper triangle in a nested loop?

The answer I accepted was:

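import numpy as np
from scipy.special import logsumexp  # logsumexp lives in SciPy; NumPy has no np.logsumexp

Wm  = np.array([[1,   2,   3],
                [4,   5,   6],
                [7,   8,   9],
                [10, 11,  12]])

wx  = np.array([1,   2,   3])
wy  = np.array([4,   5,   6])

Wxy = np.array([[5,   6,   7],
                [6,   7,   8],
                [7,   8,   9]])

'''
np.triu_indices(3, k=1) = ([0, 0, 1], [1, 2, 2])
Wxy[triu_inds] = [6, 7, 8]
logsumexp(Wxy[triu_inds]) = log(exp(6) + exp(7) + exp(8))
'''

# The next three names were not defined in the original snippet;
# their values are inferred from the example in the docstring above.
n = Wm.shape[0]                                 # number of rows of Wm
triu_inds = np.triu_indices(Wm.shape[1], k=1)   # strict upper triangle of each outer sum
W = np.zeros((n, n))                            # result; only entries with x < y are filled

# Reference version: one outer sum per pair of rows (x, y) with x < y.
for x in range(n-1):
    wx = Wm[x, :]
    for y in range(x+1, n):
        wy = Wm[y, :]
        Wxy = np.add.outer(wx, wy)
        Wxy = Wxy[triu_inds]
        W[x, y] = logsumexp(Wxy)

# solution here
W = logsumexp(
    # Shape (m, n, m, n) -> (m, m, n, n), then keep only the upper-triangle entries.
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),)*2 + triu_inds],
    axis=-1  # Perform summation over last axis.
)
W = np.triu(W, k=1)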

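The problem is that this is really slow for large matrices, because the size of the intermediate array blows up quickly. If Wm has shape (m, n), then np.add.outer(Wm, Wm) has shape (m, n, m, n), so the memory required grows as (m*n)**2 * 8 bytes for float64 data. I need to run this on matrices larger than 1000x200, but I get a memory error, and it is also very slow.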
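To make the growth concrete, here is a small sketch that evaluates the (m*n)**2 * 8 formula above. The helper name outer_sum_bytes is made up for this illustration, and float64 is assumed as the element type.

```python
import numpy as np

def outer_sum_bytes(m, n, dtype=np.float64):
    """Bytes needed to hold np.add.outer(Wm, Wm) for an (m, n) matrix Wm."""
    # The outer sum has shape (m, n, m, n), i.e. (m * n) ** 2 elements.
    return (m * n) ** 2 * np.dtype(dtype).itemsize

print(outer_sum_bytes(4, 3))             # 1152 bytes for the 4x3 toy example
print(outer_sum_bytes(1000, 200) / 1e9)  # 320.0, i.e. about 320 GB
```

So a 1000x200 input would already need roughly 320 GB for the intermediate array alone, before any indexing or reduction.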

How can I speed up a large np.add.outer on large matrices?
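One direction that might avoid the large intermediate entirely, sketched here as an assumption rather than a tested answer: since exp(a + b) = exp(a) * exp(b), the sum over i < j of exp(Wm[x, i] + Wm[y, j]) factors into a dot product between exp(Wm[x, :]) and the suffix sums of exp(Wm[y, :]). That turns the whole computation into one (m, n) by (n, m) matrix product. The function name pairwise_triu_logsumexp is hypothetical, and the per-row max shift is only a basic stabilization; if the largest entries of a row sit where no i < j pair can use them, the scaled products could still underflow.

```python
import numpy as np
from scipy.special import logsumexp

def pairwise_triu_logsumexp(Wm):
    """W[x, y] = logsumexp over i < j of Wm[x, i] + Wm[y, j], for x < y."""
    Wm = np.asarray(Wm, dtype=float)
    row_max = Wm.max(axis=1, keepdims=True)   # shape (m, 1), shift for stability
    E = np.exp(Wm - row_max)                  # shape (m, n), values in (0, 1]

    # suffix[y, i] = sum of E[y, j] over j >= i
    suffix = np.cumsum(E[:, ::-1], axis=1)[:, ::-1]
    # T[y, i] = sum of E[y, j] over j > i (strictly after i)
    T = np.zeros_like(E)
    T[:, :-1] = suffix[:, 1:]

    # (E @ T.T)[x, y] = sum over i < j of E[x, i] * E[y, j]
    with np.errstate(divide="ignore"):        # log(0) gives -inf if a sum is empty
        W = np.log(E @ T.T) + row_max + row_max.T
    return np.triu(W, k=1)

# Check against the outer-sum version on the toy matrix from the question.
Wm = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
triu_inds = np.triu_indices(Wm.shape[1], k=1)
reference = np.triu(
    logsumexp(np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),)*2 + triu_inds], axis=-1),
    k=1,
)
print(np.allclose(pairwise_triu_logsumexp(Wm), reference))
```

With this formulation the largest arrays are of shape (m, n) and (m, m), so a 1000x200 input needs a few megabytes instead of hundreds of gigabytes, and the heavy lifting is a single matrix multiplication.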

0 answers: