Question

I have this code:

n = np.load(matrix)["arr_0"]
shape = n.shape
##colsums and rowsums
rows = {}
cols = {}
for i in xrange(shape[0]): #row
    rows[i] = np.sum(n[i,:])
for j in xrange(shape[1]): #cols
    cols[j] = np.sum(n[:,j])
##looping over bins
for i in xrange(shape[0]): #row
    print i
    for j in xrange(shape[1]): #column
        if rows[i] == 0 or cols[j] == 0:
            continue
        n[i,j] = n[i,j]/math.sqrt(rows[i]*cols[j])

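It basically loops over a numpy matrix of shape (50000, 50000), and I need to divide each value by the square root of the product of the sum of its row and the sum of its column. My implementation takes a very long time. Do you have any suggestions to improve its performance?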

Answer 1

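You can take the sum along each axis separately, then take the outer product, and then the square root. This can be condensed a bit, but it should give you an idea of how to vectorize it.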

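import numpy

# data is the matrix from the question (n); in-place division needs a float dtype

# Sums of rows and columns
a = numpy.sum(data, axis=1)
b = numpy.sum(data, axis=0)

# Outer product of the row sums and column sums
c = numpy.outer(a, b)

# The square root...
d = numpy.sqrt(c)

# ...and the division (zero row/column sums are not skipped here,
# so those entries become nan or inf, unlike the loop in the question)
data /= d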
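
The condensed form hinted at in this answer could look like the sketch below. It is only one possible way to write it, not necessarily what the answerer had in mind. The function name normalize_by_row_col_sums is hypothetical, and the sketch assumes that the row and column sums never multiply to a negative number (for example, a matrix of counts). Entries whose row or column sum is zero are left unchanged, as in the loop from the question.

```python
import numpy as np

def normalize_by_row_col_sums(data):
    """Divide each entry by sqrt(row_sum * col_sum).

    Entries whose row sum or column sum is zero are left unchanged.
    (Hypothetical helper; assumes row_sum * col_sum is never negative.)
    """
    # Work on floats so true division is possible for integer input too.
    data = np.asarray(data, dtype=float)
    row_sums = data.sum(axis=1)
    col_sums = data.sum(axis=0)

    # Broadcasting a column vector against a row vector gives the
    # same result as numpy.outer(row_sums, col_sums).
    denom = np.sqrt(row_sums[:, None] * col_sums[None, :])

    # Start from a copy; positions where denom == 0 keep their original value.
    out = data.copy()
    np.divide(data, denom, out=out, where=denom != 0)
    return out

# Small usage example: the first row sums to zero and is left untouched.
m = np.array([[1, -1], [2, 2]])
print(normalize_by_row_col_sums(m))
```

Note that this still builds a full-size denominator array, so for a (50000, 50000) matrix the memory cost is roughly that of another float copy of the input.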

Answer 2

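Here's a one-liner solution using np.where and NumPy broadcasting (this assumes rows and cols are NumPy arrays holding the row and column sums, rather than the dicts built in the question):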

np.where((rows[:,None]==0) | (cols==0),n,n/np.sqrt((rows[:,None]*cols)))
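
As a sanity check, the one-liner can be compared against the original double loop on a small matrix. The sketch below is an illustration under assumptions: the test matrix is made up, and rows and cols are computed with n.sum(axis=...), which the answer does not show explicitly.

```python
import math
import numpy as np

# Made-up test matrix with one all-zero row and one all-zero column.
n = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0],
              [3.0, 4.0, 0.0]])

rows = n.sum(axis=1)  # row sums as an array
cols = n.sum(axis=0)  # column sums as an array

# Reference result: the explicit loop from the question.
expected = n.copy()
for i in range(n.shape[0]):
    for j in range(n.shape[1]):
        if rows[i] == 0 or cols[j] == 0:
            continue
        expected[i, j] = n[i, j] / math.sqrt(rows[i] * cols[j])

# np.where evaluates both branches everywhere, so the division by zero
# still happens internally; errstate only silences the warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.where((rows[:, None] == 0) | (cols == 0),
                      n,
                      n / np.sqrt(rows[:, None] * cols))

assert np.allclose(result, expected)
```

Because np.where selects the untouched value of n wherever a row or column sum is zero, the nan and inf values produced by the division never reach the result.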

Dividing a numpy array in Python
