在Numpy下的两个矩阵中为所有成对行应用函数

时间:2015-11-26 08:58:55

标签: python arrays numpy matrix

我有两个矩阵:

import numpy as np

def create(n):
    M = array([[ 0.33840224,  0.25420152,  0.40739624],
               [ 0.35087337,  0.40939274,  0.23973389],
               [ 0.40168642,  0.29848413,  0.29982946],
               [ 0.17442095,  0.50982272,  0.31575633]])
    return np.concatenate([M] * n)

A = create(1)
nof_type = A.shape[1]       
I = np.eye(nof_type)

矩阵A维度为4 x 3,我是3 x 3。 我想做的是

  1. 针对A中的每一行计算I中每一行的距离得分。
  2. {li>为A中的每一行报告I的行ID和最高得分

    所以在一天结束时我们有4 x 2矩阵。 我是如何实现这一目标的?

    这是计算两个numpy数组之间的距离得分的函数。

    def jsd(x,y): #Jensen-shannon divergence
        import warnings
        warnings.filterwarnings("ignore", category = RuntimeWarning)
        x = np.array(x)
        y = np.array(y)
        d1 = x*np.log2(2*x/(x+y))
        d2 = y*np.log2(2*y/(x+y))
        d1[np.isnan(d1)] = 0
        d2[np.isnan(d2)] = 0
        d = 0.5*np.sum(d1+d2)    
        return d
    

    在实际情况中,A的行数约为40K。所以我们真的希望它快速。

    使用循环方式:

    def scoreit (A, I):
        aoa = []
        for i, x in enumerate(A):
            maxscore = -10000
            id = -1
    
            for j, y in enumerate(I):
                distance = jsd(x, y) 
                #print "\t", i, j, distance
                if dist > maxscore:
                    maxscore = distance
                    id = j
            #print "MAX", maxscore, id
            aoa.append([maxscore,id])
        return aoa
    

    打印此结果:

    In [56]: scoreit(A,I)
    Out[56]:
    [[0.54393736529629078, 1],
     [0.56083720679952753, 2],
     [0.49502813447483673, 1],
     [0.64408263453965031, 0]]
    

    当前时间:

    In [57]: %timeit scoreit(create(1000),I)
    1 loops, best of 3: 3.31 s per loop
    

1 个答案:

答案 0 :(得分:2)

您可以将I的维度扩展到各个位置的3D数组版本,以便将powerful broadcasting置于其中。我们保持原样A,因为它是一个庞大的阵列,我们不希望因为它们的元素而导致性能损失。此外,您可以避免检查NaNs这一代价高昂的事情,并使用np.nansum的单个操作进行求和,并对non-NaNs进行求和。因此,矢量化解决方案看起来像这样 -

def jsd_vectorized(A,I):

    # Perform "(x+y)" in a vectorized manner
    AI = A+I[:,None]

    # Calculate d1 and d2 using AI again in vectorized manner
    d1 = A*np.log2(2*A/AI)
    d2 = I[:,None,:]*np.log2((2*I[:,None,:])/AI)

    # Use np.nansum to ignore NaNs & sum along rows to get all distances
    dists = np.nansum(d1,2) + np.nansum(d2,2)

    # Pack the argmax IDs and the corresponding scores as final output   
    ID = dists.argmax(0)
    return np.vstack((0.5*dists[ID,np.arange(dists.shape[1])],ID)).T

示例运行

运行原始功能代码的Loopy函数 -

def jsd_loopy(A,I):
    dists = np.empty((A.shape[0],I.shape[0]))
    for i, x in enumerate(A):   
        for j, y in enumerate(I):
            dists[i,j] = jsd(x, y)
    ID = dists.argmax(1)
    return np.vstack((dists[np.arange(dists.shape[0]),ID],ID)).T

运行并验证 -

In [511]: A = np.array([[ 0.33840224,  0.25420152,  0.40739624],
     ...:        [ 0.35087337,  0.40939274,  0.23973389],
     ...:        [ 0.40168642,  0.29848413,  0.29982946],
     ...:        [ 0.17442095,  0.50982272,  0.31575633]])
     ...: nof_type = A.shape[1]       
     ...: I = np.eye(nof_type)
     ...: 

In [512]: jsd_loopy(A,I)
Out[512]: 
array([[ 0.54393737,  1.        ],
       [ 0.56083721,  2.        ],
       [ 0.49502813,  1.        ],
       [ 0.64408263,  0.        ]])

In [513]: jsd_vectorized(A,I)
Out[513]: 
array([[ 0.54393737,  1.        ],
       [ 0.56083721,  2.        ],
       [ 0.49502813,  1.        ],
       [ 0.64408263,  0.        ]])

运行时测试

In [514]: A = np.random.rand(1000,3)

In [515]: nof_type = A.shape[1]       
     ...: I = np.eye(nof_type)
     ...: 

In [516]: %timeit jsd_loopy(A,I)
1 loops, best of 3: 782 ms per loop

In [517]: %timeit jsd_vectorized(A,I)
1000 loops, best of 3: 1.17 ms per loop

In [518]: np.allclose(jsd_loopy(A,I),jsd_vectorized(A,I))
Out[518]: True