I have two matrices:
import numpy as np

def create(n):
    # Base 4 x 3 matrix; each row sums to 1
    M = np.array([[ 0.33840224, 0.25420152, 0.40739624],
                  [ 0.35087337, 0.40939274, 0.23973389],
                  [ 0.40168642, 0.29848413, 0.29982946],
                  [ 0.17442095, 0.50982272, 0.31575633]])
    # Stack n copies of M vertically to get a (4*n) x 3 matrix
    return np.concatenate([M] * n)

A = create(1)
nof_type = A.shape[1]
I = np.eye(nof_type)
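Matrix A has dimensions 4 x 3, and I is 3 x 3.
What I want to do is:
1. For each row in A, compute the distance score against every row in I.
2. For each row in A, report the row ID of I with the highest score, together with that score.
So at the end of the day we have a 4 x 2 matrix. How can I achieve this?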
This is the function that computes the distance score between two numpy arrays.
def jsd(x,y): #Jensen-shannon divergence
    import warnings
    warnings.filterwarnings("ignore", category = RuntimeWarning)
    x = np.array(x)
    y = np.array(y)
    d1 = x*np.log2(2*x/(x+y))
    d2 = y*np.log2(2*y/(x+y))
    d1[np.isnan(d1)] = 0
    d2[np.isnan(d2)] = 0
    d = 0.5*np.sum(d1+d2)
    return d
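As a quick sanity check on the function above, the following sketch evaluates it on two easy cases: identical distributions, and two one-hot vectors with no overlap. It is a self-contained copy of jsd; the only change is that np.errstate silences the warnings locally instead of the global warnings filter, which is my substitution and not part of the original code. The test vectors are made up for illustration.

```python
import numpy as np

def jsd(x, y):
    # Jensen-Shannon divergence (log base 2), same arithmetic as above
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption: suppress the 0*log(0) warnings locally rather than globally
    with np.errstate(divide="ignore", invalid="ignore"):
        d1 = x * np.log2(2 * x / (x + y))
        d2 = y * np.log2(2 * y / (x + y))
    # Terms of the form 0*log(0) come out as NaN and count as 0
    d1[np.isnan(d1)] = 0
    d2[np.isnan(d2)] = 0
    return 0.5 * np.sum(d1 + d2)

same = jsd([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])      # identical distributions
disjoint = jsd([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # no shared support
print(same, disjoint)  # 0.0 1.0
```

With base-2 logarithms the score runs from 0 (identical) to 1 (disjoint), so the highest score picks the row of I that is least similar to the given row of A.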
In the real case, A has about 40K rows, so we really want this to be fast.
Using the loop approach:
def scoreit (A, I):
    aoa = []
    for i, x in enumerate(A):
        maxscore = -10000
        id = -1
        for j, y in enumerate(I):
            distance = jsd(x, y)
            #print "\t", i, j, distance
            if distance > maxscore:
                maxscore = distance
                id = j
        #print "MAX", maxscore, id
        aoa.append([maxscore,id])
    return aoa
This prints the following result:
In [56]: scoreit(A,I)
Out[56]:
[[0.54393736529629078, 1],
[0.56083720679952753, 2],
[0.49502813447483673, 1],
[0.64408263453965031, 0]]
Current timing:
In [57]: %timeit scoreit(create(1000),I)
1 loops, best of 3: 3.31 s per loop
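Answer 0 (score: 2)
You can extend the dimensions of I to a 3D array version at the places where it is used, so that powerful broadcasting comes into play. We keep A as it is, because it is a huge array and we don't want to incur a performance penalty by moving its elements around. Also, you can avoid the costly step of checking for NaNs and instead do the summation in a single operation with np.nansum, which sums only the non-NaNs. Thus, a vectorized solution would look something like this -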
def jsd_vectorized(A,I):
    # Perform "(x+y)" in a vectorized manner
    AI = A+I[:,None]

    # Calculate d1 and d2 using AI again in vectorized manner
    d1 = A*np.log2(2*A/AI)
    d2 = I[:,None,:]*np.log2((2*I[:,None,:])/AI)

    # Use np.nansum to ignore NaNs & sum along rows to get all distances
    dists = np.nansum(d1,2) + np.nansum(d2,2)

    # Pack the argmax IDs and the corresponding scores as final output
    ID = dists.argmax(0)
    return np.vstack((0.5*dists[ID,np.arange(dists.shape[1])],ID)).T
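To make the broadcasting step concrete, here is a small sketch of the shapes involved. The random 4 x 3 array is only a stand-in for A, and the names I3 and AI are used here for illustration.

```python
import numpy as np

A = np.random.rand(4, 3)   # stand-in for the 4 x 3 matrix A
I = np.eye(3)

I3 = I[:, None]            # new middle axis: shape (3, 1, 3)
AI = A + I3                # (4, 3) broadcast against (3, 1, 3) -> (3, 4, 3)

print(I3.shape, AI.shape)  # (3, 1, 3) (3, 4, 3)

# AI[j, i] holds the elementwise sum A[i] + I[j], i.e. the "(x+y)" term
# for row i of A paired with row j of I.
assert np.allclose(AI[1, 2], A[2] + I[1])
```

Because the rows of I sit on the first axis, summing over the last axis gives a (3, 4) array of distances, which is why the function takes argmax along axis 0 to find the best row of I for each row of A.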
Sample run
Loopy function that runs the original function code -
def jsd_loopy(A,I):
    dists = np.empty((A.shape[0],I.shape[0]))
    for i, x in enumerate(A):
        for j, y in enumerate(I):
            dists[i,j] = jsd(x, y)
    ID = dists.argmax(1)
    return np.vstack((dists[np.arange(dists.shape[0]),ID],ID)).T
Run and verify -
In [511]: A = np.array([[ 0.33840224, 0.25420152, 0.40739624],
...: [ 0.35087337, 0.40939274, 0.23973389],
...: [ 0.40168642, 0.29848413, 0.29982946],
...: [ 0.17442095, 0.50982272, 0.31575633]])
...: nof_type = A.shape[1]
...: I = np.eye(nof_type)
...:
In [512]: jsd_loopy(A,I)
Out[512]:
array([[ 0.54393737, 1. ],
[ 0.56083721, 2. ],
[ 0.49502813, 1. ],
[ 0.64408263, 0. ]])
In [513]: jsd_vectorized(A,I)
Out[513]:
array([[ 0.54393737, 1. ],
[ 0.56083721, 2. ],
[ 0.49502813, 1. ],
[ 0.64408263, 0. ]])
Runtime test
In [514]: A = np.random.rand(1000,3)
In [515]: nof_type = A.shape[1]
...: I = np.eye(nof_type)
...:
In [516]: %timeit jsd_loopy(A,I)
1 loops, best of 3: 782 ms per loop
In [517]: %timeit jsd_vectorized(A,I)
1000 loops, best of 3: 1.17 ms per loop
In [518]: np.allclose(jsd_loopy(A,I),jsd_vectorized(A,I))
Out[518]: True
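Since the question mentions roughly 40K rows, here is a hedged, standalone sketch for trying the vectorized function at that size outside IPython. The row normalization, the fixed seed and the use of np.errstate are my assumptions and are not part of the code above; no timing figure is claimed, since it depends on the machine.

```python
import time
import numpy as np

def jsd_vectorized(A, I):
    # Same computation as the vectorized function above; np.errstate is an
    # addition here to silence the 0*log(0) warnings locally.
    with np.errstate(divide="ignore", invalid="ignore"):
        I3 = I[:, None, :]                   # shape (3, 1, 3)
        AI = A + I3                          # shape (3, n, 3)
        d1 = A * np.log2(2 * A / AI)
        d2 = I3 * np.log2(2 * I3 / AI)
    # nansum skips the NaN terms produced by 0*log(0)
    dists = np.nansum(d1, 2) + np.nansum(d2, 2)
    ID = dists.argmax(0)
    return np.vstack((0.5 * dists[ID, np.arange(dists.shape[1])], ID)).T

rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
A = rng.random((40000, 3))
A /= A.sum(axis=1, keepdims=True)        # assumption: rows are distributions
I = np.eye(A.shape[1])

start = time.perf_counter()
result = jsd_vectorized(A, I)
elapsed = time.perf_counter() - start
print(result.shape, "computed in %.4f s" % elapsed)
```

The intermediate arrays have shape (3, 40000, 3), so memory use stays small at this size. Note that np.vstack stores the IDs as floats alongside the scores, as in the outputs above.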