I have a question about computing distances with numpy as fast as possible:
import numpy
import time

def getR1(VVm, VVs, HHm, HHs):
    t0 = time.time()
    R = VVs.flatten()[numpy.newaxis, :] - VVm.flatten()[:, numpy.newaxis]
    R *= R
    R1 = HHs.flatten()[numpy.newaxis, :] - HHm.flatten()[:, numpy.newaxis]
    R1 *= R1
    R += R1
    del R1
    print "R1\t", time.time()-t0, R.shape,  #11.7576191425 (108225, 10500)
    print numpy.max(R)  #4176.26290975
    # uses 17.5Gb ram
    return R
def getR2(VVm, VVs, HHm, HHs):
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None, :, :] - measured_flat[:, None, :]
    #print time.time()-t0, deltas.shape  # 5.861109972 (108225, 10500, 2)
    R = numpy.einsum('ijk,ijk->ij', deltas, deltas)
    print "R2\t", time.time()-t0, R.shape,  #14.5291359425 (108225, 10500)
    print numpy.max(R)  #4176.26290975
    # uses 26Gb ram
    return R
def getR3(VVm, VVs, HHm, HHs):
    from numpy.core.umath_tests import inner1d
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None, :, :] - measured_flat[:, None, :]
    #print time.time()-t0, deltas.shape  # 5.861109972 (108225, 10500, 2)
    R = inner1d(deltas, deltas)
    print "R3\t", time.time()-t0, R.shape,  #12.6972110271 (108225, 10500)
    print numpy.max(R)  #4176.26290975
    # uses 26Gb ram
    return R
def getR4(VVm, VVs, HHm, HHs):
    from scipy.spatial.distance import cdist
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R = cdist(precomputed_flat, measured_flat, 'sqeuclidean')  #.T
    print "R4\t", time.time()-t0, R.shape,  #17.7022118568 (108225, 10500)
    print numpy.max(R)  #4176.26290975
    # uses 9 Gb ram
    return R
def getR5(VVm, VVs, HHm, HHs):
    from scipy.spatial.distance import cdist
    t0 = time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R = cdist(precomputed_flat, measured_flat, 'euclidean')  #.T
    print "R5\t", time.time()-t0, R.shape,  #15.6070930958 (108225, 10500)
    print numpy.max(R)  #64.6240118667
    # uses only 9 Gb ram
    return R
def getR6(VVm, VVs, HHm, HHs):
    from scipy.weave import blitz
    t0 = time.time()
    R = VVs.flatten()[numpy.newaxis, :] - VVm.flatten()[:, numpy.newaxis]
    blitz("R=R*R")  # R*=R
    R1 = HHs.flatten()[numpy.newaxis, :] - HHm.flatten()[:, numpy.newaxis]
    blitz("R1=R1*R1")  # R1*=R1
    blitz("R=R+R1")  # R+=R1
    del R1
    print "R6\t", time.time()-t0, R.shape,  #11.7576191425 (108225, 10500)
    print numpy.max(R)  #4176.26290975
    return R
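For reference, I call these roughly like this (just a placeholder driver: the real VVm/VVs/HHm/HHs come from my data, the random arrays below only reproduce the flattened sizes of 108225 measured and 10500 precomputed points):
# placeholder data: only the sizes match my real arrays, the values are random
VVm = numpy.random.rand(108225)
HHm = numpy.random.rand(108225)
VVs = numpy.random.rand(10500)
HHs = numpy.random.rand(10500)
for getR in (getR1, getR2, getR3, getR4, getR5, getR6):
    R = getR(VVm, VVs, HHm, HHs)
    del R  # free the ~9-26 Gb before the next method runs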
This gives the following timings:
R1 11.7737319469 (108225, 10500) 4909.66881791
R2 15.1279799938 (108225, 10500) 4909.66881791
R3 12.7408981323 (108225, 10500) 4909.66881791
R4 17.3336868286 (10500, 108225) 4909.66881791
R5 15.7530870438 (10500, 108225) 70.0690289494
R6 11.670968771 (108225, 10500) 4909.66881791
The 'euclidean' one (R5) gives sqrt((VVm-VVs)^2 + (HHm-HHs)^2) while the others give (VVm-VVs)^2 + (HHm-HHs)^2. That difference does not really matter here: later in my code I take the minimum of R[i,:] for each i, and the sqrt does not change which element is the minimum. If I want the actual distance I just take sqrt(value) of that single number instead of applying sqrt to the whole array, so there is no real time difference there.
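In other words, the step after any of these methods is roughly the following sketch (R being the squared-distance matrix, e.g. from getR1):
# sketch of the post-processing: the row-wise minimum is unaffected by sqrt,
# so sqrt is only applied to the minima, never to the full (108225, 10500) array
Rmin = R.min(axis=1)        # smallest squared distance for each measured point
dmin = numpy.sqrt(Rmin)     # actual distances, only where I really need them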
The question remains: why is the first solution the best? (The second and third ones are slower mainly because the deltas = ... line alone takes 5.8 s, which is also why those two methods need 26 GB.) And why is sqeuclidean slower than euclidean?
sqeuclidean should just do (VVm-VVs)^2 + (HHm-HHs)^2, yet it seems to be doing something else. Does anyone know how to find the source code (the C, or whatever sits at the bottom) of that method? I suspect it actually computes sqrt((VVm-VVs)^2 + (HHm-HHs)^2)^2, which is the only reason I can think of for it being slower than plain (VVm-VVs)^2 + (HHm-HHs)^2 (I know it is a silly reason; does anyone have a more logical one?).
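(The furthest I got on my own is locating the files; I assume the Python wrapper hands the work to a compiled extension, and that the C sources themselves are only in the SciPy source tree, not in the installed package:)
# just locating where the implementation lives, not the answer itself
import scipy.spatial.distance
print scipy.spatial.distance.__file__        # pure-Python wrapper around cdist
import scipy.spatial._distance_wrap          # compiled C extension it calls, I think
print scipy.spatial._distance_wrap.__file__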
Since I know nothing about C: how would I inline this with scipy.weave? Does that code just compile the way normal Python does, or do I need to install something special?
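(The most basic use of scipy.weave.inline I could piece together from the docs looks like this; as far as I understand it needs a working C/C++ compiler, and the snippet gets compiled and cached the first time it runs:)
# minimal scipy.weave.inline sketch; return_val is how the C code
# hands its result back to Python
from scipy import weave
a = 3.0
b = 4.0
c = weave.inline("return_val = a*a + b*b;", ['a', 'b'])
print c  # 25.0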
Edit: OK, I tried scipy.weave.blitz (the R6 method), and it is slightly faster, but I suspect someone who knows more C than I do can still speed it up. I just took the lines of the form a += b or a *= b, looked up how they would look in C, and put them in blitz statements; I guess that if I also pushed the lines with flatten and newaxis into C it should get faster still, but I don't know how to do that (someone who knows C could probably explain). Right now the difference between blitz and my first method seems too small to really be caused by C vs numpy, I guess?
I guess the other methods, like the deltas = ... line, could also be made faster by moving them into C?
Answer (score: 6)
Whenever you have multiplications and sums, try to use one of the dot product functions or np.einsum. Since you are preallocating your arrays anyway, rather than keeping separate arrays for the horizontal and vertical coordinates, stack them together:
precomputed_flat = np.column_stack((svf.flatten(), shf.flatten()))
measured_flat = np.column_stack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat - measured_flat[:, None, :]
From here on, the simplest is:
dist = np.einsum('ijk,ijk->ij', deltas, deltas)
You could also try:
from numpy.core.umath_tests import inner1d
dist = inner1d(deltas, deltas)
And of course there is cdist from SciPy's spatial module:
from scipy.spatial.distance import cdist
dist = cdist(precomputed_flat, measured_flat, 'euclidean')
Edit: I cannot run tests on a dataset this large, but these timings are quite enlightening:
len_a, len_b = 10000, 1000
a = np.random.rand(2, len_a)
b = np.random.rand(2, len_b)
c = np.random.rand(len_a, 2)
d = np.random.rand(len_b, 2)
In [3]: %timeit a[:, None, :] - b[..., None]
10 loops, best of 3: 76.7 ms per loop
In [4]: %timeit c[:, None, :] - d
1 loops, best of 3: 221 ms per loop
On the smaller dataset above I can get a slight improvement over your method with scipy.spatial.distance.cdist, and match inner1d, by arranging the data differently in memory:
precomputed_flat = np.vstack((svf.flatten(), shf.flatten()))
measured_flat = np.vstack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat[:, None, :] - measured_flat[..., None]
import scipy.spatial.distance as spdist
from numpy.core.umath_tests import inner1d
In [13]: %timeit r0 = a[0, None, :] - b[0, :, None]; r1 = a[1, None, :] - b[1, :, None]; r0 *= r0; r1 *= r1; r0 += r1
10 loops, best of 3: 146 ms per loop
In [14]: %timeit deltas = (a[:, None, :] - b[..., None]).T; inner1d(deltas, deltas)
10 loops, best of 3: 145 ms per loop
In [15]: %timeit spdist.cdist(a.T, b.T)
10 loops, best of 3: 124 ms per loop
In [16]: %timeit deltas = a[:, None, :] - b[..., None]; np.einsum('ijk,ijk->jk', deltas, deltas)
10 loops, best of 3: 163 ms per loop