我有两个坐标列表:
l1 = [[x,y,z],[x,y,z],[x,y,z],[x,y,z],[x,y,z]]
l2 = [[x,y,z],[x,y,z],[x,y,z]]
我想找到l1和l2之间的最短成对距离。两个坐标之间的距离很简单:
numpy.linalg.norm(l1_element - l2_element)
那么我如何使用numpy将这个操作有效地应用于每对元素呢?
答案 0 :(得分:4)
以下是迄今为止提出的四种方法的快速性能分析:
import numpy
import scipy
from itertools import product
from scipy.spatial.distance import cdist
from scipy.spatial import cKDTree as KDTree
n = 100
l1 = numpy.random.randint(0, 100, size=(n,3))
l2 = numpy.random.randint(0, 100, size=(n,3))
# by @Phillip
def a(l1,l2):
return min(numpy.linalg.norm(l1_element - l2_element) for l1_element,l2_element in product(l1,l2))
# by @Kasra
def b(l1,l2):
return numpy.min(numpy.apply_along_axis(
numpy.linalg.norm,
2,
l1[:, None, :] - l2[None, :, :]
))
# mine
def c(l1,l2):
return numpy.min(scipy.spatial.distance.cdist(l1,l2))
# just checking that numpy.min is indeed faster.
def c2(l1,l2):
return min(scipy.spatial.distance.cdist(l1,l2).reshape(-1))
# by @BrianLarsen
def d(l1,l2):
# make KDTrees for both sets of points
t1 = KDTree(l1)
t2 = KDTree(l2)
# we need a distance to not look beyond, if you have real knowledge use it, otherwise guess
maxD = numpy.linalg.norm(l1[0] - l2[0]) # this could be closest but anyhting further is certainly not
# get a sparce matrix of all the distances
ans = t1.sparse_distance_matrix(t2, maxD)
# get the minimum distance and points involved
minD = min(ans.values())
return minD
for x in (a,b,c,c2,d):
print("Timing variant", x.__name__, ':', flush=True)
print(x(l1,l2), flush=True)
%timeit x(l1,l2)
print(flush=True)
n=100
Timing variant a :
2.2360679775
10 loops, best of 3: 90.3 ms per loop
Timing variant b :
2.2360679775
10 loops, best of 3: 151 ms per loop
Timing variant c :
2.2360679775
10000 loops, best of 3: 136 µs per loop
Timing variant c2 :
2.2360679775
1000 loops, best of 3: 844 µs per loop
Timing variant d :
2.2360679775
100 loops, best of 3: 3.62 ms per loop
n=1000
Timing variant a :
0.0
1 loops, best of 3: 9.16 s per loop
Timing variant b :
0.0
1 loops, best of 3: 14.9 s per loop
Timing variant c :
0.0
100 loops, best of 3: 11 ms per loop
Timing variant c2 :
0.0
10 loops, best of 3: 80.3 ms per loop
Timing variant d :
0.0
1 loops, best of 3: 933 ms per loop
答案 1 :(得分:1)
使用newaxis和broadcasting,l1[:, None, :] - l2[None, :, :]
是成对差异向量的数组。您可以使用apply_along_axis
将此数组缩减为norm
个数组,然后使用min
:
numpy.min(numpy.apply_along_axis(
numpy.linalg.norm,
2,
l1[:, None, :] - l2[None, :, :]
))
当然,这只适用于l1
和l2
是numpy数组的情况,因此如果问题中的列表不是伪代码,则必须添加l1 = numpy.array(l1); l2 = numpy.array(l2)
答案 2 :(得分:0)
您可以使用itertools.product
获取使用min
的所有组合:
l1 = [[x,y,z],[x,y,z],[x,y,z],[x,y,z],[x,y,z]]
l2 = [[x,y,z],[x,y,z],[x,y,z]]
from itertools import product
min(numpy.linalg.norm(l1_element - l2_element) for l1_element,l2_element in product(l1,l2))
答案 3 :(得分:0)
如果你有很多很多点,这对KDTree很有用。对于这个例子来说完全矫枉过正,但是对于某类问题来说是一个很好的学习经验并且非常快,并且可以提供一定距离内的点数的更多信息。
import numpy as np
from scipy.spatial import cKDTree as KDTree
#sample data
l1 = [[0,0,0], [4,5,6], [7,6,7], [4,5,6]]
l2 = [[100,3,4], [1,0,0], [10,15,16], [17,16,17], [14,15,16], [-34, 5, 6]]
# make them arrays
l1 = np.asarray(l1)
l2 = np.asarray(l2)
# make KDTrees for both sets of points
t1 = KDTree(l1)
t2 = KDTree(l2)
# we need a distance to not look beyond, if you have real knowledge use it, otherwise guess
maxD = np.linalg.norm(l1[-1] - l2[-1]) # this could be closest but anyhting further is certainly not
# get a sparce matrix of all the distances
ans = t1.sparse_distance_matrix(t2, maxD)
# get the minimum distance and points involved
minA = min([(i,k) for k, i in ans.iteritems()])
print("Minimun distance is {0} between l1={1} and l2={2}".format(minA[0], l1[minA[1][0]], l2[minA[1][2]] ))
这样做是为点集合制作KDTree,然后在猜测距离内找到点的所有距离,并返回距离和最近点。 This post有关于KDTree如何工作的文章。