Suppose I have two sets of points:
>>> points1.shape
(10000, 3)
>>> points2.shape
(1529, 3)
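I would like to find the list of indices of the points in points2 that lie within a Euclidean distance cutoff of some point in points1. I can do this easily using scipy.spatial.distance.cdist: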
from scipy.spatial.distance import cdist
import numpy
indices = numpy.argwhere(cdist(points1, points2).min(axis=0) < cutoff)
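However, this seems inefficient, since I don't need to know how far apart the points are, only whether they are within a cutoff distance of each other. Could a KDTree help with this?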
Answer 0: (score: 1)
Here are 3 alternatives, one using cdist and two using scipy.spatial.cKDTree:
import itertools as IT
import numpy as np
import scipy.spatial as spatial
import scipy.spatial.distance as dist
np.random.seed(2016)
points1 = np.random.randint(100, size=(10**5, 3))
points2 = np.random.randint(100, size=(1529, 3))
cutoff = 5
def using_cdist(points1, points2, cutoff):
    indices = np.where(dist.cdist(points1, points2) <= cutoff)[0]
    indices = np.unique(indices)
    return indices
def using_kdtree(points1, points2, cutoff):
    # build the KDTree using the *smaller* points array
    tree = spatial.cKDTree(points2)
    groups = tree.query_ball_point(points1, cutoff)
    indices = np.unique([i for i, grp in enumerate(groups) if len(grp)])
    return indices
def using_kdtree2(points1, points2, cutoff):
    # build the KDTree using the *larger* points array
    tree = spatial.cKDTree(points1)
    groups = tree.query_ball_point(points2, cutoff)
    # materialize the chained iterator first; np.unique cannot consume it directly
    flat = np.fromiter(IT.chain.from_iterable(groups), dtype=int)
    indices = np.unique(flat)
    return indices
cdist_result = using_cdist(points1, points2, cutoff)
kdtree_result = using_kdtree(points1, points2, cutoff)
kdtree_result2 = using_kdtree2(points1, points2, cutoff)
assert np.allclose(cdist_result, kdtree_result)
assert np.allclose(cdist_result, kdtree_result2)
Of these 3 alternatives, using_kdtree2 is the fastest:
In [80]: %timeit using_kdtree2(points1, points2, cutoff)
10 loops, best of 3: 92.4 ms per loop
In [103]: %timeit using_kdtree(points1, points2, cutoff)
1 loops, best of 3: 938 ms per loop
In [104]: %timeit using_cdist(points1, points2, cutoff)
1 loops, best of 3: 1.51 s per loop
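My intuition about which would be fastest turned out to be completely wrong. I thought building the KDTree from the smaller points array would be fastest. Although building the KDTree from the larger points array is somewhat slower, calling tree.query_ball_point on the smaller points array is much faster: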
In [68]: %timeit tree = spatial.cKDTree(points2)
1000 loops, best of 3: 312 µs per loop
In [69]: %timeit tree = spatial.cKDTree(points1)
10 loops, best of 3: 45.7 ms per loop
In [66]: %timeit tree = spatial.cKDTree(points2); groups = tree.query_ball_point(points1, cutoff)
1 loops, best of 3: 933 ms per loop
In [67]: %timeit tree = spatial.cKDTree(points1); groups = tree.query_ball_point(points2, cutoff)
10 loops, best of 3: 89.3 ms per loop
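Since the question only needs a yes/no answer per point, a further variant is possible. The following is a minimal sketch, not part of the original answer and not timed here: it asks the tree for the single nearest neighbour with an upper distance bound, and treats an infinite returned distance as "nothing within range". The function name using_kdtree_query is hypothetical. It returns indices into points1, like the three functions above; swapping the two arrays would give indices into points2. Points lying exactly at the cutoff may be treated differently from the `<=` comparison used above.

```python
import numpy as np
from scipy import spatial

def using_kdtree_query(points1, points2, cutoff):
    """Indices of points1 that have at least one point of points2 within cutoff."""
    tree = spatial.cKDTree(points2)
    # k=1 asks only for the nearest neighbour. When no neighbour lies within
    # distance_upper_bound, the returned distance is inf.
    nearest_dist, _ = tree.query(points1, k=1, distance_upper_bound=cutoff)
    return np.flatnonzero(np.isfinite(nearest_dist))
```

Whether this beats query_ball_point depends on the data and the cutoff, so it is worth timing on your own arrays.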
Note that there are a few problems with using
def orig(points1, points2, cutoff):
    return np.argwhere(dist.cdist(points1, points2).min(axis=0) < cutoff)
First, by calling min(axis=0), you lose information if two points in points1 are both within cutoff of a point in points2: you only get the index of the closest one. The other problem is that taking min over the 0-axis leaves only the 1-axis, which is associated with points2. So orig returns indices into points2, not points1.
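The axis bookkeeping is easy to check on a tiny example. The sketch below uses made-up toy coordinates (an assumption for illustration only): the first two points of points1 are both close to the first point of points2, and everything else is far away.

```python
import numpy as np
import scipy.spatial.distance as dist

# Hypothetical toy data, chosen only to illustrate the axis semantics.
points1 = np.array([[0, 0, 0], [1, 0, 0], [50, 50, 50]])
points2 = np.array([[0, 0, 1], [90, 90, 90]])
cutoff = 5

d = dist.cdist(points1, points2)   # shape (3, 2): rows -> points1, columns -> points2

# min(axis=0) collapses the points1 axis, leaving one value per point in points2.
orig_indices = np.argwhere(d.min(axis=0) < cutoff).ravel()

# any(axis=1) collapses the points2 axis, leaving one flag per point in points1,
# so both nearby points1 rows are reported.
points1_indices = np.flatnonzero((d <= cutoff).any(axis=1))

print(d.shape, orig_indices, points1_indices)   # (3, 2) [0] [0 1]
```

The single index returned by the orig-style expression refers to points2[0], while the two points of points1 that fall within the cutoff only show up when the reduction runs along axis 1.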
Answer 1: (score: 0)
Some ideas (?):
If you don't need to know the distances themselves, you can skip the square-root computation by comparing the squared distance with the squared threshold (cdist will compute the square root).
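Excluding first the points whose x coordinate alone already exceeds the threshold, and then doing the same for y and z, will save some computation, especially since 'or' in Python is lazy (if x is already far enough, it won't even check y).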
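Both ideas can be sketched with NumPy and SciPy. This is a rough illustration under stated assumptions, not the answerer's code: the function names are hypothetical, the squared comparison relies on cdist's 'sqeuclidean' metric, and the per-coordinate exclusion is replaced by a vectorized bounding-box test against all of points2 rather than a pairwise check with a lazy 'or'.

```python
import numpy as np
import scipy.spatial.distance as dist

def using_sqeuclidean(points1, points2, cutoff):
    """Indices of points1 within cutoff of any point in points2, without square roots."""
    sq = dist.cdist(points1, points2, metric='sqeuclidean')
    return np.flatnonzero((sq <= cutoff ** 2).any(axis=1))

def prefilter_by_axis(points1, points2, cutoff):
    """Boolean mask of the points1 rows that could still be within cutoff.

    A point whose x, y or z coordinate lies more than cutoff outside the
    bounding box of points2 cannot be within cutoff of any point in points2.
    """
    lo = points2.min(axis=0) - cutoff
    hi = points2.max(axis=0) + cutoff
    return np.all((points1 >= lo) & (points1 <= hi), axis=1)

def using_prefilter(points1, points2, cutoff):
    """Cheap per-axis rejection first, exact squared-distance test on the survivors."""
    candidates = np.flatnonzero(prefilter_by_axis(points1, points2, cutoff))
    hits = using_sqeuclidean(points1[candidates], points2, cutoff)
    return candidates[hits]
```

The bounding-box step only pays off when points2 occupies a small part of the region covered by points1. If both sets are spread over the same volume, it rejects almost nothing.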