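I have several points and want to determine whether they lie within a certain distance of each other. If they do, I want to merge them into a single point. I built a search tree and obtained a distance matrix from it. Is there an elegant way (ideally without slow loops) to determine which points lie within a certain distance of each other, without resorting to a complex clustering algorithm (k-means, hierarchical, etc.)?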
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors import radius_neighbors_graph
RADIUS = 0.025
points = np.array([
[13.2043373032, 52.3818529896],
[13.0530692845, 52.3991668707],
[13.229309674, 52.3840231],
[13.489018081, 52.4180538095],
[13.3209738098, 52.6375963146],
[13.0160362703, 52.4187139243],
[13.0448485, 52.4143229343],
[13.32478977, 52.5090253],
[13.35514839, 52.5219323867],
[13.1982523828, 52.3592620828]
])
tree = NearestNeighbors(n_neighbors=2, radius=RADIUS, leaf_size=30, algorithm="auto", n_jobs=1).fit(points)
nnGraph = radius_neighbors_graph(tree, RADIUS, mode='distance', include_self=False)
print(nnGraph)
(0, 9) 0.0233960536484
(1, 6) 0.0172420289306
(6, 1) 0.0172420289306
(9, 0) 0.0233960536484
Answer 0: (score: 1)
You can use pdist and squareform from scipy.spatial.distance for a vectorized solution, like this:
from scipy.spatial.distance import pdist, squareform
# Get pairwise euclidean distances
dists = squareform(pdist(points))
# Get valid distances mask and the corresponding indices
mask = dists < RADIUS
np.fill_diagonal(mask,0)
idx = np.argwhere(mask)
# Present indices and corresponding distances as zipped output
# (list() is needed on Python 3, where zip returns a lazy iterator)
out = list(zip(map(tuple, idx), dists[idx[:, 0], idx[:, 1]]))
Sample run:
In [91]: RADIUS
Out[91]: 0.025
In [92]: points
Out[92]:
array([[ 13.2043373 , 52.38185299],
[ 13.05306928, 52.39916687],
[ 13.22930967, 52.3840231 ],
[ 13.48901808, 52.41805381],
[ 13.32097381, 52.63759631],
[ 13.01603627, 52.41871392],
[ 13.0448485 , 52.41432293],
[ 13.32478977, 52.5090253 ],
[ 13.35514839, 52.52193239],
[ 13.19825238, 52.35926208]])
In [93]: out
Out[93]:
[((0, 9), 0.023396053648436933),
((1, 6), 0.017242028930573985),
((6, 1), 0.017242028930573985),
((9, 0), 0.023396053648436933)]
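Since the question already builds a search tree, the close pairs can also be read straight from a tree without forming the full n x n matrix. The following is only a minimal sketch, assuming SciPy's cKDTree is an acceptable substitute for the scikit-learn tree in the question; the names pairs and pair_dists are made up for illustration. Note that query_pairs treats the radius as inclusive (distance <= r), whereas the mask above uses a strict comparison.

```python
import numpy as np
from scipy.spatial import cKDTree

RADIUS = 0.025
points = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.229309674, 52.3840231],
    [13.489018081, 52.4180538095],
    [13.3209738098, 52.6375963146],
    [13.0160362703, 52.4187139243],
    [13.0448485, 52.4143229343],
    [13.32478977, 52.5090253],
    [13.35514839, 52.5219323867],
    [13.1982523828, 52.3592620828],
])

tree = cKDTree(points)
# Each close pair is reported once, as a row (i, j) with i < j
pairs = tree.query_pairs(r=RADIUS, output_type="ndarray")
# Euclidean distance for every reported pair
pair_dists = np.linalg.norm(points[pairs[:, 0]] - points[pairs[:, 1]], axis=1)

for (i, j), d in zip(pairs, pair_dists):
    print(i, j, d)
```

Because each pair appears only once, there is no need to mask the diagonal or drop the lower triangle afterwards.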
Answer 1: (score: 0)
For a small number of points (< 50), using complex numbers is slightly faster. I found this in another post: Efficiently Calculating a Euclidean Distance Matrix Using Numpy
pointsCmplx = np.array([points[...,0] + 1j * points[...,1]])
dists = abs(pointsCmplx.T - pointsCmplx)
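As a quick sanity check, the complex-number trick can be compared against pdist. This is only a sketch on made-up random data (pts is hypothetical, not the points from the question), and it works only for two-dimensional points, since one complex number holds exactly two coordinates.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical test data: 20 random 2-D points
rng = np.random.default_rng(0)
pts = rng.random((20, 2))

# Encode each point as x + iy; |z_a - z_b| is the Euclidean distance
z = pts[:, 0] + 1j * pts[:, 1]
dists_complex = np.abs(z[:, None] - z[None, :])

# Reference result from SciPy
dists_pdist = squareform(pdist(pts))

print(np.allclose(dists_complex, dists_pdist))
```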
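My goal is to obtain points that do not overlap with respect to the radius. I took your code, removed the lower triangle of the matrix, and finally simply deleted the second point of each pair. The points are sorted by a specific observation; a lower index means more important. Are there any other suggestions for efficiently merging nearby clusters instead of deleting points? I am looking for a very fast solution and do not want to use a complex clustering algorithm.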
# overlapping points
points = np.array([
[13.2043373032, 52.3818529896],
[13.0530692845, 52.3991668707],
[13.229309674, 52.3840231],
[13.489018081, 52.4180538095],
[13.3209738098, 52.6375963146],
[13.0160362703, 52.4187139243],
[13.0448485, 52.4143229343],
[13.32478977, 52.5090253],
[13.35514839, 52.5219323867],
[13.1982523828, 52.3592620828],
[13.1982523828, 52.3592620830] # nearly identical
])
dists = squareform(pdist(points))
mask = dists < RADIUS
np.fill_diagonal(mask,0)
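# keep only the upper triangle so each pair (i, j) appears once, with i < j
mask = np.triu(mask)
idx = np.argwhere(mask)
# keep the higher-index (less important) point of each pair as the deletion target
idx = idx[:,1]
# remove those points; the lower-index point of each pair survives
points = np.delete(points, idx, 0)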
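One possible way to merge instead of delete is to treat the mask as a graph and average each connected group of points. This is only a sketch, assuming SciPy's connected_components counts as simple enough for this purpose; the names labels, merged and representatives are made up for illustration. Be aware that groups are chained transitively: if A is near B and B is near C, all three are merged even when A and C are farther apart than RADIUS.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

RADIUS = 0.025
points = np.array([
    [13.2043373032, 52.3818529896],
    [13.0530692845, 52.3991668707],
    [13.229309674, 52.3840231],
    [13.489018081, 52.4180538095],
    [13.3209738098, 52.6375963146],
    [13.0160362703, 52.4187139243],
    [13.0448485, 52.4143229343],
    [13.32478977, 52.5090253],
    [13.35514839, 52.5219323867],
    [13.1982523828, 52.3592620828],
    [13.1982523828, 52.3592620830],  # nearly identical
])

# Adjacency mask: True where two distinct points are closer than RADIUS
mask = squareform(pdist(points)) < RADIUS
np.fill_diagonal(mask, False)

# Give every point the id of the connected group it belongs to
n_groups, labels = connected_components(csr_matrix(mask), directed=False)

# Option 1: replace each group by the mean of its members
counts = np.bincount(labels, minlength=n_groups)
merged = np.column_stack([
    np.bincount(labels, weights=points[:, k], minlength=n_groups) / counts
    for k in range(points.shape[1])
])

# Option 2: keep only the lowest-index (most important) point of each group
first_idx = np.unique(labels, return_index=True)[1]
representatives = points[np.sort(first_idx)]
```

The second option keeps the ordering by importance described above, while the first one moves each merged point to the centre of its group.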