Suppose I have the following (simplified) NumPy matrix:
import numpy as np

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])
Now I want to get the vector in the matrix that is closest to a "search" vector:
search_vec = np.array([3, 3])
This is what I am currently doing:
min_dist = None
result_vec = None
for ref_vec in matrix:
    # Euclidean distance; a norm is never negative, so no abs() is needed
    distance = np.linalg.norm(search_vec - ref_vec)
    print(ref_vec, distance)
    if min_dist is None or min_dist > distance:
        min_dist = distance
        result_vec = ref_vec
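This gives the right result, but is there a native NumPy solution that is more efficient? My problem is that the larger the matrix, the slower the whole loop becomes. Is there a more elegant and efficient way to solve this?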
Answer 0 (score: 3)
Approach 1
We can use SciPy's Cython-powered kd-tree for quick nearest-neighbor lookup, which is very efficient in both memory and performance:
In [276]: from scipy.spatial import cKDTree
In [277]: matrix[cKDTree(matrix).query(search_vec, k=1)[1]]
Out[277]: array([2, 2])
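If several search vectors have to be looked up against the same matrix, it is probably worth building the tree once and reusing it. Below is a minimal sketch of that idea; the `queries` array and its second row are made up for illustration and are not part of the question.

```python
import numpy as np
from scipy.spatial import cKDTree

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])

# Build the tree once; this is the expensive step.
tree = cKDTree(matrix)

# Hypothetical batch of search vectors, one per row.
queries = np.array([[3.0, 3.0],
                    [5.4, 5.4]])

# query() accepts a 2-D array and returns one distance and one index per row.
distances, indices = tree.query(queries, k=1)
closest = matrix[indices]
print(closest)
```

For a single one-off lookup the cost of building the tree may outweigh the benefit, so this mainly pays off with repeated queries.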
Approach 2
In [286]: from scipy.spatial.distance import cdist
In [287]: matrix[cdist(matrix, np.atleast_2d(search_vec)).argmin()]
Out[287]: array([2, 2])
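Since the question asks for a NumPy-only solution, the same brute-force idea can also be sketched without SciPy by letting broadcasting replace the Python loop. This is an illustrative variant rather than part of the original answer, and the names `diffs` and `dists` are chosen here for clarity.

```python
import numpy as np

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])
search_vec = np.array([3, 3])

# Broadcasting subtracts search_vec from every row at once.
diffs = matrix - search_vec

# Euclidean norm of each row, giving one distance per reference vector.
dists = np.linalg.norm(diffs, axis=1)

# argmin returns the index of the first smallest distance.
closest_vec = matrix[dists.argmin()]
print(closest_vec)
```

Like `cdist`, this computes every distance, so the work still grows linearly with the number of rows; it only removes the per-row Python overhead.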
Approach 3
Using scikit-learn's Nearest Neighbors:
from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors(n_neighbors=1).fit(matrix)
closest_vec = matrix[nbrs.kneighbors(np.atleast_2d(search_vec))[1][0,0]]
Approach 4
from sklearn.neighbors import KDTree
kdt = KDTree(matrix, metric='euclidean')
cv = matrix[kdt.query(np.atleast_2d(search_vec), k=1, return_distance=False)[0,0]]
Approach 5
Following the wiki contents of the eucl_dist package (disclaimer: I am its author), we can leverage matrix multiplication:
M = matrix.dot(search_vec)
d = np.einsum('ij,ij->i',matrix,matrix) + np.inner(search_vec,search_vec) -2*M
closest_vec = matrix[d.argmin()]
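The expression for `d` appears to rely on the expansion ||x - s||^2 = ||x||^2 + ||s||^2 - 2 x·s, so `d` holds squared distances. That is enough for `argmin`, because squaring does not change which row is smallest. The sketch below checks the expansion against a direct computation on the question's data; the `direct` variable is introduced here only for the comparison.

```python
import numpy as np

matrix = np.array([[1, 1],
                   [2, 2],
                   [5, 5],
                   [6, 6]])
search_vec = np.array([3, 3])

# Squared distances via the expansion: ||x||^2 + ||s||^2 - 2 * x.s
M = matrix.dot(search_vec)
d = np.einsum('ij,ij->i', matrix, matrix) + np.inner(search_vec, search_vec) - 2 * M

# Squared distances computed directly, for comparison.
direct = np.linalg.norm(matrix - search_vec, axis=1) ** 2

print(np.allclose(d, direct))
print(matrix[d.argmin()])
```

With floating-point input the expansion can pick up small rounding errors (even slightly negative values for near-identical vectors), so take `np.sqrt` with care if actual distances are needed.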