Can this be vectorized (numpy)?

Time: 2019-02-02 20:54:20

Tags: python numpy vectorization

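I have a list of feature vectors and would like to calculate the L2 distance from each feature vector to all other feature vectors, as a measure of uniqueness. Here, min_distances[i] gives the smallest L2 distance from the i-th feature vector to any other feature vector.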

import numpy as np

# Generate data
nrows = 2000
feature_length = 128
feature_vecs = np.random.rand(nrows, feature_length)

# Calculate min L2 norm from each feature vector
# to all other feature vectors
min_distances = np.zeros(nrows)
indices = np.arange(nrows)
for i in indices:
    min_distances[i] = np.min(np.linalg.norm(
        feature_vecs[i != indices] - feature_vecs[i],
        axis=1))

Being O(n^2), this is slow, and I would like to optimize it. Can I get rid of the for loop, i.e. vectorize it, so that min and linalg.norm are each called only once?

1 answer:

Answer 0: (score: 2)

Approach #1

Here's one with pdist and squareform -

from scipy.spatial.distance import pdist, squareform

d = squareform(pdist(feature_vecs))
np.fill_diagonal(d,np.nan)
min_distances = np.nanmin(d,axis=0)
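
If SciPy is not available, the same idea can be sketched in plain NumPy. This is not part of the original answer: it relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b to build the full squared-distance matrix with one matrix product, and the helper name min_l2_distances is only illustrative. Like the squareform version, it needs O(n^2) memory.

```python
import numpy as np

def min_l2_distances(feature_vecs):
    """Smallest L2 distance from each row to any other row (illustrative helper)."""
    # Squared norm of every row, shape (n,)
    sq_norms = np.einsum("ij,ij->i", feature_vecs, feature_vecs)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (feature_vecs @ feature_vecs.T)
    # Rounding can leave tiny negative values; clip them before the square root
    np.maximum(sq_dists, 0.0, out=sq_dists)
    # Exclude each vector's distance to itself
    np.fill_diagonal(sq_dists, np.inf)
    return np.sqrt(sq_dists.min(axis=1))

# Small random example (sizes chosen only for illustration)
rng = np.random.default_rng(0)
feature_vecs = rng.random((200, 16))
min_distances = min_l2_distances(feature_vecs)
```

Here min and sqrt are each called once, but the expansion is less numerically accurate than subtracting the vectors directly when two rows are nearly identical.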

Approach #2

Another with cKDTree -

from scipy.spatial import cKDTree

min_distances = cKDTree(feature_vecs).query(feature_vecs, k=2)[0][:,1]
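
The query uses k=2 because each point's nearest neighbour in a tree built from the same data is the point itself at distance 0, so column 1 holds the distance to the nearest other point. A quick sanity check against the original loop, on a smaller random array whose sizes and variable names are chosen only for illustration, might look like this:

```python
import numpy as np
from scipy.spatial import cKDTree

# Smaller data set so the reference loop finishes quickly
rng = np.random.default_rng(0)
small_vecs = rng.random((300, 8))

# Column 0 is the point itself (distance 0); column 1 is the nearest other point
dists, _ = cKDTree(small_vecs).query(small_vecs, k=2)
tree_min = dists[:, 1]

# Reference result from the loop in the question
indices = np.arange(len(small_vecs))
loop_min = np.array([
    np.min(np.linalg.norm(small_vecs[indices != i] - small_vecs[i], axis=1))
    for i in indices
])

matches = np.allclose(tree_min, loop_min)
```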