Question

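I have a matrix and a vector holding the centroid of each row of that matrix. I want to compare every element of the matrix with every element of the vector and find which element of the centroid vector is closest to each data point in the matrix. Is there a way to do this without using loops? I will be working with a large amount of data and want it to be as fast as possible.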

Here is a very simple example in Python of how I am doing it at the moment:

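import numpy as np
test_array = np.array([(1,1,1),(3,4,5),(6,12,18)])
# Row sums divided by the number of columns give one centroid per row
sumx = test_array.sum(axis=1)
centroid_vector = sumx / len(test_array[0])
for i in centroid_vector:
    # Absolute distance of every matrix element to the current centroid
    x = abs(test_array - i)
    # Flat index of the element closest to this centroid
    minimum = np.argmin(x)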

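The desired result is a matrix containing the minimum distance, the original value (from test_array), and the index of the element in the centroid vector that is closest. In this case it would look like this: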

[(0, 1, 1), 
 (0, 1, 1), 
 (0, 1, 1),
 (1, 3, 2), 
 (0, 4, 2),
 ...
 (6, 18, 3)]

Answer 1

Here is one solution to your problem:

import numpy as np
test_array = np.array([(1,1,1),(3,4,5),(6,12,18)])
# Compute the centroids another way (yours is fine too)
centroid_vector = test_array.sum(axis=1)/test_array.shape[1]
# Broadcast a (9,1) column of elements against the centroid vector to get
# a (9,3) array of distances: one row per element, one column per centroid
delta_array = abs(test_array.reshape((9,1)) - centroid_vector)
# The first column of the output is delta_array.min(axis=1),
# the second is test_array.reshape((9)),
# and the third is delta_array.argmin(axis=1),
# so you can do:
array_output = np.array([delta_array.min(axis=1),test_array.reshape((9)),
                         delta_array.argmin(axis=1)]).transpose()
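
The reshape to 9 rows above is specific to a 3x3 input. As a rough sketch of the same broadcasting idea for a 2-D array of any shape, something like the following could work. It assumes NumPy, and the function name nearest_centroid_table is hypothetical, not part of any library.

```python
import numpy as np

def nearest_centroid_table(data):
    """Hypothetical helper: one output row per element of `data`, holding
    (distance to nearest row centroid, element value, 0-based centroid index)."""
    data = np.asarray(data, dtype=float)
    centroids = data.mean(axis=1)      # one centroid per row, shape (n_rows,)
    flat = data.ravel()                # every element, shape (n_rows * n_cols,)
    # Broadcasting (N, 1) against (n_rows,) yields an (N, n_rows) distance table.
    delta = np.abs(flat[:, None] - centroids)
    nearest = delta.argmin(axis=1)     # column index of the closest centroid
    return np.column_stack((delta.min(axis=1), flat, nearest))

table = nearest_centroid_table([(1, 1, 1), (3, 4, 5), (6, 12, 18)])
```

Keep in mind that the intermediate distance table has (number of elements) x (number of centroids) entries, so for very large inputs memory use grows accordingly.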

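Note: indices into centroid_vector start at 0 (the Python convention) rather than at 1 as in the question. If you want 1, 2 or 3 instead, just use delta_array.argmin(axis=1) + 1 for the third column.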
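
To illustrate the index shift, here is a small sketch (assuming NumPy; the names one_based_index and rows are made up for this example) that builds the rows with 1-based centroid indices so they can be compared with the tuples listed in the question.

```python
import numpy as np

test_array = np.array([(1, 1, 1), (3, 4, 5), (6, 12, 18)])
centroid_vector = test_array.mean(axis=1)
# (9, 3) table of distances: one row per element, one column per centroid
delta_array = np.abs(test_array.reshape(-1, 1) - centroid_vector)

# Shift the 0-based argmin result so the centroids are numbered 1, 2, 3
one_based_index = delta_array.argmin(axis=1) + 1

# Tuples of (minimum distance, original value, 1-based centroid index)
rows = list(zip(delta_array.min(axis=1).tolist(),
                test_array.ravel().tolist(),
                one_based_index.tolist()))
```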

Note 2: avoid using sum as a variable name. It is a built-in function, and shadowing it can cause trouble in your code.

Comparing all elements of two numpy arrays without using loops
