Should np.linalg.norm be squared when implementing the k-means clustering algorithm?

Time: 2019-04-13 23:23:09

Tags: python k-means

The goal of the k-means clustering algorithm is to find:

$$\underset{\mathbf{S}}{\arg\min} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \left\lVert \mathbf{x} - \boldsymbol{\mu}_i \right\rVert^2$$

I have looked at several implementations of it in Python, and in some of them the norm is not squared.

For example (taken from here):

import numpy as np

def form_clusters(labelled_data, unlabelled_centroids):
    """
    Given some data and centroids for the data, allocate each
    datapoint to its closest centroid. This forms clusters.
    """
    # enumerate because centroids are arrays, which are unhashable
    centroids_indices = range(len(unlabelled_centroids))
    # Initialize an empty list for each centroid. The list will
    # contain all the datapoints that are closer to that centroid
    # than to any other. That list is the cluster of that centroid.
    clusters = {c: [] for c in centroids_indices}
    for (label, Xi) in labelled_data:
        # For each datapoint, pick the closest centroid.
        smallest_distance = float("inf")
        for cj_index in centroids_indices:
            cj = unlabelled_centroids[cj_index]
            distance = np.linalg.norm(Xi - cj)
            if distance < smallest_distance:
                closest_centroid_index = cj_index
                smallest_distance = distance
        # Allocate that datapoint to the cluster of that centroid.
        clusters[closest_centroid_index].append((label, Xi))
    return clusters.values()

As opposed to the expected implementation (taken from here; this is just the distance computation):

import numpy as np
from numpy.linalg import norm

def compute_distance(self, X, centroids):
    # Squared Euclidean distance from every row of X to each centroid.
    distance = np.zeros((X.shape[0], self.n_clusters))
    for k in range(self.n_clusters):
        row_norm = norm(X - centroids[k, :], axis=1)
        distance[:, k] = np.square(row_norm)
    return distance

Now, I know there are several ways to compute a norm/distance, but I have only looked at implementations that use np.linalg.norm, and, as I said, in some of them the norm is not squared, yet they still cluster the data correctly.

Why?

1 Answer:

Answer 0 (score: 0)

As a rule of thumb, using either the norm or the squared norm as the objective function of an optimization algorithm yields similar results. The minimum value of the objective function changes, but the parameters that achieve it are the same. My intuition has always been that the inner product produces a quadratic function, and taking its square root only changes the magnitude, not the topology, of the objective function. A more detailed answer can be found here: https://math.stackexchange.com/questions/2253443/difference-between-least-squares-and-minimum-norm-solution Hope this helps.
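To see this concretely, here is a minimal sketch (using made-up random data and hypothetical names such as X, centroids, labels_norm and labels_sq, not taken from either implementation above). It shows that the centroid chosen as "closest" is identical whether points are ranked by the norm or by the squared norm, because squaring is monotonic for non-negative distances:

import numpy as np

# Hypothetical data: 100 two-dimensional points and 3 centroids.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
centroids = rng.normal(size=(3, 2))

# Distance matrix: rows are points, columns are centroids.
dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Assign each point to its nearest centroid, once using the norm
# and once using the squared norm.
labels_norm = np.argmin(dist, axis=1)
labels_sq = np.argmin(np.square(dist), axis=1)

print(np.array_equal(labels_norm, labels_sq))  # True: same cluster assignments

The choice only matters if you later sum the per-point distances to report the objective value itself: the squared version matches the objective written above, while the unsquared version gives a different number but the same clustering.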