Should np.linalg.norm be squared when implementing the k-means clustering algorithm?

Time: 2019-04-13 23:23:09

Tags: python k-means

The goal of the k-means clustering algorithm is to find:

$$\underset{\mathbf{S}}{\arg\min} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \left\lVert \mathbf{x} - \boldsymbol{\mu}_i \right\rVert^2$$

I have looked at several implementations of it in Python, and in some of them the norm is not squared.

For example (taken from here):

import numpy as np

def form_clusters(labelled_data, unlabelled_centroids):
    """
    Given some data and centroids for the data, allocate each
    datapoint to its closest centroid. This forms clusters.
    """
    # enumerate because centroids are arrays, which are unhashable
    centroids_indices = range(len(unlabelled_centroids))
    # Initialize an empty list for each centroid. The list will
    # contain all the datapoints that are closer to that centroid
    # than to any other. That list is the cluster of that centroid.
    clusters = {c: [] for c in centroids_indices}
    for (label, Xi) in labelled_data:
        # For each datapoint, pick the closest centroid.
        smallest_distance = float("inf")
        for cj_index in centroids_indices:
            cj = unlabelled_centroids[cj_index]
            distance = np.linalg.norm(Xi - cj)
            if distance < smallest_distance:
                closest_centroid_index = cj_index
                smallest_distance = distance
        # Allocate that datapoint to the cluster of that centroid.
        clusters[closest_centroid_index].append((label, Xi))
    return clusters.values()

As opposed to the expected implementation (taken from here; this is just the distance computation):

import numpy as np
from numpy.linalg import norm

def compute_distance(self, X, centroids):
    # Squared Euclidean distance from every row of X to each centroid.
    distance = np.zeros((X.shape[0], self.n_clusters))
    for k in range(self.n_clusters):
        row_norm = norm(X - centroids[k, :], axis=1)
        distance[:, k] = np.square(row_norm)
    return distance

Now, I know there are several ways to compute a norm/distance, but I have only looked at implementations that use np.linalg.norm, and, as I said, in some of them the norm is not squared, yet they still cluster the data correctly.

Why?

1 Answer:

Answer 0 (score: 0)

As a rule of thumb, using either the norm or the squared norm as the objective function of an optimization algorithm yields similar results. The minimum value of the objective function changes, but the parameters that achieve it are the same. My intuition has always been that the inner product produces a quadratic function, and taking its square root only changes the magnitude, not the topology, of the objective function. A more detailed answer can be found here: https://math.stackexchange.com/questions/2253443/difference-between-least-squares-and-minimum-norm-solution Hope this helps.
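To see this concretely, here is a minimal sketch (using made-up random data and hypothetical names such as X, centroids, labels_norm and labels_sq, not taken from either implementation above). It shows that the centroid chosen as "closest" is identical whether points are ranked by the norm or by the squared norm, because squaring is monotonic for non-negative distances:

import numpy as np

# Hypothetical data: 100 two-dimensional points and 3 centroids.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
centroids = rng.normal(size=(3, 2))

# Distance matrix: rows are points, columns are centroids.
dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Assign each point to its nearest centroid, once using the norm
# and once using the squared norm.
labels_norm = np.argmin(dist, axis=1)
labels_sq = np.argmin(np.square(dist), axis=1)

print(np.array_equal(labels_norm, labels_sq))  # True: same cluster assignments

The choice only matters if you later sum the per-point distances to report the objective value itself: the squared version matches the objective written above, while the unsquared version gives a different number but the same clustering.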