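The objective of the k-means clustering algorithm is to find:

$$\underset{\mathbf{S}}{\arg\min} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^2$$

where $\boldsymbol{\mu}_i$ is the mean of the points in cluster $S_i$.

I looked at several implementations of it in Python, and in some of them the norm is not squared.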
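The k-means objective (the sum over all points of the squared Euclidean distance to the centroid of the point's own cluster) can be written in a few lines of NumPy. This is a minimal sketch, assuming the data is an (n, d) array, each point's cluster is given as an integer label, and the centroids are a (k, d) array; the function name kmeans_objective and the toy data are hypothetical, for illustration only.

```python
import numpy as np

def kmeans_objective(X, labels, centroids):
    """Hypothetical helper: sum of squared Euclidean distances
    from each point to the centroid of its assigned cluster.

    X: (n, d) data, labels: (n,) cluster index per point, centroids: (k, d).
    """
    # Vector from each point to the centroid of its own cluster.
    diffs = X - centroids[labels]
    # Euclidean norm per point, squared, then summed over all points.
    return float(np.sum(np.linalg.norm(diffs, axis=1) ** 2))

# Toy data: two clusters, every point at distance 1 from its centroid.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
print(kmeans_objective(X, labels, centroids))  # 4.0
```

Dropping the ** 2 would give the sum of unsquared distances instead, which is a different objective function even though, as discussed below, it does not change which centroid is nearest to a given point.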
For example (taken from here):

    import numpy as np

    def form_clusters(labelled_data, unlabelled_centroids):
        """
        given some data and centroids for the data, allocate each
        datapoint to its closest centroid. This forms clusters.
        """
        # enumerate because centroids are arrays which are unhashable
        centroids_indices = range(len(unlabelled_centroids))
        # initialize an empty list for each centroid. The list will
        # contain all the datapoints that are closer to that centroid
        # than to any other. That list is the cluster of that centroid.
        clusters = {c: [] for c in centroids_indices}
        for (label, Xi) in labelled_data:
            # for each datapoint, pick the closest centroid.
            smallest_distance = float("inf")
            for cj_index in centroids_indices:
                cj = unlabelled_centroids[cj_index]
                # plain (unsquared) Euclidean distance
                distance = np.linalg.norm(Xi - cj)
                if distance < smallest_distance:
                    closest_centroid_index = cj_index
                    smallest_distance = distance
            # allocate that datapoint to the cluster of that centroid.
            clusters[closest_centroid_index].append((label, Xi))
        return clusters.values()

In contrast, here is an implementation that does what I would expect (taken from here; this is just the distance calculation):

    import numpy as np
    from numpy.linalg import norm

    def compute_distance(self, X, centroids):
        # (n_samples, n_clusters) matrix of distances
        distance = np.zeros((X.shape[0], self.n_clusters))
        for k in range(self.n_clusters):
            # Euclidean norm of each row of X minus centroid k
            row_norm = norm(X - centroids[k, :], axis=1)
            # squared, matching the k-means objective
            distance[:, k] = np.square(row_norm)
        return distance

Now, I know there are several ways to compute the norm/distance, but I have only looked at implementations that use np.linalg.norm with its default (Euclidean) norm, and, as I said, some of them do not square the norm, yet they still cluster correctly.
Why?
Answer (score: 0)
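In my experience, using either the norm or the squared norm as the objective function of an optimization algorithm gives similar results. The minimum value of the objective function changes, but the parameters obtained are the same. My intuition has always been that the inner product produces a quadratic function, and taking its square root only changes the magnitude of the values, not the topology of the (coercive) function. In the implementations in the question specifically, the distance is only used to pick each point's closest centroid, and because the square root is strictly increasing on non-negative numbers, $\lVert \mathbf{x} - \mathbf{c}_i \rVert < \lVert \mathbf{x} - \mathbf{c}_j \rVert$ holds exactly when $\lVert \mathbf{x} - \mathbf{c}_i \rVert^2 < \lVert \mathbf{x} - \mathbf{c}_j \rVert^2$, so both versions assign every point to the same cluster. A more detailed answer can be found here: https://math.stackexchange.com/questions/2253443/difference-between-least-squares-and-minimum-norm-solution Hope this helps.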
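The monotonicity argument can be checked numerically: compute each point's nearest centroid once with the plain Euclidean norm and once with the squared norm, and compare the labels. This is a minimal sketch; the helper nearest_centroid is hypothetical, and the random data and centroids are synthetic, not taken from either implementation in the question.

```python
import numpy as np

def nearest_centroid(X, centroids, squared):
    """Hypothetical helper: index of the closest centroid for each row of X."""
    # (n, k) matrix of Euclidean distances via broadcasting: (n,1,d) - (1,k,d).
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    if squared:
        # Squaring is monotone on non-negative values, so the ordering is kept.
        dist = dist ** 2
    return np.argmin(dist, axis=1)

# Synthetic data: 500 points and 4 candidate centroids in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
centroids = rng.normal(size=(4, 3))

labels_norm = nearest_centroid(X, centroids, squared=False)
labels_sq = nearest_centroid(X, centroids, squared=True)
print(np.array_equal(labels_norm, labels_sq))  # True
```

Squaring therefore only matters where the distance values themselves are summed, such as when reporting the objective; for the assignment step it makes no difference. In standard (Lloyd's) k-means the update step recomputes each centroid as the mean of its assigned points, which does not use the distance at all.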