Question

I am trying to implement the k-means algorithm in Python 3 using NumPy. My input data matrix is a simple n x 2 matrix of points:

[[1, 2],
 [3, 4],
   ...
 [7, 13]]

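For some reason, my labels never stay the same from one iteration to the next; every label changes. Does anyone see an obvious mistake in what I am doing? I have tried to add some comments to my code so that people can follow the various steps.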

import numpy as np

def kmeans(X,k):

    # Initialize by choosing k random data points as centroids
    num_features = X.shape[1]
    centroids = X[np.random.randint(X.shape[0], size=k), :] # find k centroids
    iterations = 0
    old_labels, labels = [], []

    while not should_stop(old_labels, labels, iterations):
        iterations += 1

        clusters = [[] for i in range(0,k)]
        for i in range(k):
            clusters[i].append(centroids[i])

        # Label points
        old_labels = labels
        labels = []
        for point in X:
            distances = [np.linalg.norm(point-centroid) for centroid in centroids]
            max_centroid = np.argmax(distances)
            labels.append(max_centroid)
            clusters[max_centroid].append(point)

        # Compute new centroids
        centroids = np.empty(shape=(0,num_features))
        for cluster in clusters:
            avgs = sum(cluster)/len(cluster)
            centroids = np.append(centroids, [avgs], axis=0)

    return labels

def should_stop(old_labels, labels, iterations):
    count = 0
    if len(old_labels) == 0:
        return False
    for i in range(len(labels)):
        count += (old_labels[i] != labels[i])
    print(count)
    if old_labels == labels or iterations == 2000:
        return True
    return False

Answer 1

max_centroid = np.argmax(distances)

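You want to find the centroid that minimizes the distance, not the one that maximizes it, so this line should use np.argmin rather than np.argmax.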
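To make the fix concrete, here is a minimal sketch of the loop with nearest-centroid assignment. It is not the asker's exact code: the helper name `assign_labels`, the `max_iter` and `seed` parameters, sampling the initial centroids without replacement, and leaving an empty cluster's centroid unchanged are all my own assumptions for illustration.

```python
import numpy as np

def assign_labels(X, centroids):
    # distances[i, j] = Euclidean distance from point i to centroid j
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # argmin (not argmax): each point goes to its NEAREST centroid
    return np.argmin(distances, axis=1)

def kmeans(X, k, max_iter=2000, seed=0):
    # max_iter and seed are illustrative parameters, not part of the question
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        new_labels = assign_labels(X, centroids)
        # Stop once the assignment no longer changes
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Called as `labels, centroids = kmeans(X, 2)` on an n x 2 array, this should settle after a few iterations on well-separated data. With `np.argmax` in place of `np.argmin`, every point is sent to its farthest centroid, which would explain labels that keep flipping between iterations.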
