I am trying to implement the k-means algorithm in Python 3 using NumPy. My input data matrix is a simple n x 2 matrix of points:
[[1, 2],
[3, 4],
...
[7, 13]]
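For some reason, my labels are different at every step of the iteration. Every single label changes. Does anyone see an obvious mistake I am making? I have added some comments to my code so that people can follow the steps I am taking.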
import numpy as np

def kmeans(X, k):
    # Initialize by choosing k random data points as centroids
    num_features = X.shape[1]
    centroids = X[np.random.randint(X.shape[0], size=k), :]  # find k centroids
    iterations = 0
    old_labels, labels = [], []
    while not should_stop(old_labels, labels, iterations):
        iterations += 1
        clusters = [[] for i in range(0, k)]
        for i in range(k):
            clusters[i].append(centroids[i])
        # Label points
        old_labels = labels
        labels = []
        for point in X:
            distances = [np.linalg.norm(point - centroid) for centroid in centroids]
            max_centroid = np.argmax(distances)
            labels.append(max_centroid)
            clusters[max_centroid].append(point)
        # Compute new centroids
        centroids = np.empty(shape=(0, num_features))
        for cluster in clusters:
            avgs = sum(cluster) / len(cluster)
            centroids = np.append(centroids, [avgs], axis=0)
    return labels

def should_stop(old_labels, labels, iterations):
    count = 0
    if len(old_labels) == 0:
        return False
    # Count how many labels changed since the previous iteration
    for i in range(len(labels)):
        count += (old_labels[i] != labels[i])
    print(count)
    if old_labels == labels or iterations == 2000:
        return True
    return False
Answer 0 (score: 1)
max_centroid = np.argmax(distances)
You want the centroid that minimizes the distance, not the one that maximizes it. Use np.argmin here instead of np.argmax.
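For illustration, here is a minimal sketch of the loop with the nearest centroid selected via np.argmin. It is not the asker's exact code: the helper name assign_labels, the max_iter and seed parameters, the choice of distinct starting points, and the handling of empty clusters are all assumptions made for this example.

```python
import numpy as np

def assign_labels(X, centroids):
    """Return, for each row of X, the index of the closest centroid."""
    # Pairwise Euclidean distances, shape (n_points, n_centroids)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # argmin picks the nearest centroid; argmax would pick the farthest
    return np.argmin(distances, axis=1)

def kmeans(X, k, max_iter=2000, seed=0):
    """Sketch of k-means; max_iter and seed are assumed parameters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points (the question's
    # randint-based pick can select the same point twice)
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        new_labels = assign_labels(X, centroids)
        # Stop once no label changed between two consecutive iterations
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

X = np.array([[1.0, 2.0], [3.0, 4.0], [7.0, 13.0]])
fixed_centroids = np.array([[1.0, 2.0], [7.0, 13.0]])
demo_labels = assign_labels(X, fixed_centroids)  # [3, 4] is closer to [1, 2] than to [7, 13]
```

With argmin in place, the labels should stop changing after a few iterations on well-separated data, so the stopping check based on unchanged labels can actually trigger.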