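I'm currently writing a simple Python program to do k-medians clustering, but I've run into a problem that seems to be related to variable scope.
Here is my clustering method: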
import numpy as np

class Cluster(object):
    center = None
    points = []

    def __init__(self, center):
        super(Cluster, self).__init__()
        self.center = center

def manhattan(row_a, row_b):
    # Sum of absolute coordinate differences between two rows
    dimensions = len(row_a)
    manhattan_dist = 0
    for i in range(0, dimensions):
        manhattan_dist = manhattan_dist + np.abs(float(row_a[i]) - float(row_b[i]))
    return manhattan_dist

def cluster(dataset, cluster_centers):
    clusters = []
    for cluster_center in cluster_centers:
        clusters.append(Cluster(center = cluster_center))
    for point in dataset:
        last_dist = np.inf
        last_cluster = None
        for cluster in clusters:
            dist = manhattan(point, cluster.center)
            if(dist != 0):
                if (dist < last_dist):
                    print(str(dist) + " " + str(last_dist))
                    last_dist = dist
                    last_cluster = cluster
        last_cluster.points.append(point)
    return clusters
result = cluster([[1,1], [1,2], [1,3], [7,2], [8,3], [7,1]], [[2,2], [6,6]])
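This is the output I get:
The problem is that I assign to the variable "last_dist" (and possibly "last_cluster") inside the for loop over the clusters, and judging from the output, the value doesn't seem to be updated at all, except for a single iteration where it is 7 before it goes back to the original value "Inf" again. What is the reason for this, and what can I do about it? Thanks.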
Answer 0 (score: 0)
What else did you expect to happen? Here is your code:
for point in dataset:
    last_dist = np.inf # this line is executed 6 times
    last_cluster = None
    for cluster in clusters:
        ...
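There are only 2 items in clusters and only 6 items in dataset. So for every point (6 times), last_dist starts out as inf. There are 6 inf values in your output, so it works as expected. For the second cluster, last_dist is printed only when your condition if (dist < last_dist) is met. It looks like that happens only once, which is why you get 7.0 instead of inf. Maybe you have a mistake in manhattan()?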
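The reasoning above can be checked with a small sketch. It mirrors the loop from the question in Python 3, but instead of printing it collects the pairs the print call would emit. The helper name trace_updates is hypothetical (it is not part of the question's code), and math.inf stands in for np.inf so the sketch needs no third-party imports.

```python
import math

def manhattan(row_a, row_b):
    # Sum of absolute coordinate differences
    return sum(abs(float(a) - float(b)) for a, b in zip(row_a, row_b))

def trace_updates(dataset, centers):
    """Hypothetical helper: collect the (dist, last_dist) pairs
    that the print call in the question's loop would emit."""
    trace = []
    for point in dataset:
        last_dist = math.inf  # reset once per point, as in the question
        for center in centers:
            dist = manhattan(point, center)
            if dist != 0 and dist < last_dist:
                # Recorded BEFORE the update, so the second value is the old one
                trace.append((dist, last_dist))
                last_dist = dist
    return trace

dataset = [[1, 1], [1, 2], [1, 3], [7, 2], [8, 3], [7, 1]]
centers = [[2, 2], [6, 6]]
trace = trace_updates(dataset, centers)
for dist, previous in trace:
    print(dist, previous)
```

With this data the trace has seven pairs: six have inf as the previous value (the first center checked for each point), and one is (5.0, 7.0), for the point [8, 3], which is the only point strictly closer to the second center.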
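Answer 1 (score: 0)
Your code doesn't seem to have any problem. You are trying to find the nearest cluster for each point. The reason you may be confused is that you print those inf values in last_dist before changing it to the value on the left ...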
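One way to avoid that confusion is to report the result only after every center has been checked for a point. The following is a minimal sketch under that assumption, not the asker's code: nearest_center is a hypothetical helper, and unlike the question's loop it does not skip centers at distance 0.

```python
def manhattan(row_a, row_b):
    # Sum of absolute coordinate differences
    return sum(abs(float(a) - float(b)) for a, b in zip(row_a, row_b))

def nearest_center(point, centers):
    """Hypothetical helper: return (index, distance) of the closest center.
    Ties go to the earlier center, matching a strict '<' comparison."""
    distances = [manhattan(point, center) for center in centers]
    best = min(range(len(centers)), key=distances.__getitem__)
    return best, distances[best]

dataset = [[1, 1], [1, 2], [1, 3], [7, 2], [8, 3], [7, 1]]
centers = [[2, 2], [6, 6]]
assignments = [nearest_center(point, centers) for point in dataset]
for point, (index, dist) in zip(dataset, assignments):
    # Printed once per point, after all centers have been compared
    print(point, "-> center", index, "at distance", dist)
```

Here only [8, 3] is assigned to the second center. The points [7, 2] and [7, 1] are equally far from both centers and stay with the first one.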