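I have a Python program that runs the K-Means algorithm on a set of data. This is a homework assignment, and it has to be built without using sklearn's built-in kmeans function.
Basically, I set a value for qty, which sets the number of centroids/clusters. The program then creates random x and y points to use as my initial cluster centroids. Sometimes it runs without errors, and sometimes it gives me this: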
Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 59
    warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.
Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 68
    ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
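For reference, the same pair of warnings seems to show up in isolation whenever np.mean is applied to a selection that contains no rows. The snippet below is only a minimal sketch of that situation: `points` and `labels` are made-up values for illustration, not taken from my program.

```python
import warnings
import numpy as np

# Hypothetical data: three points, none of which is assigned to cluster 2.
points = np.array([[2.0, 4.0], [17.0, 4.0], [45.0, 2.0]])
labels = np.array([0, 0, 1])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # points[labels == 2] has shape (0, 2), so the mean is taken over nothing.
    empty_mean = np.mean(points[labels == 2], axis=0)

print(empty_mean)  # every entry is nan
for w in caught:
    print(w.category.__name__, w.message)
```

If that is what is happening in my code, one of the random centroids would be ending up with no data points assigned to it.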
Here is my code:
import numpy as np
from pprint import pprint
import random
import sys

dataPoints = np.array([[2,4],[17,4],[45,2],[45,7],[16,32],[32,14],[20,56],[68,33],[54,36],[3,54],[23,5],[56,23],[10,81],[64,15],[23,18],[22,15],[35,19],[66,19],[1,99]])

rangeX = (0, 100)
rangeY = (0, 100)
qty = 5

randomCentroids = []
i = 0
while i<qty:
    x = random.randrange(*rangeX)
    y = random.randrange(*rangeY)
    randomCentroids.append((x,y))
    i += 1

centroids = np.asarray(randomCentroids)

def size(vector):
    return np.sqrt(sum(x**2 for x in vector))

def distance(vector1, vector2):
    return size(vector1 - vector2)

def distances(array1, array2):
    ConvergenceCounter = 1
    keepGoing = True
    StartingCentroids = np.copy(centroids)
    while keepGoing:
        #--------------Find The new means---------#
        np.linalg.norm(StartingCentroids[None, :, :] - dataPoints[:, None, :], axis=-1)
        t0 = StartingCentroids[None, :, :] - dataPoints[:, None, :]
        t1 = np.linalg.norm(t0, axis=-1)
        t2 = np.argmin(t1, axis=-1)
        cat = np.mean(dataPoints[t2 == 0], axis=0)
        #------Push the new means to a new array for comparison---------#
        CentroidMeans = []
        for x in xrange(len(StartingCentroids)):
            CentroidMeans.append(np.mean(dataPoints[t2 == [x]], axis=0))
        #--------Convert to a numpy array--------#
        NewMeans = np.asarray(CentroidMeans)
        #------Compare the New Means with the Starting Means------#
        if np.array_equal(NewMeans,StartingCentroids):
            print ('Convergence has been reached after {} moves'.format(ConvergenceCounter))
            print ('Starting Centroids:\n{}'.format(centroids))
            print ('Final Means:\n{}'.format(NewMeans))
            print ('Final Cluster assignments: {}'.format(t2))
            for x in xrange(len(StartingCentroids)):
                print ('Cluster {}:\n'.format(x)), dataPoints[t2 == [x]]
            for x in xrange(len(StartingCentroids)):
                print ('Size of Cluster {}:'.format(x)), len(dataPoints[t2 == [x]])
            keepGoing = False
        else:
            ConvergenceCounter = ConvergenceCounter +1
            StartingCentroids =np.copy(NewMeans)

distances(centroids,dataPoints)
Using a smaller number for the qty variable seems to run without errors.
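One workaround I have been considering is to leave a centroid where it is whenever no points are assigned to it, instead of taking the mean of an empty selection. This is only a sketch under that assumption, not necessarily what the assignment expects: `update_centroids` is a hypothetical helper name, and the small `data` and `old` arrays are made up so that the third centroid ends up with no points.

```python
import numpy as np

def update_centroids(data, labels, old_centroids):
    """Hypothetical helper: per-cluster mean, keeping the old centroid for empty clusters."""
    new_centroids = np.array(old_centroids, dtype=float)  # float copy of the old centroids
    for k in range(len(old_centroids)):
        members = data[labels == k]
        if len(members) > 0:  # only average clusters that actually have points
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

# Made-up example: the third centroid is far from every point, so its cluster is empty.
data = np.array([[2, 4], [17, 4], [45, 2], [45, 7]])
old = np.array([[0, 0], [50, 5], [99, 99]])

# Same broadcasting trick as in my loop: distance from every point to every centroid.
diff = old[None, :, :] - data[:, None, :]
labels = np.argmin(np.linalg.norm(diff, axis=-1), axis=-1)

new = update_centroids(data, labels, old)
print(labels)
print(new)
```

With this, the empty cluster keeps its old position and no nan values enter the comparison. Other choices, such as re-drawing a random centroid for the empty cluster, would presumably work as well.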