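I applied scikit-learn's KMeans to my dataset and then tried to plot the data and the clusters, but I keep getting an error and don't know how to fix it.
This is my current code: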
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def construct_dict(list_keys, list_values):
    # Count, for each key, how often each value occurs alongside it.
    res = {}
    for i in range(len(list_keys)):
        k = list_keys[i]
        for v in set(list_values):
            res[k] = res.get(k, {})
            res[k][v] = res[k].get(v, 0)
        res[k][list_values[i]] += 1
    return res

def print_result(matrix, assigs, y, nclusts):
    # nclusts is passed in explicitly instead of being read from a global.
    pred = list(set(y))
    res = construct_dict(assigs, y)
    print(res, "is a clustering obtained when K=", nclusts)
    # Compute the mean Silhouette Coefficient of all samples
    silhouette_avg = silhouette_score(matrix, assigs)
    print("For %d clusters the average sillouette score is: %f" % (nclusts, silhouette_avg))
    return res

def bcubed(assigs, y, calcule='precision'):
    # calcule = 'precision' or 'recall'
    summatory = 0.0
    n = len(y)
    if calcule == 'precision':
        list_dicts = construct_dict(assigs, y).values()
    elif calcule == 'recall':
        list_dicts = construct_dict(y, assigs).values()
    for dic in list_dicts:
        values_dict = dic.values()
        n_elem_dic = sum(values_dict)
        for value in values_dict:
            summatory += (value-1)*value/n_elem_dic
    return summatory/n

def k_means(nclusts, matrix, y):
    kmeans = KMeans(n_clusters=nclusts, random_state=0).fit(matrix)
    assigs = list(kmeans.labels_)  # local to k_means; only the return value leaves
    res = print_result(matrix, assigs, y, nclusts)
    print('Precision BCubed:', bcubed(assigs, y))
    print('Recall BCubed:', bcubed(assigs, y, 'recall'))
    return assigs

for nclusts in [2,3,4,5,10,20,30]:
    k_means(nclusts, X_pca, y)
    print("--------------")
Output:
{0: {'ALL': 12, 'AML': 14}, 1: {'ALL': 35, 'AML': 11}} is a clustering obtained when K= 2
For 2 clusters the average sillouette score is: 0.147925
Precision BCubed: 0.5602471200297287
Recall BCubed: 0.5528841607565012
...
import pandas as pd
from matplotlib import pyplot as plt
pd.DataFrame(X_pca).T.plot()
plt.show()
Output: the plot looks fine.
Now the part that gives me the error:
import pandas as pd
k_means(5,X_pca,y)
pd.DataFrame(assigs).T.plot()
plt.show()
Output:
NameError: name 'assigs' is not defined
What am I doing wrong?
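Answer 0 (score: 1)
It looks like you are not saving what k_means returns. assigs is a local variable inside k_means, so it is not defined at the top level unless you assign the return value to a name. Does this work?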
import pandas as pd
assigs = k_means(5,X_pca,y)
pd.DataFrame(assigs).T.plot()
plt.show()
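To illustrate why the NameError occurs, here is a minimal sketch in plain Python, independent of scikit-learn. The function assign_labels and the sample points are hypothetical stand-ins for k_means and X_pca, not part of the original code; the only point is that a name bound inside a function is not visible outside it unless the return value is captured.

```python
def assign_labels(points):
    """Hypothetical stand-in for k_means: builds a list and returns it."""
    labels = [0 if p < 5 else 1 for p in points]  # local to this function
    return labels


points = [1, 2, 8, 9]

# Calling without capturing the result discards the returned list,
# and the local name `labels` does not exist outside the function.
assign_labels(points)
try:
    labels
    name_error_raised = False
except NameError:
    name_error_raised = True

# Binding the return value to a name makes it available afterwards.
saved_labels = assign_labels(points)
print(name_error_raised, saved_labels)  # True [0, 0, 1, 1]
```

The same pattern applies to the question: the list returned by k_means is only usable for plotting once it has been assigned to a name at the call site.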