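I am trying to apply fuzzy c-means clustering to my data, and I only want to show the result for n = 2 clusters. The code below works, but when I modify it to plot only the 2-cluster case, I run into problems.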
import numpy as np
import matplotlib.pyplot as plt
import skfuzzy as fuzz

fig1, axes1 = plt.subplots(3, 3, figsize=(20, 10))
# skfuzzy expects the data shaped (n_features, n_samples)
alldata = np.vstack((cluDF['Height'], cluDF['time_of_day'], cluDF['resolution']))
fpcs = []
colors = ['b', 'orange', 'g', 'r', 'c', 'm', 'y', 'k', 'Brown', 'ForestGreen']
# One subplot per number of centers: ncenters = 2, 3, ..., 10
for ncenters, ax in enumerate(axes1.reshape(-1), 2):
    # The third argument (3) is the fuzziness exponent m
    cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
        alldata, ncenters, 3, error=0.005, maxiter=1000, init=None)
    fpcs.append(fpc)

    # Plot assigned clusters, for each data point in training set
    # (each point goes to the cluster where its membership is highest)
    cluster_membership = np.argmax(u, axis=0)
    for j in range(ncenters):
        ax.plot(cluDF['time_of_day'][cluster_membership == j],
                cluDF['Height'][cluster_membership == j],
                # principalDf['principal component 2'][cluster_membership == j],
                '.',
                color=colors[j])
    ax.set_title('Centers = {0}; FPC = {1:.2f}'.format(ncenters, fpc))
    ax.axis('on')
    ax.grid()
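The `FPC` in each subplot title is the fuzzy partition coefficient of the membership matrix `u`: the sum of squared memberships divided by the number of points. It equals 1 for a crisp partition and 1/c when every point belongs equally to all c clusters, so it can be used to compare the nine runs. The sketch below assumes `u` has shape (n_clusters, n_points) as returned by `fuzz.cluster.cmeans`; the helper names `fuzzy_partition_coefficient` and `best_ncenters` are hypothetical, and picking the count with the highest FPC is one common heuristic rather than a rule.

```python
import numpy as np


def fuzzy_partition_coefficient(u):
    """FPC = (1/N) * sum over i, j of u_ij**2 for a membership matrix of shape (c, N)."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(u ** 2) / u.shape[1])


def best_ncenters(fpcs, start=2):
    """Return the number of centers with the highest FPC.

    Assumes fpcs[k] was computed with start + k centers, matching the
    enumerate(axes1.reshape(-1), 2) loop.
    """
    return start + int(np.argmax(fpcs))


crisp = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])   # every point fully in one cluster
uniform = np.full((2, 3), 0.5)        # every point split evenly between 2 clusters
print(fuzzy_partition_coefficient(crisp))    # 1.0
print(fuzzy_partition_coefficient(uniform))  # 0.5
# Illustrative FPC values only, not results from this dataset
print(best_ncenters([0.81, 0.74, 0.69]))     # 2
```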
This is what I tried for the single-plot case; the resulting figure is empty.
cluDF
       Height  time_of_day  resolution
272  1.567925     1.375000    0.594089
562  1.807508     1.458333    0.594089
585  2.693542     0.416667    0.594089
658  1.542407     1.458333    0.594089
681  1.930844     0.416667    0.594089
802  1.505548     1.458333    0.594089
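To run the snippets without the full dataset, the six sample rows of `cluDF` shown above can be rebuilt as a DataFrame (this assumes pandas; only these rows are reproduced, so the larger center counts in the 3x3 loop are not meaningful on it). Note that `resolution` is constant across these six rows, so it adds nothing to the Euclidean distances between them.

```python
import pandas as pd

# The six sample rows shown above, with the index values as printed
cluDF = pd.DataFrame(
    {
        "Height": [1.567925, 1.807508, 2.693542, 1.542407, 1.930844, 1.505548],
        "time_of_day": [1.375000, 1.458333, 0.416667, 1.458333, 0.416667, 1.458333],
        "resolution": [0.594089] * 6,
    },
    index=[272, 562, 585, 658, 681, 802],
)
print(cluDF)
```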
# plt.subplots returns (figure, axes); unpack it so that `ax` refers to this
# new figure's axes. If `ax` is not rebound here, it still points at the last
# subplot of the 3x3 figure above, and the new figure stays empty.
fig, ax = plt.subplots(1, 1, figsize=(20, 10))
alldata = np.vstack((cluDF['Height'], cluDF['time_of_day'], cluDF['resolution']))
fpcs = []
colors = ['b', 'orange', 'g', 'r', 'c', 'm', 'y', 'k', 'Brown', 'ForestGreen']
ncenters = 2
cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
    alldata, ncenters, 3, error=0.005, maxiter=1000, init=None)
fpcs.append(fpc)

# Plot assigned clusters, for each data point in training set
cluster_membership = np.argmax(u, axis=0)
for j in range(ncenters):
    ax.plot(cluDF['time_of_day'][cluster_membership == j],
            cluDF['Height'][cluster_membership == j], '.',
            color=colors[j])
ax.set_title('Centers = {0}; FPC = {1:.2f}'.format(ncenters, fpc))
ax.axis('on')
ax.grid()
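To see what the `cmeans` call computes for `ncenters = 2`, here is a minimal NumPy sketch of standard (Bezdek) fuzzy c-means. It is not skfuzzy's implementation: the function name `fuzzy_cmeans`, the random initialization, and the stopping rule on the change in `u` are assumptions, and two synthetic, well-separated blobs stand in for `cluDF`. The input uses the same (n_features, n_samples) layout as `alldata`, and `np.argmax(u, axis=0)` gives the same kind of hard labels the plotting loop uses.

```python
import numpy as np


def fuzzy_cmeans(data, c, m=2.0, error=0.005, maxiter=1000, seed=0):
    """Minimal fuzzy c-means using Bezdek's alternating updates.

    data: array of shape (n_features, n_samples), like `alldata`.
    Returns centers of shape (c, n_features) and memberships u of shape (c, n_samples).
    """
    x = np.asarray(data, dtype=float).T            # (n_samples, n_features)
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.shape[0]))
    u /= u.sum(axis=0)                             # each point's memberships sum to 1
    for _ in range(maxiter):
        um = u ** m
        # Centers are membership-weighted means of the points
        centers = um @ x / um.sum(axis=1, keepdims=True)
        # Distance from every center to every point, shape (c, n_samples)
        d = np.linalg.norm(centers[:, None, :] - x[None, :, :], axis=2)
        d = np.fmax(d, np.finfo(float).eps)        # avoid division by zero
        # u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=1)
        converged = np.linalg.norm(u_new - u) < error
        u = u_new
        if converged:
            break
    return centers, u


rng = np.random.default_rng(1)
blob_a = rng.normal([0.0, 0.0], 0.1, size=(20, 2))
blob_b = rng.normal([3.0, 3.0], 0.1, size=(20, 2))
alldata = np.vstack((blob_a, blob_b)).T            # (n_features, n_samples)
centers, u = fuzzy_cmeans(alldata, c=2, m=3)       # m=3 as in the question
labels = np.argmax(u, axis=0)                      # hard assignment per point
print(np.round(centers, 2))
```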
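In addition, I would appreciate advice on which clustering method suits a dataset in which the distances between points are small.
This dataset concerns peak clustering and is unlabeled (unsupervised). I have already tried K-means and agglomerative hierarchical clustering with linkage = single and ward. The results do not differ much, so I would like to know whether there is any way to evaluate them.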
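Without labels, one option is to compare the runs with internal validity indices. The sketch below assumes scikit-learn (the `linkage='single'`/`'ward'` wording suggests `AgglomerativeClustering`); `compare_clusterings` is a hypothetical helper, and synthetic blobs stand in for the real data. Standardizing first matters more than the absolute size of the distances, because features on larger scales otherwise dominate. Keep in mind that these indices reward compact, well-separated clusters, so single linkage, which can form elongated chains, may score lower even when its grouping is meaningful. The same scores can also be computed on `np.argmax(u, axis=0)` from fuzzy c-means.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler


def compare_clusterings(X, n_clusters=2, random_state=0):
    """Score several clusterings of X (n_samples, n_features) with internal indices.

    Higher silhouette and Calinski-Harabasz are better; lower Davies-Bouldin is better.
    """
    Xs = StandardScaler().fit_transform(X)  # put all features on a common scale
    models = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state),
        "ward": AgglomerativeClustering(n_clusters=n_clusters, linkage="ward"),
        "single": AgglomerativeClustering(n_clusters=n_clusters, linkage="single"),
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(Xs)
        scores[name] = {
            "silhouette": float(silhouette_score(Xs, labels)),
            "davies_bouldin": float(davies_bouldin_score(Xs, labels)),
            "calinski_harabasz": float(calinski_harabasz_score(Xs, labels)),
        }
    return scores


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, size=(30, 2)),
               rng.normal(2.0, 0.2, size=(30, 2))])
results = compare_clusterings(X)
for name, s in results.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```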