Fuzzy c-means clustering and evaluation methods

Time: 2019-05-22 09:21:17

Tags: python pandas cluster-analysis pca spike

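I am trying to apply fuzzy c-means clustering to my data. I only want to show the result for n = 2 clusters. The code below works, but when I modify it to plot only the 2-cluster case, I run into problems.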

import numpy as np
import matplotlib.pyplot as plt
import skfuzzy as fuzz

fig1, axes1 = plt.subplots(3, 3, figsize=(20, 10))
alldata = np.vstack((cluDF['Height'], cluDF['time_of_day'],cluDF['resolution']))
fpcs = []

colors = ['b', 'orange', 'g', 'r', 'c', 'm', 'y', 'k', 'Brown', 'ForestGreen'] 

for ncenters, ax in enumerate(axes1.reshape(-1), 2):
    cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
        alldata, ncenters, 3, error=0.005, maxiter=1000, init=None)
    fpcs.append(fpc)

    # Plot assigned clusters, for each data point in training set
    cluster_membership = np.argmax(u, axis=0)
    for j in range(ncenters):
        ax.plot(cluDF['time_of_day'][cluster_membership == j],
                cluDF['Height'][cluster_membership == j],
                # principalDf['principal component 2'][cluster_membership == j],
                '.', 
                color=colors[j])

    ax.set_title('Centers = {0}; FPC = {1:.2f}'.format(ncenters, fpc))
    ax.axis('on')
    ax.grid()
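The loop above stores the fuzzy partition coefficient (FPC) returned by `fuzz.cluster.cmeans` for each number of centers in `fpcs`. The sketch below shows what that number measures, assuming the standard partition coefficient definition (mean over points of the summed squared memberships). The helper names `partition_coefficient` and `best_ncenters` are hypothetical, not part of scikit-fuzzy.

```python
import numpy as np

def partition_coefficient(u):
    """Fuzzy partition coefficient of a (c, N) membership matrix.

    FPC = (1/N) * sum_i sum_k u[i, k]**2, which equals trace(u @ u.T) / N.
    It is 1.0 for a crisp partition and 1/c when every point is shared equally.
    """
    u = np.asarray(u, dtype=float)
    return np.trace(u @ u.T) / u.shape[1]

def best_ncenters(fpcs, start=2):
    """Return the number of centers with the highest FPC.

    Assumes fpcs[i] was computed with start + i centers, as in the loop above.
    """
    return start + int(np.argmax(fpcs))

crisp = np.array([[1, 0, 1], [0, 1, 0]])  # every point fully in one cluster
uniform = np.full((2, 3), 0.5)            # every point split 50/50
print(partition_coefficient(crisp))    # 1.0
print(partition_coefficient(uniform))  # 0.5
print(best_ncenters([0.62, 0.71, 0.55]))  # 3
```

A higher FPC means a crisper partition, but it tends to shrink as the number of centers grows, so picking the maximum is a rough guide rather than a definitive choice.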

This is what I tried, but the result is empty.

cluDF
    Height  time_of_day resolution
272 1.567925    1.375000    0.594089
562 1.807508    1.458333    0.594089
585 2.693542    0.416667    0.594089
658 1.542407    1.458333    0.594089
681 1.930844    0.416667    0.594089
802 1.505548    1.458333    0.594089

axes1 = plt.subplots(1, 1, figsize=(20, 10))
alldata = np.vstack((cluDF['Height'], cluDF['time_of_day'],cluDF['resolution']))
fpcs = []

colors = ['b', 'orange', 'g', 'r', 'c', 'm', 'y', 'k', 'Brown', 'ForestGreen'] 

ncenters = 2

cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(\
    alldata, ncenters, 3, error=0.005, maxiter=1000, init=None)
fpcs.append(fpc)

# Plot assigned clusters, for each data point in training set
cluster_membership = np.argmax(u, axis=0)
for j in range(ncenters):
    ax.plot(cluDF['time_of_day'][cluster_membership == j],
            cluDF['Height'][cluster_membership == j],'.', 
            color=colors[j])


ax.set_title('Centers = {0}; FPC = {1:.2f}'.format(ncenters, fpc))
ax.axis('on')
ax.grid()

In addition, I would like some advice on which clustering method suits a dataset in which the distances between points are small.

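The dataset is for spike clustering and is unlabeled, so the task is unsupervised. I have already tried K-means and agglomerative hierarchical clustering with linkage = single and ward. The results do not differ much, and I would like to know whether there are any methods for evaluating them.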
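Without ground-truth labels, one common option is internal validity indices, which score a partition using only the data and the labels, plus an agreement score to quantify how different two methods' labelings really are. A minimal sketch with scikit-learn, assuming synthetic `make_blobs` data as a stand-in for the spike features (the real data is not available) and an assumed cluster count of 3; standardizing the features first is also an assumption, not something the question states.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (adjusted_rand_score, calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)
from sklearn.preprocessing import StandardScaler

N_CLUSTERS = 3  # assumed number of clusters for the stand-in data

# Synthetic stand-in for the spike features, standardized per feature
X, _ = make_blobs(n_samples=300, centers=N_CLUSTERS, n_features=3, random_state=0)
X = StandardScaler().fit_transform(X)

models = {
    "kmeans": KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0),
    "single": AgglomerativeClustering(n_clusters=N_CLUSTERS, linkage="single"),
    "ward": AgglomerativeClustering(n_clusters=N_CLUSTERS, linkage="ward"),
}
labels = {name: model.fit_predict(X) for name, model in models.items()}

scores = {}
for name, y in labels.items():
    scores[name] = {
        "silhouette": silhouette_score(X, y),                # in [-1, 1], higher is better
        "calinski_harabasz": calinski_harabasz_score(X, y),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, y),        # lower is better
    }
    print(name, {k: round(v, 3) for k, v in scores[name].items()})

# Agreement between two labelings: 1.0 means identical partitions up to relabeling
ari_kmeans_ward = adjusted_rand_score(labels["kmeans"], labels["ward"])
print("ARI kmeans vs ward:", round(ari_kmeans_ward, 3))
```

The same indices can be applied to the fuzzy result by scoring the hard labels `np.argmax(u, axis=0)`; the indices can disagree with each other, so comparing several is safer than relying on one.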

0 answers
