Plot cluster boundaries in scikit-learn

Date: 2017-03-19 18:58:34

Tags: python matplotlib scikit-learn cluster-analysis contour

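I clustered my data with K-means in scikit-learn, then plotted the cluster regions following the scikit-learn example. Next, I clustered again within each cluster, and I want to show the boundaries of the sub-clusters on the same plot. I found this question interesting, but when I applied its approach the axis range changed and a new plot appeared.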

Edit: my function is as follows:

import numpy as np
import matplotlib.pyplot as plt


def plot_pca_clusters_races_match(pca_km, reduced_data, pca_data_winner,
                                  race1_pca_km, race1_reduced_data, race1_pca_data_winner, race1_nclusters,
                                  race2_pca_km, race2_reduced_data, race2_pca_data_winner, race2_nclusters,
                                  plt_opt, fig_path, race_approach, n_clusters):

    """
    :param pca_km: K-means trained by PCA data (2 components)
    :param reduced_data: PCA components
    :param pca_data_winner: player_id, pca_component1, pca_component2, race_id, winner
    :param plt_opt: space required to plot cluster area
    :param fig_path: path to save the plot
    :param race_approach:
    :param n_clusters:
    :return:
    """

    race_id_list = ['Z', 'T', 'P']
    # 1- Plot cluster area
    x_min, x_max = reduced_data[:, 0].min() + plt_opt[0], reduced_data[:, 0].max() + plt_opt[1]
    y_min, y_max = reduced_data[:, 1].min() + plt_opt[2], reduced_data[:, 1].max() + plt_opt[3]
    # Grid resolution: 1/100 of the x range (stays non-zero for ranges symmetric around 0)
    step = (x_max - x_min) / 100
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
    Z = pca_km.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(1)
    plt.clf()

    # Plot cluster regions
    plt.imshow(Z, interpolation='nearest',
               extent=(xx.min(), xx.max(), yy.min(), yy.max()),
               cmap=plt.cm.Paired,
               aspect='auto', origin='lower')

    # 2- Plot cluster members
    race_ids = list(set(pca_data_winner[:, -3]))

    # Find race type
    reduced_data_race1 = pca_data_winner[np.where(pca_data_winner[:, -3] == race_ids[0]), :][0]

    # Plot race 1
    plt.plot(reduced_data_race1[:, 2], reduced_data_race1[:, 3], '.', markersize=4, color='red',
             label=race_id_list[int(race_ids[0])])

    # Plot race 2
    # If the race is non-symmetric, change color of the cluster members
    if len(race_ids) > 1:
        reduced_data_race2 = pca_data_winner[np.where(pca_data_winner[:, -3] == race_ids[1]), :][0]
        # The `hold` keyword was removed in matplotlib 3.0; plots accumulate on the axes by default
        plt.plot(reduced_data_race2[:, 2], reduced_data_race2[:, 3], '.', markersize=4, color='green',
                 label=race_id_list[int(race_ids[1])])

    # 3-Plot cluster centers
    markers = ['d', 'v', 's', '*', 'h', 'p', 'o']
    for cluster in range(pca_km.cluster_centers_.shape[0]):
        plt.scatter(pca_km.cluster_centers_[cluster, 0], pca_km.cluster_centers_[cluster, 1],
                    marker=markers[cluster], s=80, linewidths=1,
                    label='Cluster ' + str(cluster),
                    color='b', zorder=4)
    plt.xlabel('PC 1')
    plt.ylabel('PC 2')

    plt.legend(prop={'size':8})

    # --------------------------------------------- Plot boundaries of sub-clusters
    x1_min, x1_max = race1_reduced_data[:, 0].min() + plt_opt[0], race1_reduced_data[:, 0].max() + plt_opt[1]
    y1_min, y1_max = race1_reduced_data[:, 1].min() + plt_opt[2], race1_reduced_data[:, 1].max() + plt_opt[3]

    # Reuse the resolution of the main grid
    step = (x_max - x_min) / 100
    xx1, yy1 = np.meshgrid(np.arange(x1_min, x1_max, step), np.arange(y1_min, y1_max, step))

    Z1 = race1_pca_km.predict(np.c_[xx1.ravel(), yy1.ravel()])
    Z1 = Z1.reshape(xx1.shape)

    # Plot sub-cluster boundaries, placed at the data coordinates of the sub-cluster grid
    plt.contour(Z1, extent=(xx1.min(), xx1.max(), yy1.min(), yy1.max()))

The first plot: [image]

After trying to add contours and zooming: [image]

1 answer:

Answer 0 (score: 0)

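The first plot, the one without contours, sits in the lower-left corner of the second plot. This is because the contour was not given a proper scale; in that case it simply spans the row and column indices of the Z array.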
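
To see the effect in isolation, here is a minimal sketch on a synthetic label grid. The grid size and the quadrant labels are made up for illustration and are not taken from the question. Calling `contour(Z)` without coordinates draws the contour at column and row indices, so the axes grow to cover those indices and the image drawn with `extent` shrinks into the lower-left corner.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical grid: 50 rows x 80 columns covering x in [-4, 4] and y in [-2, 2]
x = np.linspace(-4.0, 4.0, 80)
y = np.linspace(-2.0, 2.0, 50)
xx, yy = np.meshgrid(x, y)

# Fake "cluster labels": one integer label per quadrant
Z = (xx > 0).astype(int) + 2 * (yy > 0).astype(int)

fig, ax = plt.subplots()
ax.imshow(Z, extent=(x.min(), x.max(), y.min(), y.max()),
          origin="lower", aspect="auto", interpolation="nearest")
xlim_before = ax.get_xlim()  # matches the extent passed to imshow

# No X/Y and no extent: the contour lives at indices 0..79 (x) and 0..49 (y)
ax.contour(Z)
xlim_after = ax.get_xlim()   # now stretched to include the index range

print("x limits before contour:", xlim_before)
print("x limits after contour: ", xlim_after)
```

Comparing the two printed limits shows the axis range changing exactly as described in the question.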

You need to give the contour an extent

plt.contour(Z, extent=(..,..,..,..))

or pass X and Y arrays that define the coordinates.

plt.contour(X,Y,Z)
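
Both options can be sketched end to end as follows. This is an illustration under assumptions, not the asker's actual pipeline: the data is synthetic, and `label_grid` is a hypothetical helper introduced here. The sub-cluster model is evaluated on its own grid, and the contour is placed either with that grid's `xx1`, `yy1` arrays or with the matching `extent`, so the axis limits set by `imshow` stay put.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the 2-component PCA data (assumption, not the asker's data)
rng = np.random.RandomState(0)
data = np.vstack([rng.normal(loc=c, scale=0.6, size=(100, 2))
                  for c in [(-3, 0), (3, 0)]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
subset = data[km.labels_ == 0]  # members of one top-level cluster
sub_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(subset)


def label_grid(model, points, pad=1.0, n=200):
    """Hypothetical helper: predict a label for every node of a padded grid."""
    x = np.linspace(points[:, 0].min() - pad, points[:, 0].max() + pad, n)
    y = np.linspace(points[:, 1].min() - pad, points[:, 1].max() + pad, n)
    xx, yy = np.meshgrid(x, y)
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, Z


xx, yy, Z = label_grid(km, data)             # main cluster regions
xx1, yy1, Z1 = label_grid(sub_km, subset)    # sub-cluster regions on their own grid

fig, (ax_xy, ax_ext) = plt.subplots(1, 2, figsize=(10, 4))
limits_before, limits_after = [], []
for ax in (ax_xy, ax_ext):
    ax.imshow(Z, extent=(xx.min(), xx.max(), yy.min(), yy.max()),
              origin="lower", aspect="auto", interpolation="nearest",
              cmap=plt.cm.Paired)
    limits_before.append((ax.get_xlim(), ax.get_ylim()))

# Levels halfway between the integer labels 0, 1, 2 trace the sub-cluster borders
levels = [0.5, 1.5]

# Option 1: explicit coordinate arrays
ax_xy.contour(xx1, yy1, Z1, levels=levels, colors="k", linewidths=1)

# Option 2: the extent of the grid that Z1 was computed on
ax_ext.contour(Z1, extent=(xx1.min(), xx1.max(), yy1.min(), yy1.max()),
               levels=levels, colors="k", linewidths=1)

for ax in (ax_xy, ax_ext):
    limits_after.append((ax.get_xlim(), ax.get_ylim()))
```

Because the sub-cluster grid lies inside the main grid, `limits_before` and `limits_after` agree for both axes. Call `fig.savefig(...)` or switch to an interactive backend and use `plt.show()` to inspect the result.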