Why does K-Medoids give different clusters when using the same data?

Time: 2020-02-05 14:03:44

Tags: python pandas numpy cluster-analysis

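I am trying to create 3 clusters of coordinates based on the actual distance (in meters) between them. I am using the pyclustering library (v 0.9.3.1) to create the clusters with the K-Medoids algorithm. I chose it because it allows clustering based on a custom distance matrix.

I have the following data (= df in the code):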

[[ 5.54234025 52.33171979] [ 5.63723058 52.30203804] [ 6.86358932 52.71470154] [ 5.65058518 52.56547845] [ 5.69671853 52.80095967] [ 5.78317201 52.3762015 ] [ 6.49274668 52.73134323] [ 6.8586198  52.83012204] [ 6.5078194  52.72374062] [ 5.7740027  52.36877315] [ 6.30996811 52.38282932] [ 6.15836758 52.73475669] [ 6.03619448 52.37375389] [ 6.17387631 52.45866285] [ 5.91738025 52.55758756] [ 6.34791259 52.3186396 ] [ 6.17001243 52.68707341] [ 6.27778913 52.23139991] [ 6.1405875  52.89187446] [ 5.63316393 52.33808183] [ 6.14512217 52.28988876] [ 6.95356307 52.77812292] [ 5.58405455 52.27015314] [ 6.15267544 52.2555669 ] [ 5.49667521 52.35360582] [ 6.20266605 52.70916272] [ 6.1387325  52.48840386] [ 6.09516641 52.51060003] [ 6.05012155 52.52538424] [ 5.70644521 52.54255734] [ 5.65057224 52.3641085 ] [ 6.08146155 52.4996728 ] [ 6.20260632 52.26755503] [ 6.05072828 52.51110921] [ 6.93545227 52.75212102] [ 6.15780868 52.25478387] [ 5.75571905 52.71123786] [ 5.52027589 52.3400946 ] [ 6.21963138 52.24181099] [ 6.16916039 52.24674126] [ 5.86227855 52.70830936] [ 6.93279755 52.75671467] [ 6.17937646 52.85055622] [ 6.27402919 52.38300518] [ 5.62067086 52.2992954 ] [ 6.41170455 52.52922947] [ 5.96875572 52.68304847] [ 5.7822089  52.37554773] [ 6.11871619 52.78736115] [ 6.63561409 52.56128167] [ 6.42255129 52.51588309] [ 6.92364237 52.77151984] [ 6.73894674 52.6465025 ] [ 5.64660587 52.358935  ] [ 6.84892799 52.70955256] [ 6.1904102  52.69754784] [ 6.89467815 52.78900327] [ 6.08559412 52.22708975] [ 6.55571008 52.62638016] [ 6.09647307 52.51267545] [ 6.05069507 52.51462529] [ 6.12719784 52.49278953] [ 5.91781415 52.55528909] [ 5.54622509 52.32510519] [ 6.27398416 52.68559986] [ 5.60932195 52.32223254] [ 5.50159906 52.35178719]]

This is what the distance matrix (= matrix in the code) looks like.

This is what I am doing:
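The helper get_distance_matrix is not shown in the question. As a rough illustration only, a pairwise distance matrix in meters could be built with the haversine formula, as sketched below. The names haversine_m and haversine_matrix are hypothetical, the column order (longitude, latitude) is an assumption based on the values in the data, and the Earth is treated as a sphere, so the distances are approximate.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def haversine_matrix(points):
    """Symmetric matrix of pairwise distances; points are (lon, lat) pairs."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_m(*points[i], *points[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# First three rows of the data above, assumed to be (longitude, latitude).
sample = [
    (5.54234025, 52.33171979),
    (5.63723058, 52.30203804),
    (6.86358932, 52.71470154),
]
sample_matrix = haversine_matrix(sample)
```

A nested list like sample_matrix has the square, symmetric, zero-diagonal shape that a distance-matrix input is expected to have.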

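import numpy as np
from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.kmedoids import kmedoids

number_of_clusters = 3

df = create_data_set(data)

df_size = len(df)
# pick the indices of the initial medoids at random (different on every run)
initial_medoids = np.random.randint(df_size, size=number_of_clusters)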

matrix = get_distance_matrix(data)


# create K-Medoids algorithm for processing distance matrix instead of points
kmedoids_instance = kmedoids(matrix, initial_medoids, data_type='distance_matrix')

# run cluster analysis and obtain results
kmedoids_instance.process()
clusters = kmedoids_instance.get_clusters()
medoids = kmedoids_instance.get_medoids()

# Display clusters.
visualizer = cluster_visualizer(1)
visualizer.append_clusters(clusters, df.to_numpy(), 0)
visualizer.append_cluster(medoids, data=df.to_numpy(), marker='*', markersize=15)
visualizer.show()

This gives me different clusters every time:

[two images of the resulting clusters, not reproduced here]

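Creating the clusters works fine, but the clusters are different every time. I noticed that this has to do with the fact that I assign the three initial medoids randomly. Still, I can't help thinking that something must be wrong. How can the final result depend on the random medoids you assign?

Am I doing something wrong?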
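The dependence on the starting medoids can be illustrated with a small sketch. K-Medoids style algorithms refine an initial guess and stop when no further improvement is found, so different starts can end at different local optima. The code below is not pyclustering's implementation: it is a simplified alternating k-medoids in pure Python on a toy 1-D distance matrix, and the names assign, total_cost, simple_kmedoids and best_of_restarts are hypothetical. It also shows two common ways to get repeatable results, namely seeding the random generator and keeping the lowest-cost result of several restarts.

```python
import random


def assign(matrix, medoids):
    """Group point indices by their nearest medoid."""
    clusters = {m: [] for m in medoids}
    for i in range(len(matrix)):
        nearest = min(medoids, key=lambda m: matrix[i][m])
        clusters[nearest].append(i)
    return clusters


def total_cost(matrix, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(matrix[i][m] for m in medoids) for i in range(len(matrix)))


def simple_kmedoids(matrix, initial_medoids, max_iter=100):
    """Alternate between assignment and medoid update until nothing changes."""
    medoids = sorted(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(matrix, medoids)
        # In each cluster, pick the member with the smallest summed distance to the others.
        new_medoids = sorted(
            min(members, key=lambda c: sum(matrix[c][j] for j in members))
            for members in clusters.values() if members
        )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def best_of_restarts(matrix, k, restarts=20, seed=0):
    """Run several seeded random starts and keep the lowest-cost medoids."""
    rng = random.Random(seed)  # fixed seed makes the starts repeatable
    best = None
    for _ in range(restarts):
        start = rng.sample(range(len(matrix)), k)  # k distinct indices
        medoids = simple_kmedoids(matrix, start)
        if best is None or total_cost(matrix, medoids) < total_cost(matrix, best):
            best = medoids
    return best


# Toy data: three well-separated groups on a line.
positions = [0, 1, 2, 10, 11, 12, 20, 21, 22]
toy_matrix = [[abs(a - b) for b in positions] for a in positions]

bad = simple_kmedoids(toy_matrix, [0, 1, 2])   # all starts in one group -> [0, 1, 5], cost 31
good = simple_kmedoids(toy_matrix, [0, 3, 6])  # one start per group -> [1, 4, 7], cost 6
```

Note that rng.sample draws distinct indices, whereas np.random.randint samples with replacement and can return the same index more than once.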

0 answers:

No answers